Chapter 16: File Handling, CSV & JSON
Every piece of data a pipeline processes started somewhere. A CSV dropped into a folder by an accounting system. A JSON response from an API call. A log file written by a server at midnight. Before any transformation happens, your code has to open that file, read what is inside, and do something with it.
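As a rough illustration of that first step, the sketch below reads a CSV and a JSON file with Python's standard `csv` and `json` modules. The file names, column names, and values are made up for the example, and the files are written to a temporary folder only so the snippet runs on its own.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_csv_rows(path):
    """Return every CSV row as a dict keyed by the header line."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def read_json(path):
    """Parse a JSON file into Python dicts and lists."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical sample data, created here so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "invoices.csv"
    json_path = Path(tmp) / "response.json"
    csv_path.write_text(
        "invoice_id,amount\n1001,250.00\n1002,99.50\n", encoding="utf-8"
    )
    json_path.write_text('{"status": "ok", "count": 2}', encoding="utf-8")

    rows = read_csv_rows(csv_path)
    payload = read_json(json_path)

# CSV values always arrive as strings, so convert before doing arithmetic.
total = sum(float(row["amount"]) for row in rows)
```

Opening files inside a `with` block means they are closed even if reading fails partway through.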
Chapter 17: Error Handling
A pipeline that crashes with a clear error message is better than one that runs silently and produces wrong answers. The crash tells you exactly what went wrong and where. Silent bad data ships to a dashboard, and someone discovers the problem three weeks later in a board meeting.
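One way to picture the difference is a parser that refuses to guess. The sketch below is a minimal example, not a prescribed pattern: `parse_amount` and `total_amounts` are hypothetical helpers, and the input values are invented. A bad value stops the run with a message naming the row instead of being skipped or turned into zero.

```python
def parse_amount(raw, row_number):
    """Convert one raw value to float, or fail with a message naming the row."""
    try:
        return float(raw)
    except (TypeError, ValueError) as exc:
        # Re-raise with context so the error says what went wrong and where.
        raise ValueError(
            f"row {row_number}: cannot parse amount {raw!r}"
        ) from exc


def total_amounts(raw_values):
    """Sum the amounts; any unparseable value aborts the whole calculation."""
    return sum(
        parse_amount(raw, i) for i, raw in enumerate(raw_values, start=1)
    )


good_total = total_amounts(["10.5", "4.5"])

message = None
try:
    total_amounts(["10.5", "N/A", "4.5"])
except ValueError as err:
    message = str(err)  # "row 2: cannot parse amount 'N/A'"
```

The silent alternative would be to catch the exception and substitute `0.0`, which produces a plausible-looking total that is wrong.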
Chapter 18: NumPy
Python lists are flexible: they can hold mixed types, grow dynamically, and do most things you need. But that flexibility has a cost. When you need to multiply a million numbers by two, Python loops through each one, checks its type, does the operation, and moves to the next. On large datasets that is slow.
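The contrast can be sketched in a few lines, assuming NumPy is installed. Both versions below double a million numbers. The list version runs a Python-level loop over every element, while the array version is a single expression that NumPy applies to the whole array in compiled code. The variable names are illustrative only.

```python
import numpy as np

# Plain Python: the interpreter handles each element one at a time.
values = list(range(1_000_000))
doubled_list = [v * 2 for v in values]

# NumPy: every element shares one dtype, so the multiplication
# is applied to the whole array without a Python-level loop.
arr = np.arange(1_000_000)
doubled_arr = arr * 2
```

Both produce the same numbers. How much faster the array version runs depends on the machine, so measure it with `timeit` on your own data rather than relying on a quoted figure.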
Chapter 19: Pandas
NumPy is fast at operating on numbers. But real-world data is not just numbers — it is tables with column names, mixed types, missing values, and rows that need to be filtered, grouped, and joined. Pandas was built to handle exactly that.
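A small sketch of those three operations, assuming pandas is installed. The tables, column names, and values are invented for illustration. It filters out a row with a missing value, joins orders to customers, and groups by region.

```python
import pandas as pd

# Hypothetical orders table with one missing amount.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": [120.0, None, 80.0, 45.0],
})

# Hypothetical lookup table mapping customers to regions.
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["North", "South", "South"],
})

# Filter: keep only the rows where amount is present.
valid = orders[orders["amount"].notna()]

# Join: attach each order's region using the shared customer_id column.
joined = valid.merge(customers, on="customer_id", how="left")

# Group: total amount per region.
totals = joined.groupby("region")["amount"].sum()
```

Here `totals` is a Series indexed by region, so `totals["North"]` gives the combined amount for that region.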