Data Science Ecosystem

Welcome to the second chapter of this course! Assuming you have finished the first chapter, you should have a fairly solid understanding of many key machine learning concepts. So far, you have used some libraries (such as scikit-learn, to load the MNIST dataset), but mostly without an explanation of what they actually do. The previous chapter was about understanding the key concepts of AI and ML; this one is about data and the Python data science ecosystem.

Among other things, you will learn:

  • How to import and use libraries such as Pandas.
  • How to load data from various file formats, such as CSV, including files with thousands of rows or columns.
  • How to filter, group and merge DataFrames.
  • How to visualize the data by creating graphs.
  • How to perform exploratory data analysis and feature engineering.
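
As a small preview of the loading, filtering, grouping and merging steps listed above, here is a minimal sketch using Pandas. The data is a handful of made-up rows, not the real Titanic dataset, and the column names (passenger_id, ticket_class, age, survived) are assumptions chosen purely for illustration.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for a CSV file on disk.
# These rows are invented for illustration only.
passengers_csv = io.StringIO(
    "passenger_id,ticket_class,age,survived\n"
    "1,3,22,0\n"
    "2,1,38,1\n"
    "3,3,26,1\n"
    "4,1,35,1\n"
    "5,3,35,0\n"
)

# Load: read_csv accepts a file path or any file-like object.
passengers = pd.read_csv(passengers_csv)

# Filter: keep only the rows where the condition is true.
over_30 = passengers[passengers["age"] > 30]

# Group: share of survivors within each ticket class.
survival_by_class = passengers.groupby("ticket_class")["survived"].mean()

# Merge: attach a readable label from a second (hypothetical) DataFrame.
class_names = pd.DataFrame(
    {"ticket_class": [1, 3], "class_name": ["first", "third"]}
)
labelled = passengers.merge(class_names, on="ticket_class", how="left")

print(over_30)
print(survival_by_class)
print(labelled)
```

Each of these operations is covered properly in its own lesson; the sketch is only meant to show how little code a typical step takes.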

Datasets used

You will work with two datasets:

  • Backblaze's hard drive test data
  • Titanic passenger data

Backblaze's dataset will not be used for machine learning, since that task is too advanced for an introductory course, but you will be pointed to tutorials that show how to approach this kind of time-series-based prediction. The Titanic dataset, on the other hand, will be used for training a classifier. This will be done in the very last lesson of this chapter, which puts the skills from all the other lessons into practice.