Unsupervised Learning

This module covers unsupervised learning, reinforcement learning, and search algorithms. None of these will be discussed in detail; the goal is to give you an overview of what these algorithms do.

The first topic in this module is unsupervised learning. The topic was briefly discussed in our "Most Common Types of Machine Learning" lesson. Recall that unsupervised learning has no labels associated with the data. We don't have a class to predict (e.g. survived/not-survived) or a numerical value to predict (e.g. housing price). Instead, the model does its best to find structure in the data.

That said, we might end up using supervised learning on the same dataset later on. Unsupervised learning might reveal structure in our dataset that could be used as labels. Also, some unsupervised learning techniques, such as PCA, can be used as part of data preprocessing to fight the curse of dimensionality.

Common algorithms available in scikit-learn include:

  • k-Means for clustering.
  • Principal Component Analysis for dimensionality reduction.
  • Local Outlier Factor for anomaly detection.
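The three estimators above all follow the same scikit-learn fit/predict pattern. A minimal sketch, using a small synthetic dataset (the array shapes and hyperparameter values are illustrative assumptions, not part of the lesson):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # toy data: 100 samples, 5 features

# Clustering: assign each sample to one of 3 clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: project onto the first 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X)

# Anomaly detection: fit_predict returns -1 for outliers, 1 for inliers.
flags = LocalOutlierFactor(n_neighbors=20).fit_predict(X)

print(labels.shape, X_2d.shape, set(flags))
```

Note that none of these calls are given labels; each estimator works from the feature matrix `X` alone.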

Neural Networks are also used for various unsupervised learning tasks, such as creating a low-noise encoding of an input image (using Autoencoders). This topic is beyond the scope of this course, but is worth mentioning. Use your favorite search engine to find more information with a search query like: "autoencoder remove noise".

Within this course, we will perform k-Means clustering on color images in order to find a color map, and use PCA (Principal Component Analysis) to reduce the dimensionality of a given dataset.
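To preview the color-map idea: each pixel of an RGB image is a point in 3-D color space, and the k-Means cluster centers can serve as the palette. A hedged sketch, assuming a random array stands in for a real image (the image size and the choice of 8 colors are arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3))      # toy 32x32 RGB "image"
pixels = image.reshape(-1, 3).astype(float)         # one row per pixel

# Cluster the pixels in RGB space; the centers become an 8-color palette.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(pixels)
palette = kmeans.cluster_centers_.astype(np.uint8)  # the color map

# Replace every pixel with its nearest palette color (color quantization).
quantized = palette[kmeans.labels_].reshape(image.shape)

print(palette.shape, quantized.shape)
```

The quantized image has the same shape as the original but uses only the 8 palette colors, which is exactly the structure-finding flavor of unsupervised learning described above.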

Conclusion

Remember that supervised learning finds patterns in the dataset using labels. Depending on the problem, we use a classifier or a regressor as the estimator.

Unsupervised learning, on the other hand, finds structure in the dataset without any labelled data. Typical tasks are clustering, dimensionality reduction, and anomaly detection.