The last topic of this chapter is parameterization. You have actually already seen a case of both parametric and non-parametric learning during this course. Stop reading and think: can you figure out which has been which? What are parameters in models?
The non-parametric algorithm we used was k-NN. It doesn't define any parameters that would indicate whether an entity belongs to class A, B or C. Instead, it stores the n-dimensional coordinates of every entity in the training set in the model. The more samples you have, the larger the model will be. We have also used a parametric algorithm: simple linear regression. When we train the model, the only information it stores is the weights (or coefficients). By convention, the weights are usually denoted as a vector or matrix W, or in math notation θ (theta). Regardless of the naming, the model parameters are an n-dimensional set of numbers. If you take the dot product of W and X (the features), you get the predictions of y. Even if the training set contains 10 million samples (feature vectors), the dimensionality of the parameters will remain the same.
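To make the difference concrete, here is a minimal sketch (not part of the course material; it assumes NumPy, and the data is synthetic). It shows that the linear regression model boils down to a fixed-size weight vector W, while a k-NN "model" is essentially the training data itself:

```python
# Minimal sketch: parametric vs non-parametric model size. Synthetic data.
import numpy as np

rng = np.random.default_rng(42)

# Training set: 10,000 samples, 3 features each.
X = rng.normal(size=(10_000, 3))
true_w = np.array([2.0, -1.0, 0.5])       # made-up ground-truth weights
y = X @ true_w + rng.normal(scale=0.1, size=10_000)

# Parametric: fit linear regression via least squares.
# However many samples we train on, the learned model is just this vector.
W, *_ = np.linalg.lstsq(X, y, rcond=None)
print("learned weights:", W)              # close to [2.0, -1.0, 0.5]
print("parameters stored:", W.size)       # 3, independent of sample count

# Non-parametric: a k-NN "model" is the stored training data itself.
knn_model = (X, y)
print("values stored by k-NN:", X.size + y.size)  # grows with the data

# Prediction with the parametric model is a dot product of W and the features.
x_new = np.array([1.0, 2.0, 3.0])
print("prediction:", x_new @ W)
```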
Test your understanding: If there are 3 features (e.g. hours studied, hours slept, grade of the previous course), how many parameters will the weight vector hold? Think. You know this! I'm not giving the correct answer, but a similar question might appear in the exam/test.
Parametric models are easy to understand, and once the algorithm has been trained, making predictions is really fast. The downsides: the data has to have a linear relationship with the predicted output, and any outliers, missing data or noise in the data requires careful data preparation before training.
The most common algorithms in this category are linear and logistic regression. No matter which parametric model you use, you will end up with coefficients that are easy to interpret. In the case of linear regression, these weights are simply multipliers for the features. Investigating the coefficients can provide insightful information about, for example, which factors have the highest impact on customers' decisions to buy or skip your product.
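As an illustration, here is a hedged sketch of reading off those coefficients, assuming scikit-learn (which the course may or may not use). The feature names reuse the earlier study example, and the data and effect sizes are made up:

```python
# Sketch: inspecting linear regression coefficients. Synthetic data;
# the "true" effect sizes below are invented for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
features = ["hours_studied", "hours_slept", "previous_grade"]
X = rng.uniform(0, 10, size=(200, 3))
# Made-up ground truth: studying matters most, sleep matters a little.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(size=200)

model = LinearRegression().fit(X, y)

# Each coefficient is simply that feature's multiplier in the prediction.
for name, coef in zip(features, model.coef_):
    print(f"{name}: {coef:.2f}")
# hours_studied gets the largest coefficient, i.e. the highest impact.
```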
Non-parametric models may be easy to understand as models, but analysing what a trained model actually does often isn't very intuitive. For example, the k-Nearest Neighbour classifier is easy to understand as a model, but given 100 features, the prediction is based on the majority vote of the k closest entities in 100 dimensions. Which of the features have the most impact on the prediction? There are no simple parameters to give a straight answer.
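You can see this in practice with a small sketch (again assuming scikit-learn; the data is synthetic): a fitted k-NN classifier simply has no coefficients to inspect.

```python
# Sketch: a trained k-NN classifier offers no per-feature parameters.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 100))            # 100 features, as in the text
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # labels actually depend on 2 of them

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Prediction = majority vote of the 5 nearest training points in 100-D space.
print(knn.predict(X[:1]))

# Unlike linear regression, there is no coef_ saying which features matter:
print(hasattr(knn, "coef_"))  # False
```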
Other non-parametric algorithms include, but are not limited to, various decision trees and support vector machines. They can be very powerful at finding underlying relationships between features.
Task: Find out whether deep learning models (deep neural networks) are parametric or non-parametric. Use your favorite search engine and spend some time reading. This will solidify your understanding of parameters.
Reading task: Open the downloaded PDF book (A Brief Introduction to Machine Learning for Engineers, available here). Read pages 15–17, up to the heading 2.2 Inference.
Since this is the last lesson in this chapter, spend some time making sure that you understand everything so far. At minimum, you should be able to define the terms below and describe how they relate to machine learning and AI: