Artificial neural networks (NN, neural network for short) are a family of predictive machine learning algorithms inspired by how the human brain works. A network consists of neurons connected to other neurons. Neurons are sometimes called cells, or, if they belong to hidden layers, nodes.
Note that there are multiple ways to connect these cells. Take a look at the Asimov Institute's neural network zoo. It should be clear that we won't discuss most of these NNs during this course. We will focus on the very basics: the Perceptron and (deep) feed-forward networks.
Each neuron has two or more inputs. It outputs a weighted sum of the input values. Notice the word "weighted". We have used weights and sums before in regression and classification:
x[1] * w[1] + x[2] * w[2] + ... + x[n] * w[n]. In the coming lessons, you will see that feed-forward NNs are neither magical nor super complex.
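The weighted sum above can be sketched in a few lines of Python. The input and weight values here are arbitrary toy numbers, chosen only for illustration:

```python
# A neuron's core computation is just a weighted sum (a dot product).
def weighted_sum(xs, ws):
    """Return x[1]*w[1] + x[2]*w[2] + ... + x[n]*w[n]."""
    return sum(x * w for x, w in zip(xs, ws))

inputs = [1.0, 2.0, 3.0]     # hypothetical input values
weights = [0.5, -0.2, 0.1]   # hypothetical learned weights

print(weighted_sum(inputs, weights))  # 0.5 - 0.4 + 0.3, i.e. close to 0.4
```

This is the same computation a linear regression model performs; the neural network twist comes from stacking such sums and passing them through activation functions.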
Various forms of neural networks have existed since the 1940s. The Perceptron, the algorithm we will write from scratch in the next lesson, was created in the 1950s. These early neural networks included only one layer (the single-layer perceptron), making them binary classifiers capable of learning only linearly separable patterns. Recall our Iris dataset: class 0 (setosa) is linearly separable from classes 1 and 2 (versicolor & virginica).
Starting from the 1960s and 70s, a method called backpropagation, which we will learn more about later on, allowed the use of multi-layer feed-forward NNs. This technique was independently rediscovered multiple times by different researchers. With the use of nonlinear activation functions (recall the sigmoid), neural networks became capable of modeling nonlinear functions and multivariate classification problems - at least on paper. In practice, the lack of computing power in the 1970s and 1980s didn't allow using more than 2 hidden layers. The current trend is to call neural networks deep learning, and various architectures can hold tens or hundreds of layers (e.g. ResNet101 is 101 layers deep). Instead of using the CPU (Central Processing Unit), these deep models are usually trained on a GPU (Graphics Processing Unit). Thus, neural networks and deep learning are not a new invention; increased computing performance simply allows us to train deeper (and wider) neural networks than before.
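As a quick refresher, the sigmoid mentioned above squashes any real number into the open interval (0, 1), which is what makes it a nonlinear activation. A minimal sketch:

```python
import math

def sigmoid(z):
    """Squash z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))    # exactly 0.5
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```

Applying a function like this between layers is what lets a multi-layer network represent nonlinear decision boundaries; stacking purely linear layers would collapse back into a single linear model.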
Task: Read online articles and try to find out if there is consensus on how many layers a network needs in order to be called deep instead of shallow. 3+? No consensus? Do some authors demand hierarchical feature learning?
The key terms are:
If we divide machine learning algorithms into two groups, deep learning algorithms and the rest ("traditional machine learning"), you might wonder why we would use one over the other.
| | Traditional | Deep Learning |
| --- | --- | --- |
| # of samples | Can perform well with a small amount of data | Requires huge volumes of data |
| Computational requirements | Low | High |
| Features | Performs poorly on raw data; requires feature descriptors | Learns from raw data (e.g. RGB pixel data) |
| Output | Usually a number or a label | Any format |
| Interpretability | Easy-to-understand coefficients for features | Black box |
| Training time | Minutes to hours | Hours to weeks |
Deep learning justifies itself when you have high volumes of complex (unstructured) data. Note that most neural networks can be seen as "black boxes": after training, the model may give you a seemingly correct output, but explaining how it arrived at that output can be challenging. Especially when used in decision-making (for example, in the judicial system as a tool for judges), the interpretability problem needs to be taken seriously.
One major benefit of using deep learning is that you can (at least partially) skip the feature engineering process. For example, if you want to detect objects (`["a cat", "a book", "a house", "a sofa", ...]`) in images using traditional ML, you need to form some features yourself. A deep network learns these features on its own. Since the output of the `n`-th hidden layer is the input of the `n+1`-th layer, these features form a hierarchy. The first few layers will describe geometric shapes such as edges and circles. The deeper you go into the model, the more abstract these concepts become. The 12th layer might as well specialize in cat faces!
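The chaining of layers described above can be sketched in plain Python. All sizes and weight values here are hypothetical toy numbers; the point is only that the output of one layer becomes the input of the next:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights):
    """One dense layer: each neuron outputs sigmoid(weighted sum of inputs)."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, neuron_w)))
            for neuron_w in weights]

x  = [0.5, -1.0, 2.0]                      # raw input (e.g. pixel values)
w1 = [[0.1, 0.2, -0.3], [0.4, 0.0, 0.5]]   # layer 1: 2 neurons, 3 inputs each
w2 = [[1.0, -1.0]]                         # layer 2: 1 neuron, 2 inputs

h1 = layer(x, w1)    # hidden layer output...
y  = layer(h1, w2)   # ...becomes the next layer's input
print(y)
```

In a real deep network there would be many more layers and the weights would be learned from data, but the data flow is exactly this: layer after layer, each building on the previous one's output.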
TASK: Read this Forbes article about Explainable Artificial Intelligence (XAI). Feel free to search and read other articles about the topic.
Neural networks are algorithms inspired by the design of the human brain. They allow us to make predictions on data that comes in high volumes and is highly complex in nature. Usual use cases include machine translation, recommender systems, voice recognition, face recognition, reinforcement learning (e.g. AlphaGo), bioinformatics (protein folding), traffic analytics, and many others. In the current era, neural networks are usually referred to by the name deep learning.
TASK: The hidden layers in a neural network are said to be a black box, but one way to visualize the individual layers is to run the trained network in reverse. Instead of changing the weights (as in training), change the input image: with all weights locked, gradually adjust the image's pixel values to amplify the activations of a chosen layer `n`. This creates a trippy version of the input image where the features that layer `n` responds to have been amplified. Your task is to explore this topic, called Deep Dream. A good place to start is the Deep Dream video by Computerphile. If you get interested, you might want to research a similar topic called style transfer.
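The core trick, optimizing the input instead of the weights, fits in a few lines. This is a hypothetical one-neuron "network", nothing like a real Deep Dream model, but it shows gradient ascent on the input while the weights stay frozen:

```python
# Frozen weights, as if the network were already trained.
weights = [0.8, -0.5, 0.3]
x = [0.0, 0.0, 0.0]   # the "image" we will modify
lr = 0.1              # step size

def activation(xs, ws):
    """The neuron's (pre-activation) output: a weighted sum."""
    return sum(a * b for a, b in zip(xs, ws))

before = activation(x, weights)
for _ in range(10):
    # For a weighted sum, d(activation)/d(x[i]) = weights[i],
    # so stepping the input along the weights increases the activation.
    x = [xi + lr * wi for xi, wi in zip(x, weights)]
after = activation(x, weights)

print(before, "->", after)  # the activation grows with every step
```

Real Deep Dream does the same thing with a full convolutional network: it backpropagates gradients to the input pixels and nudges them so that a chosen layer's activations grow, which is exactly why the layer's favorite patterns start appearing in the image.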
HOMEWORK: Read the Introduction of the Deep Learning textbook (2016 by Goodfellow et al.) These 26 pages will get you started in understanding the field of deep learning.