What is a Neural Network?
The concept of Neural Networks has been one of the most significant breakthroughs in Machine Learning and AI. The study of Neural Networks, particularly those with many layers, is known as Deep Learning. Deep Learning can be thought of as a subset of Machine Learning, which in turn is a part of the much-loved field of Artificial Intelligence.
The thought behind the development of Neural Networks was pretty simple:
The goal is to make the machine learn without human intervention, or simply, to give the computer a brain of its own. The most obvious way to do this was to try to replicate the working of the human brain as closely as possible.
Now, when we think of the human brain, the first “technical”, or you may say “biological”, term that comes to mind is the brain cell, popularly known as a Neuron.
Scientists took what they could learn from the structure and working of the human brain and came up with the concept of Neural Networks: a network of Neurons.
Let’s begin by looking at a neuron more closely.
This is what a neuron looks like:
Let’s break down all the components of this image:
X₁, X₂, X₃ are the inputs of this neuron. The connecting lines or bridges, technically called Synapses, are assigned weights, say w₁, w₂, w₃. Don’t worry, these variables just make things easier to follow. When this information enters the neuron, it is called a weighted input: each input is multiplied by its associated weight (pretty obvious) and the products are summed together.
The expression looks like this:
w₁X₁ + w₂X₂ + w₃X₃    ... (1)
A function known as the Activation Function is then applied to this expression. The result is the output of this single neuron.
Let’s try and see what exactly the significance of equation (1) is:
It gives a quantitative importance to each of the inputs relative to the others, or “extracts features based on their importance”. The activation function then transforms the expression so that the output is a function of the three inputs, one that can be plotted on a graph. Sometimes this output function should not pass through the origin, and a constant term has to be added for an accurate outcome. This term is known as a bias, and it makes the equation look like this:
w₁X₁ + w₂X₂ + w₃X₃ + b
Bias is like the intercept in a linear equation. It is an additional parameter of the Neural Network that shifts the output alongside the weighted sum of the inputs to the neuron. Bias therefore helps the model fit the given data better.
output = activation(sum(weights * inputs) + bias)
This is how one single neuron works.
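To make this concrete, here is a minimal sketch of a single neuron in plain Python. The sigmoid activation and the particular input, weight and bias values are our own assumptions, picked only for illustration; any other activation function could be swapped in.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through an activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum + bias)

# Hypothetical values for X1..X3, w1..w3 and the bias (illustration only).
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.5]
b = 0.0

output = neuron(x, w, b)
print(output)  # weighted sum is 0.5 - 2.0 + 1.5 = 0.0, and sigmoid(0.0) = 0.5
```

Changing the bias shifts the value that goes into the activation, which is exactly the “intercept” role described above.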
If this neuron works as a “binary classifier”, outputting one of two classes (classically 0 or 1 through a step activation), it is known as a perceptron (a fancy word for a simple thing). Perceptrons were introduced in the 1950s and are basically single-layer binary classifier neural networks.
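As a rough sketch, a perceptron can be written as a neuron with a step activation. The weights and bias below are hand-picked by us (not learned) so that the perceptron behaves like a logical AND gate; they are purely illustrative.

```python
def step(z):
    """Step activation: fire (1) when the input is non-negative, else 0."""
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias):
    """Binary classifier: weighted sum plus bias, then a hard threshold."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

# Hand-picked (hypothetical) parameters that make the perceptron act like AND.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], AND_WEIGHTS, AND_BIAS))
```

Only the input (1, 1) pushes the weighted sum above the threshold, so it is the only case classified as 1.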
Now, imagine a number of these neurons at one level or “layer”, and a number of such layers connected to each other by synapses. There, you have a neural network. Just imagine the computational power this network can have using simple equations such as (1) and different activation functions. The output from one neuron in one layer serves as one of the inputs for the neurons in the next layer. A neural network looks something like this:
This is a fully connected Neural Network with 4 inputs, 1 hidden layer and 1 output layer. Sounds pretty simple, right?
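Here is a small sketch of how data could flow through such a network. The hidden layer size (3 neurons), the single output neuron, the sigmoid activation and the random, untrained weights are all assumptions on our part; the point is only to show that each layer’s outputs become the next layer’s inputs.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: every neuron sees every input."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def random_layer(n_inputs, n_neurons, rng):
    """Random (untrained) weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

rng = random.Random(0)
hidden_w, hidden_b = random_layer(4, 3, rng)  # 4 inputs -> 3 hidden neurons
output_w, output_b = random_layer(3, 1, rng)  # 3 hidden -> 1 output neuron

x = [0.2, 0.4, 0.6, 0.8]                      # made-up input values
hidden = dense_layer(x, hidden_w, hidden_b)
prediction = dense_layer(hidden, output_w, output_b)
print(hidden, prediction)
```

Training would adjust the weights and biases; this sketch only performs the forward pass.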
We will go into the depths of each layer in the upcoming blogs.
Types of Neural Networks
Let’s go into further depths and explore the three basic types of Neural Networks.
1. Artificial Neural Networks (ANN)
The simple Neural Network discussed above is basically an ANN. It is known as a feed-forward Neural Network as it processes inputs only in the forward direction. It consists of an input layer, one or more hidden layers and an output layer.
The input layer provides the input, the hidden layers process it and the output layer gives the outcome. The hidden layers try to “learn” certain features, and as the depth of the layers increases, so does the complexity of the features they are able to learn. In an ANN, a human extracts the features that are fed as input. For an input that consists of rows of data, each row can be fed individually, with each cell in the row serving as a different input node. For an image dataset, each image is nothing but an array of pixels, a 3D array to be precise. To feed it into the ANN, a human first has to extract the important features, giving a 1D array that serves as the input to the network.
Applications of ANN: Handwriting Recognition, Text Classification and Categorization, Text Summarization, etc.
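To illustrate the point about images as ANN input, here is a minimal sketch that turns a 3D pixel array into a 1D input. Simply flattening and scaling the raw pixels is our stand-in for the simplest possible hand-made “feature extraction”; the tiny 2x2 RGB image is made up for the example.

```python
def flatten(image):
    """Turn a height x width x channels nested list into one flat list."""
    return [value for row in image for pixel in row for value in pixel]

# A made-up 2x2 RGB image: each pixel is [red, green, blue] in 0..255.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

# Scale to the range 0..1 so every input node receives a comparable value.
features = [value / 255 for value in flatten(image)]
print(len(features))  # 2 * 2 * 3 = 12 input nodes
```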
2. Recurrent Neural Networks (RNN)
A looping constraint at every hidden layer node in the ANN makes it an RNN. The purpose of this loop is to capture the sequential information in the input: the output from the current step is fed back as an input to the next step! Because the network has this “memory” unit, the problem of having to learn separate parameters (weights and biases) for each step is reduced. Each neuron memorizes its current output and feeds it back to itself.
Let’s see what the looping constraint looks like when unrolled.
RNNs share the parameters across different time steps and this is known as Parameter Sharing. This results in fewer parameters to train and decreases the computational costs.
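A minimal sketch of this idea, using a single recurrent neuron with scalar weights. The names w_x, w_h and b, the tanh activation and all the numbers are our own assumptions for illustration; notice that the same three parameters are reused at every time step.

```python
import math

def rnn_forward(sequence, w_x, w_h, b):
    """Run one recurrent neuron over a sequence, reusing the same parameters."""
    h = 0.0                 # the "memory": starts empty
    states = []
    for x_t in sequence:
        # The previous output h is fed back in alongside the new input x_t.
        h = math.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return states

# Made-up sequence and parameters, for illustration only.
states = rnn_forward([1.0, 0.5, -1.0], w_x=0.8, w_h=0.5, b=0.0)
print(states)
```

However long the sequence gets, the network still has only three parameters to train, which is what Parameter Sharing buys us.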
Network types: Boltzmann machine networks, Hopfield networks, Long Short-Term Memory (LSTM) networks.
More on these in the upcoming blogs!
Applications of RNN: Conversational Interfaces and Chatbots, Speech Recognition, Time Series Prediction, etc.
3. Convolutional Neural Networks (CNN)
A CNN is basically some preprocessing of the input data before it is fed into the ANN. This preprocessing is the extraction of features, which is done by the machine and not by humans. Features are extracted using powerful operations such as convolution, ReLU, pooling and flattening. This results in a 1-D vector of features that can be fed into an ANN for “learning”. This is the primary difference between a CNN and an ANN, and also the central reason that CNNs are the preferred choice for object classification and image processing operations.
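The pipeline above can be sketched in plain Python on a tiny grayscale image. The 6x6 image and the hand-written vertical-edge kernel are assumptions made for this example; in a real CNN the kernel values are learned during training rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Replace negative values with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

def flatten(feature_map):
    return [v for row in feature_map for v in row]

# Made-up 6x6 image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-written vertical edge detector (illustrative, not learned).
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = flatten(max_pool(relu(convolve2d(image, kernel))))
print(features)  # [3, 3, 3, 3]
```

The resulting 1-D list is what would be handed to the ANN part of the network.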
Applications of CNN: Face Recognition, Image Classification, etc.
Hit us up with any doubts; all kinds of feedback are welcome! We’re students trying to learn new things and make supposedly complicated things easier for beginners to understand. Feel free to reach us through our mailbox, and if you liked this blog, please subscribe to our page so we can update you every time we write something new!