How are Neural Networks trained?

Nur Younis
4 min read · Apr 9, 2021
The black box of a neural network.

*All the information provided below is retrieved from the link added at the end*

In this part of the course, Jabril Ashe provides us with an introduction to training Neural Networks.

Neural networks use an algorithm called backpropagation to make sure that every neuron contributing to an error adjusts its weights.

A neural network's (NN) architecture includes neurons and the connections between them. If the NN makes a mistake, its weights are not yet adjusted correctly and need to be fixed for next time through optimization.

Optimization:

Imagine I manage a swimming pool and I want to predict the number of visitors. I have some past data points: temperature and number of swimmers. Through linear regression, we can try to find a pattern. By calculating the distance between the line and each data point and adding them all up, we get the error. The goal of linear regression is to adjust the line to reduce this error as much as possible. The result is the line of best fit, which we use to predict the number of swimmers.
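To make this concrete, here is a minimal sketch of fitting a line of best fit and adding up the error in Python. The temperatures and swimmer counts are made up purely for illustration:

```python
import numpy as np

# Made-up example data: daily temperature (°C) and number of swimmers.
temperature = np.array([20, 24, 27, 30, 33, 35], dtype=float)
swimmers = np.array([40, 95, 140, 190, 240, 270], dtype=float)

# Fit a line swimmers ≈ w * temperature + b with ordinary least squares.
w, b = np.polyfit(temperature, swimmers, deg=1)

# The "error" is how far the line is from each data point;
# here we add up the squared distances (the usual least-squares error).
predictions = w * temperature + b
error = np.sum((swimmers - predictions) ** 2)

print(f"line of best fit: swimmers = {w:.1f} * temperature + {b:.1f}")
print(f"total squared error: {error:.1f}")
print(f"predicted swimmers at 28°C: {w * 28 + b:.0f}")
```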

On cold days the line would predict negative numbers of swimmers. Adding humidity as a second feature would turn this into a 3D graph, but if we add yet another feature, we cannot visualize it anymore. The more features, the trickier the graph. By connecting neurons with weights, the machine can learn to solve these more complex problems.

If the input layer takes humidity, temperature, and rain, the output layer predicts the number of swimmers. In a graph, this would be difficult to interpret. We start with 3 features and their values; each neuron multiplies the features by its weights and passes the result on to the next layers, until the output layer produces an answer.
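A rough sketch of that forward pass might look like the code below. The weights are random (untrained), and the layer size and activation function are illustrative choices of mine, not anything the course prescribes:

```python
import numpy as np

def relu(x):
    # A common activation function; the source doesn't say which one is used.
    return np.maximum(0, x)

# Three input features for one day: humidity (%), temperature (°C), rain (mm).
features = np.array([40.0, 31.0, 0.0])

# Illustrative (untrained) weights: one hidden layer of 4 neurons, one output neuron.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer weights and biases
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer weights and biases

# Each neuron multiplies the features by its weights, sums them,
# and passes the result on to the next layer.
hidden = relu(W1 @ features + b1)
predicted_swimmers = (W2 @ hidden + b2)[0]

print(f"predicted number of swimmers: {predicted_swimmers:.1f}")
```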

We need to keep in mind that there's a difference between the neural network's output and the actual attendance, which is called the error. Since the gap between the correct and the predicted answer is more than a single number, this error is represented by a loss function.

Loss Function.
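One common choice of loss function is the mean squared error. The attendance numbers below are made up just to show the idea:

```python
import numpy as np

def mse_loss(predicted, actual):
    # Mean squared error: average squared difference between
    # the network's predictions and the real attendance numbers.
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return np.mean((predicted - actual) ** 2)

# The network's guesses vs. the real attendance for three days (made-up numbers).
print(mse_loss([120, 80, 200], [150, 90, 180]))  # -> 466.67
```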

The weights, in this case, need to be adjusted to make the predictions more accurate. Remember supervised learning, where we pressed buttons to help machines learn? Here we use backpropagation of the error.

This time, however, the information is fed backwards. The error goes back, starting from the output layer, and adjusts the weights along the way until it reaches the input features again. We need to find the combination of weights that gives the lowest error.

Backpropagation.
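As a rough sketch of what backpropagation does, here is a tiny network trained on made-up data. The layer sizes, activation function, and learning rate are all my illustrative choices, not anything stated in the course:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 3 features per day, plus a made-up "true" attendance pattern.
X = rng.uniform(0, 1, size=(8, 3))          # humidity, temperature, rain (scaled)
y = X @ np.array([2.0, 5.0, -3.0]) + 1.0

# One hidden layer with 4 neurons, one output.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.05                                   # learning rate (step size)

for step in range(500):
    # Forward pass: compute a prediction from the current weights.
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y[:, None]                 # how wrong each prediction is
    loss = np.mean(err ** 2)

    # Backward pass: send the error back through the layers and work out
    # how much each weight contributed to it (the gradients).
    grad_pred = 2 * err / len(X)
    grad_W2 = h.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_h = grad_pred @ W2.T
    grad_pre = grad_h * (1 - h ** 2)        # derivative of tanh
    grad_W1 = X.T @ grad_pre
    grad_b1 = grad_pre.sum(axis=0)

    # Nudge every weight a little in the direction that reduces the error.
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

print(f"loss after training: {loss:.4f}")
```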

Another example provided is about latitude, longitude, and altitude. The error of the NN is the altitude; in this problem, the lower the better. We need to find the lowest point in the deepest valley. The latitude and longitude of that lowest point, where the error is smallest, are the weights of the global optimal solution.

However, the machine doesn't know where that valley is. All it knows is its current latitude, longitude, and altitude. So the machine explores: it takes a guess and then updates its weights. With every step, it updates its coordinates, and once it finds it cannot go any deeper, it guesses that this is the solution.

But what if that is not the lowest point, i.e. the one we are actually looking for?

Then it is called a local optimal solution. The error is relatively small, but not the lowest it could be, which would be the global optimal solution.

Gradient Descent. Example of local and global minima.

However, there are some tricks! We can try different random starting points to make sure the NN does not get stuck at a local optimal solution, or even run several starting points simultaneously, which works well if you have a computer with plenty of processors.

We can also adjust the step size, called the learning rate, which is how much the NN's weights get adjusted every time backpropagation happens.
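Here is a minimal sketch of gradient descent with a learning rate and several random starting points, on a made-up one-dimensional error "landscape" with a shallow valley and a deeper one. The function and all the numbers are invented for illustration:

```python
import numpy as np

def error(w):
    # A made-up error landscape with one shallow valley (local minimum)
    # and one deeper valley (global minimum).
    return 0.05 * w**4 - 0.5 * w**2 + 0.3 * w

def gradient(w):
    # Derivative of the error landscape above.
    return 0.2 * w**3 - 1.0 * w + 0.3

learning_rate = 0.1            # how big a step we take on each update
rng = np.random.default_rng(42)

results = []
for start in rng.uniform(-4, 4, size=5):   # several random starting points
    w = start
    for _ in range(200):
        w -= learning_rate * gradient(w)   # step downhill
    results.append((start, w, error(w)))

for start, w, e in results:
    print(f"start {start:+.2f} -> ended at w={w:+.2f}, error={e:+.3f}")

# Starts on one side of the landscape tend to end in the shallow valley
# (a local optimum); the lowest error found across all starts is our best
# guess at the global optimum.
```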

Creative solutions like these are always used, but we are only halfway done. The other half of training is a check-in, where we see whether the machine can answer questions it hasn't seen before. The more data we have, the easier it is to find patterns.

Over time, backpropagation adjusts the neurons' weights so that the NN's output matches the training data, which is called fitting to the training data. We might be looking at multidimensional functions, and backpropagation is good at finding fits in big data sets.

Overfitting needs to be prevented by keeping the NN simple; otherwise it will find patterns that don't actually mean anything. Keeping the model simple helps it stay accurate on new data.

8 Simple Techniques to Prevent Overfitting | LaptrinhX
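One common way to see overfitting is to hold back some data and compare a simple model with a very flexible one. The sketch below uses a polynomial fit instead of a neural network purely to keep the example short, and all the numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up data: temperature vs. swimmers, with some noise added.
temp = np.sort(rng.uniform(18, 36, size=30))
swimmers = 8 * temp - 100 + rng.normal(0, 15, size=30)
x = (temp - temp.mean()) / temp.std()        # rescale so the fits are well behaved

# Hold half the data back so we can check the model on examples it never saw.
train_x, test_x = x[::2], x[1::2]
train_s, test_s = swimmers[::2], swimmers[1::2]

for degree in (1, 9):                        # a simple model vs. a very flexible one
    coeffs = np.polyfit(train_x, train_s, degree)
    train_err = np.mean((np.polyval(coeffs, train_x) - train_s) ** 2)
    test_err = np.mean((np.polyval(coeffs, test_x) - test_s) ** 2)
    print(f"degree {degree}: train error {train_err:8.1f}, test error {test_err:8.1f}")

# The flexible model fits the training points better, but it often does worse on
# the held-out points: it has started to "memorize" noise instead of the real pattern.
```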

Finally, consider how to best represent our problems as features, and think about which errors the program might make.
