My Understanding of Neural Networks: A Mathematical Perspective

Jodie Heqi Qiu
2 min read · Apr 12, 2020

The first time I learned about neural networks was in November 2019, when I was taking the famous machine learning course by Andrew Ng. I had a hard time understanding the algorithm (not gonna lie).

Early this year I spent some time studying neural networks again, referring to many other resources. This time I asked myself: mathematically, what is going on?

Well, mathematically it’s pretty simple! Just calculating forward and then calculating backward. What does that mean? Let me explain.

What is calculating forward?

What we have in hand: X (the features), y (the real y), and initial weights w, which can be randomly generated.

We then plug X and the initial weights into a function f(X, w) to calculate y. Notice that this y is a predicted y.
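To make this concrete, here is a minimal NumPy sketch. I’m assuming f is a simple linear score function, f(X, W) = XW (a real network would stack several such layers with nonlinearities in between), and all the shapes are arbitrary toy choices:

```python
import numpy as np

# Toy shapes (chosen only for illustration): 5 examples, 3 features, 4 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # features
y = rng.integers(0, 4, size=5)       # the real y: one class label per example
W = 0.01 * rng.normal(size=(3, 4))   # randomly generated initial weights

# Calculating forward: plug X and the weights into f; here f(X, W) = XW
z = X @ W                            # predicted scores, shape (5, 4)
```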

Since we have the real y in hand, we can calculate the data loss between the predicted y and the real y. Let’s say we use the softmax loss function. We call the output of the f function z (the scores), plug z into the softmax loss function, and get a probability for each class, which tells us how likely the prediction is to be correct.
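Continuing the sketch, one common way to compute the softmax loss is softmax probabilities followed by the negative log-probability of the real class (the max-shift is a standard numerical-stability trick, not something from the math above):

```python
# Softmax turns the scores z into per-class probabilities
z_stable = z - z.max(axis=1, keepdims=True)   # shift for numerical stability
probs = np.exp(z_stable)
probs /= probs.sum(axis=1, keepdims=True)     # each row now sums to 1

# Data loss: average negative log-probability assigned to the real class
data_loss = -np.log(probs[np.arange(len(y)), y]).mean()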

Once we know how good the prediction is, we get a rough idea of how well the initial weights worked. But our goal is to optimize the weights so as to make the data loss as small as possible.

What is calculating backward?

How do we do that? Well, we need to know how much the weights contributed to the data loss. How can we know that? By calculating the partial derivative of the data loss with respect to w!

But wait, we cannot calculate that directly! We can, however, calculate it by going backward.

If we first calculate the partial derivative of the data loss L with respect to z, and then the partial derivative of z = f(X, w) with respect to w, we can get the partial derivative of L with respect to w by multiplying the two. This is just the chain rule:

dL/dw = (dL/dz) * (dz/dw)
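For the linear f and softmax loss assumed in the sketch above, the backward calculation might look like this (the closed-form dL/dz for the softmax loss is a well-known result, probs minus a one-hot encoding of y):

```python
# Backward: for the softmax loss, dL/dz = probs - one_hot(y),
# averaged over the batch
dz = probs.copy()
dz[np.arange(len(y)), y] -= 1
dz /= len(y)

# Chain rule: z = XW, so dz/dW contributes X^T, giving dL/dW = X^T (dL/dz)
dW = X.T @ dz                        # same shape as W
```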

In this way we can find out how much the weights contributed to the data loss, and we can update the weights accordingly. Then we do the forward and backward calculation again, update the weights again, do the forward and backward calculation again… and keep repeating this process until the data loss is minimized! This loop is what’s known as gradient descent.
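Putting it all together, the whole loop might look like this sketch (the learning rate and step count are arbitrary toy values I picked):

```python
learning_rate = 1e-1
for step in range(200):
    # Forward: scores, probabilities, data loss
    z = X @ W
    probs = np.exp(z - z.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(len(y)), y]).mean()

    # Backward: dL/dW via the chain rule, exactly as above
    dz = probs.copy()
    dz[np.arange(len(y)), y] -= 1
    dz /= len(y)
    dW = X.T @ dz

    # Update: step the weights a little against the gradient
    W -= learning_rate * dW
```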

That’s it. I hope my mathematical perspective is helpful to anyone out there confused by neural networks!


Jodie Heqi Qiu

My memos of machine learning algorithms, data pre-processing and statistics. Git: https://github.com/qhqqiu