Why do we use stochastic gradient descent in neural networks and what are the main ideas behind this optimization technique?

Data Science Asked on March 18, 2021

I am a student studying machine learning. I am focusing on neural networks, and I have seen that, to find the optimal weights of a neural network, we don't use plain gradient descent but stochastic gradient descent.

As far as I have understood, the main difference is that in gradient descent we compute the gradient of the loss over the whole dataset and then take one step with it (subtracting it, scaled by the learning rate, from the current weights), while in stochastic gradient descent we compute the gradient and update the weights using only one datapoint at a time (or a mini-batch of datapoints), so we have something that looks like this:

$$w_{t+1} = w_t - \eta \, \nabla_w L_i(w_t),$$

where $L_i$ is the loss on the single datapoint (or mini-batch) picked at step $t$, instead of the full-dataset gradient $\nabla_w \frac{1}{N}\sum_{i=1}^{N} L_i(w_t)$ used in plain gradient descent.
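To make this concrete, here is a minimal sketch of how I understand the two update schemes. It uses plain NumPy with a made-up linear model and squared-error loss, so the data, learning rate, and number of epochs are only for illustration, not part of any real setup:

```python
import numpy as np

# Toy data for a linear model y ≈ X @ w (made-up example).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

lr = 0.05  # learning rate (illustrative value)

def grad(w, Xb, yb):
    # Gradient of the mean squared error on the (mini-)batch (Xb, yb).
    return 2.0 / len(yb) * Xb.T @ (Xb @ w - yb)

# (Full-)batch gradient descent: one update per pass, using the whole dataset.
w_gd = np.zeros(3)
for epoch in range(100):
    w_gd -= lr * grad(w_gd, X, y)

# Stochastic gradient descent: one update per (shuffled) datapoint.
w_sgd = np.zeros(3)
for epoch in range(100):
    for i in rng.permutation(len(y)):
        w_sgd -= lr * grad(w_sgd, X[i:i+1], y[i:i+1])

print(w_gd, w_sgd)  # both should end up close to [1.0, -2.0, 0.5]
```

So, per pass over the data, gradient descent does one expensive update while stochastic gradient descent does many cheap, noisy ones.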

Trying to go deeper into the topic, I have found that we use stochastic gradient descent because the loss function is not convex, but I don't understand why it should not be convex. I have also found another explanation which says that we use stochastic gradient descent because it reduces the computational cost of each update.

All these explanations have given me a lot of doubts, and I am a little bit lost.

So, why do we use stochastic gradient descent in neural networks, and what are the main ideas behind this optimization technique?
