Why do we use stochastic gradient descent in neural networks and what are the main ideas behind this optimization technique?

Data Science Asked on March 18, 2021

I am a student studying machine learning. I am focusing on neural networks, and I have seen that, to find the optimal weights of a neural network, we don't use plain gradient descent but stochastic gradient descent.

As far as I have understood, the main difference is that in gradient descent we compute the gradient of the loss over the whole dataset and then take one step with it (subtracting it, scaled by the learning rate, from the current weights), while in stochastic gradient descent we compute the gradient and update the weights using only one datapoint at a time (or a mini-batch of datapoints), so we have something that looks like this:

$$w_{t+1} = w_t - \eta \, \nabla_w L_i(w_t),$$

where $L_i$ is the loss on the single datapoint (or mini-batch) picked at step $t$, instead of the full-dataset gradient $\nabla_w \frac{1}{N}\sum_{i=1}^{N} L_i(w_t)$ used in plain gradient descent.
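To make this concrete, here is a minimal sketch of how I understand the two update schemes. It uses plain NumPy with a made-up linear model and squared-error loss, so the data, learning rate, and number of epochs are only for illustration, not part of any real setup:

```python
import numpy as np

# Toy data for a linear model y ≈ X @ w (made-up example).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

lr = 0.05  # learning rate (illustrative value)

def grad(w, Xb, yb):
    # Gradient of the mean squared error on the (mini-)batch (Xb, yb).
    return 2.0 / len(yb) * Xb.T @ (Xb @ w - yb)

# (Full-)batch gradient descent: one update per pass, using the whole dataset.
w_gd = np.zeros(3)
for epoch in range(100):
    w_gd -= lr * grad(w_gd, X, y)

# Stochastic gradient descent: one update per (shuffled) datapoint.
w_sgd = np.zeros(3)
for epoch in range(100):
    for i in rng.permutation(len(y)):
        w_sgd -= lr * grad(w_sgd, X[i:i+1], y[i:i+1])

print(w_gd, w_sgd)  # both should end up close to [1.0, -2.0, 0.5]
```

So, per pass over the data, gradient descent does one expensive update while stochastic gradient descent does many cheap, noisy ones.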

Trying to go deeper into the topic, I have found that we use stochastic gradient descent because the loss function is not convex, but I don't understand why it should not be convex. I have also found another explanation which says that we use stochastic gradient descent because it reduces the computational cost of each update.

All these explanations have given me a lot of doubts, and I am a little bit lost.

So, why do we use stochastic gradient descent in neural networks, and what are the main ideas behind this optimization technique?
