
PyTorch - Gradient distribution between functions

Data Science Asked by Wickkiey on March 27, 2021

https://colab.research.google.com/github/pytorch/tutorials/blob/gh-pages/_downloads/neural_networks_tutorial.ipynb

Hi, I am trying to understand neural networks with PyTorch, and I have some doubts about the gradient calculations.

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

From the above code, I understand that loss.backward() calculates the gradients.
I am not sure how this information is shared with the optimizer so that it can update the parameters.

Can anyone explain this?

Thanks in advance!

One Answer

Recall that you passed net.parameters() to the optimizer, so it has access to the Tensor objects that hold the model's learnable parameters, as well as their associated data. One of the fields attached to each learnable parameter tensor is a gradient buffer. Hence, backward() not only computes the gradients but also stores them in each parameter tensor, so the gradient for a parameter is kept alongside that parameter. In other words, for some parameter $\theta_i$, backward() stores $\partial \mathcal{L}(\Theta) / \partial \theta_i$ alongside that parameter. The optimizer.step() call then simply updates each parameter using the gradient stored with it.
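
To make this concrete, here is a minimal sketch (with a hypothetical one-layer nn.Linear model standing in for the tutorial's net) showing that loss.backward() fills each parameter's .grad buffer and that optimizer.step() applies the SGD update using exactly those buffers:

import torch
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(4, 2)                    # hypothetical stand-in for the tutorial's net
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

inp = torch.randn(1, 4)
target = torch.randn(1, 2)

optimizer.zero_grad()                    # clear (or reset to None) the .grad buffers
print(net.weight.grad)                   # None or zeros before backward()

loss = criterion(net(inp), target)
loss.backward()                          # populates net.weight.grad and net.bias.grad
print(net.weight.grad.shape)             # torch.Size([2, 4]), one gradient entry per weight

optimizer.step()                         # roughly equivalent to:
                                         #   with torch.no_grad():
                                         #       for p in net.parameters():
                                         #           p -= 0.01 * p.grad

Because the optimizer only ever reads the .grad buffers of the parameters it was given, optimizer.zero_grad() is needed before each backward pass; otherwise backward() accumulates new gradients on top of the old ones.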

Answered by user3658307 on March 27, 2021
