Backward function in PyTorch

Please read the documentation on backward() carefully to better understand it. By default, PyTorch expects backward() to be called on the last output of the network: the loss. The loss function always outputs a scalar, and therefore the gradients of the scalar loss w.r.t. all other variables/parameters are well defined (via the chain rule). Thus, by default, backward() is called on a scalar tensor and requires no arguments.
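
As a minimal sketch of this default case (the tiny model, data, and MSE loss below are made up for illustration), calling backward() on a scalar loss populates the .grad attribute of every tensor that requires gradients:

```python
import torch

# A hypothetical one-parameter-pair linear model: pred = w * x + b
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

x = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([2.0, 4.0, 6.0])

pred = w * x + b                      # forward pass
loss = ((pred - target) ** 2).mean()  # scalar (0-dim) loss, here MSE

loss.backward()        # no argument needed: loss is a scalar
print(w.grad, b.grad)  # d(loss)/dw and d(loss)/db, filled in by autograd
```

If you instead call backward() on a non-scalar tensor, autograd cannot pick a well-defined gradient on its own, and PyTorch requires you to pass an explicit gradient argument of the same shape as the tensor.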