Backward function in PyTorch

Please read the documentation on backward() carefully to better understand it. By default, PyTorch expects backward() to be called on the last output of the network: the loss. The loss function always outputs a scalar, and therefore the gradients of the scalar loss w.r.t. all other variables/parameters are well defined (via the chain rule). Thus, by default, backward() is called on a scalar tensor and requires no arguments.
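
As a minimal sketch of this default case (the tiny model, data, and MSE loss below are made up for illustration), calling backward() on a scalar loss populates the .grad attribute of every tensor that requires gradients:

```python
import torch

# A hypothetical one-parameter-pair linear model: pred = w * x + b
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

x = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([2.0, 4.0, 6.0])

pred = w * x + b                      # forward pass
loss = ((pred - target) ** 2).mean()  # scalar (0-dim) loss, here MSE

loss.backward()        # no argument needed: loss is a scalar
print(w.grad, b.grad)  # d(loss)/dw and d(loss)/db, filled in by autograd
```

If you instead call backward() on a non-scalar tensor, autograd cannot pick a well-defined gradient on its own, and PyTorch requires you to pass an explicit gradient argument of the same shape as the tensor.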