CS231n: How to calculate gradient for Softmax loss function?

Not sure if this helps, but:

The y_i term is really the indicator function $\mathbb{1}(j = y_i)$, as described here: it is 1 when j is the correct class for example i and 0 otherwise. This forms the expression (j == y[i]) in the code.

Also, the gradient of the loss with respect to the weights is:

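$\frac{\partial L_i}{\partial w_j} = \frac{\partial L_i}{\partial f_j} \frac{\partial f_j}{\partial w_j}$

where

$\frac{\partial f_j}{\partial w_j} = x_i$ (since the score for class j is $f_j = w_j \cdot x_i$)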

which is the origin of the X[:,i] in the code.
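
To make the two pieces concrete, here is a minimal sketch of a loop-based softmax loss and gradient. It is not the assignment's reference solution. It assumes W has shape (C, D) and X has shape (D, N), so that X[:, i] is one example as a column, which matches the indexing quoted above. The function name softmax_loss_naive is only illustrative, and regularization is left out. It also uses the standard softmax result that the derivative of L_i with respect to f_j is p_j minus the indicator.

```python
import numpy as np

def softmax_loss_naive(W, X, y):
    """Softmax loss and gradient with explicit loops.

    Assumed shapes (an assumption, not confirmed by the thread):
    W: (C, D) weights, X: (D, N) data with one example per column,
    y: (N,) integer class labels in [0, C).
    """
    num_classes, num_train = W.shape[0], X.shape[1]
    loss = 0.0
    dW = np.zeros_like(W)
    for i in range(num_train):
        scores = W.dot(X[:, i])            # f = W x_i, shape (C,)
        scores = scores - np.max(scores)   # shift for numerical stability
        p = np.exp(scores) / np.sum(np.exp(scores))
        loss += -np.log(p[y[i]])
        for j in range(num_classes):
            # dL_i/df_j = p_j - 1(j == y_i), and df_j/dw_j = x_i
            dW[j, :] += (p[j] - float(j == y[i])) * X[:, i]
    return loss / num_train, dW / num_train
```

The inner line is just the chain rule: the (j == y[i]) factor is the indicator, and the X[:, i] factor is the derivative of the score with respect to row j of W.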
