CS231n: How to calculate gradient for Softmax loss function?
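Not sure if this helps, but: $\mathbb{1}(y_i = j)$ is really the indicator function, equal to 1 when $j = y_i$ and 0 otherwise, as described here. This forms the expression (j == y[i]) in the code.

Also, by the chain rule, the gradient of the loss with respect to the weights is:

$$\frac{\partial L_i}{\partial w_j} = \frac{\partial L_i}{\partial f_j} \frac{\partial f_j}{\partial w_j} = \left(p_j - \mathbb{1}(y_i = j)\right) x_i$$

where $p_j$ is the softmax probability of class $j$ and

$$\frac{\partial f_j}{\partial w_j} = x_i \quad \text{since } f_j = w_j \cdot x_i,$$

which is the origin of the X[:,i] in the code.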
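To see how the two pieces fit together, here is a minimal sketch of the naive loop, not the actual assignment solution. It assumes the layout implied by X[:,i]: `W` has shape (classes, features), `X` has shape (features, examples), and `y` holds integer labels. The function name `softmax_loss_naive` and the lack of regularization are my own choices for illustration.

```python
import numpy as np

def softmax_loss_naive(W, X, y):
    """Softmax loss and gradient with explicit loops (no regularization).

    Assumed shapes: W is (C, D), X is (D, N), y is (N,) with labels in [0, C).
    Returns the mean loss and dW with the same shape as W.
    """
    num_classes, num_train = W.shape[0], X.shape[1]
    loss = 0.0
    dW = np.zeros_like(W)
    for i in range(num_train):
        f_i = W.dot(X[:, i])           # class scores for example i
        f_i = f_i - np.max(f_i)        # shift scores for numerical stability
        p = np.exp(f_i) / np.sum(np.exp(f_i))  # softmax probabilities
        loss += -np.log(p[y[i]])
        for j in range(num_classes):
            # (j == y[i]) is the indicator function; X[:, i] is df_j/dw_j
            dW[j, :] += (p[j] - (j == y[i])) * X[:, i]
    return loss / num_train, dW / num_train
```

A quick way to convince yourself the formula is right is to compare `dW` against a centered finite-difference estimate of the loss on a small random problem; the two should agree to several decimal places.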