numpy : calculate the derivative of the softmax function

I am assuming you have a 3-layer NN with W1, b1 for is associated with the linear transformation from input layer to hidden layer and W2, b2 is associated with linear transformation from hidden layer to output layer. Z1 and Z2 are the input vector to the hidden layer and output layer. a1 and a2 represents the output of the hidden layer and output layer. a2 is your predicted output. delta3 and delta2 are the … Read more