What does “unsqueeze” do in PyTorch?

If you look at the shape of the array before and after, you see that before it was (4,) and after it is (1, 4) (when the second parameter is 0) and (4, 1) (when the second parameter is 1). So a 1 was inserted in the shape of the array at axis 0 or 1, depending on the value of the second parameter. That is the opposite of np.squeeze() (nomenclature borrowed from …
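A minimal sketch of the shape change (the tensor values here are arbitrary, chosen only to show the shapes):

```python
import torch

x = torch.tensor([1, 2, 3, 4])
print(x.shape)                           # torch.Size([4])

# unsqueeze inserts a new axis of size 1 at the given position
print(x.unsqueeze(0).shape)              # torch.Size([1, 4])
print(x.unsqueeze(1).shape)              # torch.Size([4, 1])

# squeeze removes axes of size 1, undoing the insertion
print(x.unsqueeze(0).squeeze(0).shape)   # torch.Size([4])
```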

Embedding in PyTorch

nn.Embedding holds a Tensor of dimension (vocab_size, vector_size), i.e. the size of the vocabulary × the dimension of each embedding vector, and a method that does the lookup. When you create an embedding layer, the Tensor is initialised randomly. It is only when you train it that this similarity between similar words should appear. Unless you …
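A short sketch of the lookup behaviour; the vocab_size and vector_size values here are arbitrary, picked only for illustration:

```python
import torch
import torch.nn as nn

vocab_size, vector_size = 10, 3

# The layer's weight is a randomly initialised (vocab_size, vector_size) Tensor
embedding = nn.Embedding(vocab_size, vector_size)
print(embedding.weight.shape)   # torch.Size([10, 3])

# The forward pass is just a lookup: indices in, rows of the weight matrix out
indices = torch.tensor([1, 4, 4])
vectors = embedding(indices)
print(vectors.shape)            # torch.Size([3, 3])

# Both occurrences of index 4 return the same row
print(torch.equal(vectors[1], vectors[2]))  # True
```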

Cross Entropy in PyTorch

In your example you are treating the output [0, 0, 0, 1] as probabilities, as required by the mathematical definition of cross entropy. But PyTorch treats them as raw outputs (logits), which don’t need to sum to 1 and are first converted into probabilities using the softmax function. So H(p, q) becomes H(p, softmax(output)). Translating the output [0, 0, 0, 1] into …
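A sketch of that equivalence, assuming the single output [0, 0, 0, 1] from the question with true class index 3 (a batch dimension is added because F.cross_entropy expects one):

```python
import torch
import torch.nn.functional as F

# Raw outputs (logits) and the true class index
output = torch.tensor([[0.0, 0.0, 0.0, 1.0]])
target = torch.tensor([3])

# PyTorch's cross entropy applies softmax internally...
loss = F.cross_entropy(output, target)

# ...which matches converting the logits to probabilities by hand
probs = F.softmax(output, dim=1)    # ~[0.1749, 0.1749, 0.1749, 0.4754]
manual = -torch.log(probs[0, 3])

print(loss.item(), manual.item())   # both ~0.7437
```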