## ValueError: x and y must be the same size

Print `X_train.shape` and check what you see. I'd bet `X_train` is 2-D (a matrix with a single column) while `y_train` is 1-D (a vector), so the two end up with different sizes as far as the plotting call is concerned. Passing `X_train[:, 0]` for plotting (which is where the error originates) should solve the problem.
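A minimal sketch of the shape mismatch described above, with hypothetical data (the arrays and names are made up for illustration):

```python
import numpy as np

# Hypothetical data: X_train is 2-D (n, 1), y_train is 1-D (n,)
X_train = np.arange(5).reshape(-1, 1).astype(float)
y_train = np.array([0.1, 0.9, 2.1, 2.9, 4.2])

print(X_train.shape)   # (5, 1) -- 2-D column matrix
print(y_train.shape)   # (5,)   -- 1-D vector

# Plotting routines generally expect 1-D x and y of matching size;
# slicing out the single column makes both arguments 1-D:
x_plot = X_train[:, 0]
print(x_plot.shape)    # (5,)
```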

## How to improve parallelized computing in AWS EC2?

I am using mpi4py and MPICH (installed with conda) to parallelize the training of a reinforcement learning system across several CPUs, on an AWS EC2 instance (a c5.x12) running Ubuntu. I have benchmarked the performance: the amount of training per unit of time increases by 30% (when 5 processes are used) with respect …

## What is the meaning of ‘for _ in range()’?

When you are not interested in the values a loop produces, the underscore is used in place of a variable name. Here it basically means you are not interested in which iteration the loop is on at any point, just that it should run some specific number of times overall.
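For example (the list and string here are arbitrary, just to show the pattern):

```python
# '_' signals that the loop variable itself is unused;
# we only care that the body runs a fixed number of times.
results = []
for _ in range(3):
    results.append("ping")

print(results)  # ['ping', 'ping', 'ping']
```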

## Intuition for perceptron weight update rule

The perceptron’s output is the hard limit of the dot product between the instance and the weight vector. Let’s see how this changes after the update. Since w(t + 1) = w(t) + y(t)x(t), then x(t) ⋅ w(t + 1) = x(t) ⋅ w(t) + x(t) ⋅ (y(t) x(t)) = x(t) ⋅ w(t) + y(t) [x(t) …
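The update above can be sketched in a few lines; this is a minimal illustration assuming labels in {−1, +1} (the names are hypothetical):

```python
import numpy as np

def perceptron_step(w, x, y):
    """Apply one perceptron update: w <- w + y*x when x is misclassified."""
    if np.sign(np.dot(w, x)) != y:   # hard-limit output disagrees with label
        w = w + y * x
    return w

w = np.zeros(2)
x = np.array([1.0, 2.0])
w = perceptron_step(w, x, +1)  # sign(0) != +1, so the update fires: w += x
print(w)                       # [1. 2.]
print(np.dot(x, w))            # 5.0 -- x.w grew by y*||x||^2, toward label +1
```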

## What is the difference between np.mean and tf.reduce_mean?

The functionality of numpy.mean and tensorflow.reduce_mean is the same: they do the same thing, as you can see from the documentation of each. Let’s look at an example. When axis (numpy) or reduction_indices (tensorflow) is 1, the mean is computed across (3, 4), (5, 6) and (6, 7), so 1 defines the axis across which the mean is computed. When it is 0, the …
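The numpy side of that example can be reproduced directly; `tf.reduce_mean` would give the same values for the same axis argument:

```python
import numpy as np

a = np.array([[3., 4.],
              [5., 6.],
              [6., 7.]])

print(np.mean(a))          # 5.1666... -- mean over all six elements
print(np.mean(a, axis=0))  # [4.666... 5.666...] -- mean down each column
print(np.mean(a, axis=1))  # [3.5 5.5 6.5] -- mean across each row
```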

## Backward function in PyTorch

Please read the documentation on backward() carefully to better understand it. By default, pytorch expects backward() to be called for the last output of the network – the loss function. The loss function always outputs a scalar, and therefore the gradients of the scalar loss w.r.t. all other variables/parameters are well defined (using the chain rule). Thus, by default, backward() is called on a scalar …
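To see why a scalar loss makes the gradients well defined, here is a plain-Python sketch of the chain-rule arithmetic that backward() automates (PyTorch itself is not needed for the numbers; the values are hypothetical):

```python
# Forward pass: scalar loss = (w*x - y)**2
w, x, y = 3.0, 2.0, 5.0
pred = w * x                      # 6.0
loss = (pred - y) ** 2            # 1.0 -- a single scalar

# Backward pass by the chain rule:
# dloss/dpred = 2*(pred - y), and dpred/dw = x
dloss_dpred = 2.0 * (pred - y)    # 2.0
dloss_dw = dloss_dpred * x        # 4.0 -- what w.grad would hold after backward()
print(loss, dloss_dw)
```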

## How to implement the Softmax function in Python

They’re both correct, but yours is preferred from the point of view of numerical stability. You start with softmax(x)_i = e^(x_i − max(x)) / Σ_j e^(x_j − max(x)). By using the fact that a^(b − c) = (a^b)/(a^c), we have softmax(x)_i = e^(x_i) / Σ_j e^(x_j), which is what the other answer says. You could replace max(x) with any variable and it would cancel out.
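A minimal numpy sketch of the stable version; the inputs here are deliberately large so that the naive form (exponentiating without the shift) would overflow:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax: subtract max(x) before exponentiating."""
    shifted = x - np.max(x)   # largest exponent becomes 0, so no overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)

x = np.array([1000.0, 1001.0, 1002.0])
print(softmax(x))             # finite values; np.exp(1000.0) alone overflows
print(np.sum(softmax(x)))     # 1.0
```

Because the shift cancels out, `softmax(x)` equals `softmax(x - c)` for any constant `c`.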

## What is the difference between sparse_categorical_crossentropy and categorical_crossentropy?

Simply: categorical_crossentropy (cce) expects the target as a one-hot array with a 1 at the index of the true category, while sparse_categorical_crossentropy (scce) expects the target as the integer index of the matching category. Consider a classification problem with 5 categories (or classes). In the case of cce, the one-hot target may be [0, 1, 0, 0, 0] and the model may predict [.2, .5, .1, .1, .1] (probably right). In the …
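The two losses compute the same quantity from differently encoded targets; a small numpy sketch using the prediction from the example above:

```python
import numpy as np

pred = np.array([.2, .5, .1, .1, .1])  # model output over 5 classes

# cce: target encoded as a one-hot array
onehot = np.array([0, 1, 0, 0, 0])
cce = -np.sum(onehot * np.log(pred))

# scce: target is just the integer class index
index = 1
scce = -np.log(pred[index])

print(cce, scce)  # identical: 0.6931... = -log(0.5)
```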

## word2vec: negative sampling (in layman’s terms)?

The idea of word2vec is to maximise the similarity (dot product) between the vectors for words which appear close together (in the context of each other) in text, and minimise the similarity of words that do not. In equation (3) of the paper you link to, ignore the exponentiation for a moment. You have v_c ⋅ v_w / Σ_{c′} v_{c′} ⋅ v_w. The numerator is …
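The negative-sampling objective that replaces that full softmax can be sketched with random vectors; this is only an illustration of the formula, with made-up dimensions and no actual corpus:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
center = rng.normal(size=8)           # vector for the center word
context = rng.normal(size=8)          # a word seen nearby (positive pair)
negatives = rng.normal(size=(5, 8))   # 5 sampled "noise" words

# Negative sampling: push the positive dot product up,
# push the sampled negative dot products down.
pos_term = np.log(sigmoid(context @ center))
neg_term = np.sum(np.log(sigmoid(-negatives @ center)))
objective = pos_term + neg_term       # maximised during training
print(objective)
```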

## What is cross-entropy?

Cross-entropy is commonly used to quantify the difference between two probability distributions. In the context of machine learning, it is a measure of error for categorical multi-class classification problems. Usually the “true” distribution (the one that your machine learning algorithm is trying to match) is expressed in terms of a one-hot distribution. For example, suppose …
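The definition H(p, q) = −Σᵢ pᵢ log(qᵢ) is easy to compute directly; here is a small numpy sketch with a one-hot "true" distribution and two hypothetical predictions:

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i): cost of predicting q when p is true."""
    return -np.sum(p * np.log(q))

true = np.array([0., 1., 0.])          # one-hot "true" distribution
good = np.array([.05, .90, .05])       # prediction close to the target
bad  = np.array([.60, .20, .20])       # prediction that disagrees

print(cross_entropy(true, good))       # small: -log(0.9) = 0.105...
print(cross_entropy(true, bad))        # larger: -log(0.2) = 1.609...
```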