How to improve parallelized computing on AWS EC2?

I am using mpi4py and MPICH (installed with conda) to parallelize the training of a reinforcement learning system across several CPUs, using an AWS EC2 instance (namely, a c5.x12) running Ubuntu. I have benchmarked the performance: with 5 processes, the amount of training per unit of time increases by 30% relative to training with a single process. However, when I use 5 processes on my local computer, the amount of training per unit of time increases by 300% relative to training with a single process.
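To put those two numbers on the same scale, here is a minimal sketch that converts each percentage increase into a speedup factor and a parallel efficiency (speedup divided by the number of processes). The function name `parallel_metrics` is just an illustrative helper, not part of mpi4py, and I am assuming "increase of X%" means throughput went from 1 to 1 + X/100 times the single-process value.

```python
def parallel_metrics(throughput_increase_pct, n_processes):
    """Convert a percentage throughput increase into (speedup, efficiency).

    Assumes the increase is measured relative to a single-process run,
    so an increase of 30% means a speedup of 1.3x.
    """
    speedup = 1.0 + throughput_increase_pct / 100.0
    efficiency = speedup / n_processes  # 1.0 would be perfect linear scaling
    return speedup, efficiency


# The two measurements from the benchmark, both with 5 processes.
for label, pct in [("EC2", 30.0), ("local", 300.0)]:
    speedup, efficiency = parallel_metrics(pct, 5)
    print(f"{label}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Under that reading, the EC2 run reaches about 26% parallel efficiency while the local run reaches about 80%, which is the gap I am trying to explain.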

On my computer I use Windows and Microsoft MPI, which I think is based on MPICH, so what could cause this performance difference? How can I get the most out of AWS?
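In case it helps with diagnosis, below is a small standard-library sketch that could be run on both machines (ideally inside each MPI rank) to compare how many CPUs a process is allowed to use and whether any thread-count environment variables are set. This rests on an unconfirmed assumption of mine: that CPU pinning, or extra threads spawned by numerical libraries, might differ between the two setups. The helper name `cpu_report` is hypothetical.

```python
import os

# Environment variables commonly read by OpenMP, MKL, and OpenBLAS
# to decide how many threads each process may spawn.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def cpu_report():
    """Return logical CPU count, CPUs usable by this process, and thread env vars."""
    logical = os.cpu_count() or 1
    # os.sched_getaffinity exists on Linux only; fall back to the logical
    # count elsewhere (for example on Windows).
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = logical
    thread_env = {name: os.environ.get(name) for name in THREAD_VARS}
    return {"logical_cpus": logical, "usable_cpus": usable, "thread_env": thread_env}


if __name__ == "__main__":
    print(cpu_report())
```

If `usable_cpus` came out much smaller than `logical_cpus` on EC2, or if the thread variables were unset there while each process uses a multithreaded numerical library, that would be one thing to rule out; I have not verified that either applies here.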
