The problem was caused by not setting the CUDA_VISIBLE_DEVICES
variable within the shell correctly.
To specify CUDA device 1
for example, you would set the CUDA_VISIBLE_DEVICES
using
export CUDA_VISIBLE_DEVICES=1
or
CUDA_VISIBLE_DEVICES=1 ./cuda_executable
The former sets the variable for the life of the current shell, the latter only for the lifespan of that particular executable invocation.
If you want to specify more than one device, use
export CUDA_VISIBLE_DEVICES=0,1
or
CUDA_VISIBLE_DEVICES=0,1 ./cuda_executable