Does __syncthreads() synchronize all threads in the grid?

__syncthreads() is a block-level synchronization barrier. That means it is safe to use when all threads in a block reach the barrier. It is also possible to use __syncthreads() in conditional code, but only when all threads in the block evaluate the condition identically; otherwise execution is likely to hang or produce unintended side effects [4]. Example of using __syncthreads(): (source) … Read more
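As a rough sketch of the usual pattern (the kernel name reverseInBlock and the fixed block size of 256 are illustrative assumptions, not taken from the answer above): every thread in the block first writes into shared memory, all threads then reach __syncthreads(), and only afterwards do they read what other threads wrote.

```
#include <cstdio>

// Reverse the elements handled by each block using shared memory.
// __syncthreads() guarantees the whole tile is written before any thread reads it.
__global__ void reverseInBlock(int *data, int n)
{
    __shared__ int tile[256];                 // one slot per thread in the block
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    if (gid < n)
        tile[tid] = data[gid];

    __syncthreads();   // outside the if, so every thread in the block reaches it

    if (gid < n)
        data[gid] = tile[blockDim.x - 1 - tid];
}

int main()
{
    const int n = 256;
    int h[n];
    for (int i = 0; i < n; ++i) h[i] = i;

    int *d;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);

    reverseInBlock<<<1, 256>>>(d, n);
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);

    printf("h[0] = %d (expected 255)\n", h[0]);
    cudaFree(d);
    return 0;
}
```

Note that the barrier sits outside the if (gid < n) guard, so every thread in the block reaches it, which is exactly the condition described above.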

When to call cudaDeviceSynchronize?

Although CUDA kernel launches are asynchronous, all GPU-related tasks placed in one stream (which is the default behavior) are executed sequentially. So in your example, there is no need for cudaDeviceSynchronize. However, it might be useful for debugging to detect which of your kernels has caused an error (if there is any). cudaDeviceSynchronize may … Read more
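A minimal sketch of that debugging use, assuming two placeholder kernels kernelA and kernelB: the two launches are ordered by the default stream anyway, and cudaDeviceSynchronize only serves to surface an asynchronous error right after the launch that produced it.

```
#include <cstdio>

__global__ void kernelA(float *x) { x[threadIdx.x] *= 2.0f; }
__global__ void kernelB(float *x) { x[threadIdx.x] += 1.0f; }

// Block until the device is idle and report any error from the preceding launch.
static void checkpoint(const char *label)
{
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("%s failed: %s\n", label, cudaGetErrorString(err));
}

int main()
{
    float *d;
    cudaMalloc(&d, 32 * sizeof(float));
    cudaMemset(d, 0, 32 * sizeof(float));

    // Both launches go into the default stream, so kernelB cannot start before
    // kernelA has finished; no explicit synchronization is needed for ordering.
    kernelA<<<1, 32>>>(d);
    checkpoint("kernelA");   // debugging aid: attributes a failure to this launch

    kernelB<<<1, 32>>>(d);
    checkpoint("kernelB");

    cudaFree(d);
    return 0;
}
```

Without the checkpoints, an error from kernelA might only be reported by some later CUDA call, making it harder to tell which launch was at fault.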

Is it possible to run CUDA on AMD GPUs?

Nope, you can’t use CUDA for that. CUDA is limited to NVIDIA hardware. OpenCL would be the best alternative. Khronos itself has a list of resources, as does the StreamComputing.eu website. For AMD-specific resources, you might want to have a look at AMD’s APP SDK page. Note that at this time there are several initiatives to translate/cross-compile CUDA … Read more

Use of cudaMalloc(). Why the double pointer?

All CUDA API functions return an error code (or cudaSuccess if no error occurred). All other outputs are passed back through parameters, by reference. However, plain C does not have references, which is why you have to pass the address of the variable in which you want the returned information to be stored. Since you are returning a pointer, … Read more
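A minimal sketch of the pattern being described: cudaMalloc has to write the newly allocated device address back into your pointer variable, so you pass the address of that pointer, i.e. a double pointer (the buffer size of 100 floats is arbitrary).

```
#include <cstdio>

int main()
{
    float *d_data = NULL;                  // will receive a device address
    size_t bytes = 100 * sizeof(float);

    // The error code is the function's return value, so the allocated address
    // must come back through a parameter: we pass &d_data (a float **) and
    // cudaMalloc stores the device pointer into d_data.
    cudaError_t err = cudaMalloc((void **)&d_data, bytes);
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("device buffer at %p\n", (void *)d_data);
    cudaFree(d_data);
    return 0;
}
```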

How to verify CuDNN installation?

Installing CuDNN just involves placing the files in the CUDA directory. If you have specified the paths and the CuDNN option correctly while installing caffe, it will be compiled with CuDNN. You can check that using cmake: create a directory caffe/build and run cmake .. from there. If the configuration is correct you will see these lines: If everything is … Read more
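Independently of caffe, one way to confirm that the cuDNN headers and library are actually visible is a tiny test program; this is a sketch that assumes cudnn.h is on the include path and that you link against cuDNN (e.g. nvcc check_cudnn.cu -lcudnn -o check_cudnn).

```
#include <cstdio>
#include <cudnn.h>   // fails to compile if the cuDNN headers are not installed

int main()
{
    // Version baked into the header at compile time vs. version of the
    // library actually loaded at run time -- both should be present and match.
    printf("cuDNN header version : %d\n", CUDNN_VERSION);
    printf("cuDNN library version: %zu\n", cudnnGetVersion());
    return 0;
}
```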

Error compiling CUDA from Command Prompt

You will need to add the folder containing the “cl.exe” file to your PATH environment variable. For example: Edit: Ok, go to My Computer -> Properties -> Advanced System Settings -> Environment Variables. There, look for “PATH” in the list and add the path above (or wherever your cl.exe is located).

How do I select which GPU to run a job on?

The problem was caused by not setting the CUDA_VISIBLE_DEVICES variable within the shell correctly. To specify CUDA device 1, for example, you would set CUDA_VISIBLE_DEVICES either with export CUDA_VISIBLE_DEVICES=1 or by prefixing the command, as in CUDA_VISIBLE_DEVICES=1 ./cuda_executable. The former sets the variable for the life of the current shell, the latter only for the lifespan of that particular executable invocation. If you want to specify more than one device, use … Read more
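If you would rather pick the device from inside the program than through the environment, a minimal sketch using the standard cudaGetDeviceCount / cudaSetDevice calls looks like this (selecting device 1 is just an example):

```
#include <cstdio>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("%d CUDA device(s) found\n", count);

    // Select device 1 (if present) for all subsequent CUDA calls on this host thread.
    // Note: when CUDA_VISIBLE_DEVICES is set, indices refer to the *visible* devices.
    if (count > 1)
        cudaSetDevice(1);

    int current = -1;
    cudaGetDevice(&current);
    printf("using device %d\n", current);
    return 0;
}
```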

NVIDIA NVML Driver/library version mismatch

Surprise surprise, rebooting solved the issue (I thought I had already tried that). The solution Robert Crovella mentioned in the comments may also be useful to someone else, since it’s pretty similar to what I did to solve the issue the first time I had it.

cudaMemcpy function usage

It’s not trivial to handle a doubly-subscripted C array when copying data between host and device. For the most part, cudaMemcpy (including cudaMemcpy2D) expects an ordinary pointer for source and destination, not a pointer-to-pointer. The simplest approach (I think) is to “flatten” the 2D arrays, both on the host and the device, and use index arithmetic to simulate 2D coordinates: … Read more
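A minimal sketch of that flattening approach (the 3x4 size and the kernel addOne are only illustrative): the data lives in one contiguous buffer on both host and device, and every access uses row * width + col instead of a[row][col].

```
#include <cstdio>

#define W 4
#define H 3

// The kernel indexes the flattened buffer with row * W + col.
__global__ void addOne(int *a)
{
    int col = threadIdx.x;
    int row = threadIdx.y;
    if (row < H && col < W)
        a[row * W + col] += 1;
}

int main()
{
    int h_a[H][W];                       // ordinary 2D host array, contiguous in memory
    for (int r = 0; r < H; ++r)
        for (int c = 0; c < W; ++c)
            h_a[r][c] = r * W + c;

    int *d_a;
    cudaMalloc(&d_a, H * W * sizeof(int));

    // Because h_a is contiguous, a single 1D copy of H*W elements moves the whole array.
    cudaMemcpy(d_a, h_a, H * W * sizeof(int), cudaMemcpyHostToDevice);

    addOne<<<1, dim3(W, H)>>>(d_a);

    cudaMemcpy(h_a, d_a, H * W * sizeof(int), cudaMemcpyDeviceToHost);
    printf("h_a[2][3] = %d (expected 12)\n", h_a[2][3]);

    cudaFree(d_a);
    return 0;
}
```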