When to call cudaDeviceSynchronize?
Although CUDA kernel launches are asynchronous, all GPU-related tasks placed in one stream (which is the default behavior) are executed sequentially. So, for example, So in your example, there is no need for cudaDeviceSynchronize. However, it might be useful for debugging to detect which of your kernel has caused an error (if there is any). cudaDeviceSynchronize may … Read more