Monday, May 7, 2012

How to measure time in NVIDIA CUDA?

Ivan's blog
http://ivanlife.wordpress.com/2011/05/09/time-cuda/

CUDA Developer Forum Discussion:
cudaEvent timer vs. Host timers
http://forums.developer.nvidia.com/devforum/discussion/7541/cudaevent-timers-vs-host-timers/p1
Parallel Nsight, NVIDIA Visual Profiler, CUDA profiler, and the CUPTI SDK provide the most accurate methods to measure the execution time of a kernel. The measured time does not include the overhead to launch the kernel.

cudaEventRecord is the most accurate method to measure the setup and execution time of a kernel.
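As a rough illustration of the event-based approach, here is a minimal sketch. The kernel `scale`, the helper `time_kernel_with_events`, and the problem size are made up for this example and are not from the forum post; error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so there is something to time.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns the time in milliseconds between two
// events recorded around a single launch of scale().
float time_kernel_with_events(int n) {
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Both events go into the same stream (the default stream, 0)
    // as the kernel, so they bracket it on the GPU timeline.
    cudaEventRecord(start, 0);
    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    cudaEventRecord(stop, 0);

    // Block the host until the stop event has actually been reached.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return ms;
}

int main() {
    float ms = time_kernel_with_events(1 << 20);
    printf("cudaEvent time: %f ms\n", ms);
    return 0;
}
```

Compile with nvcc. Timing several launches and averaging should give a steadier number than a single launch.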

A high-precision CPU timer can be used to measure the overhead of the launch, the execution of the kernel, and the completion notification. If you use this method, I recommend that you call cudaDeviceSynchronize() before reading the first clock to make sure there is no outstanding work that might delay the launch of the kernel. This method will have the highest variance, as OS context switching and other applications using the GPU will show up in the measurement.
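The host-timer approach could look something like the sketch below. It assumes `std::chrono::steady_clock` as the high-precision CPU timer, which is my choice and not something the forum post specifies; the kernel `scale` and the helper `time_kernel_with_host_clock` are likewise hypothetical. The second cudaDeviceSynchronize() is there because kernel launches are asynchronous: without it the second clock would be read before the kernel finishes.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so there is something to time.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns wall-clock milliseconds covering the
// launch overhead, kernel execution, and completion notification.
double time_kernel_with_host_clock(int n) {
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Drain any outstanding GPU work before reading the first clock,
    // as recommended above.
    cudaDeviceSynchronize();
    auto t0 = std::chrono::steady_clock::now();

    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);

    // Wait for the kernel to finish before reading the second clock.
    cudaDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();

    cudaFree(d_data);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    double ms = time_kernel_with_host_clock(1 << 20);
    printf("host timer: %f ms\n", ms);
    return 0;
}
```

Because OS scheduling and other GPU users show up in this number, it is worth repeating the measurement and looking at the spread rather than trusting a single run.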

Greg Smith
