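When profiling CUDA applications, you can use the warp_serialize profiler counter to determine whether any kernel suffers from shared memory bank conflicts. In general, this counter also reflects serialization caused by atomics and constant memory accesses, so a nonzero value does not by itself prove that a bank conflict occurred.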
NVIDIA CUDA Example for Matrix Transpose
http://docs.nvidia.com/cuda/samples/6_Advanced/transpose/doc/MatrixTranspose.pdf