Shakes Me Up: Bank Conflict and Warp Serializing

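When profiling a CUDA application, the warp_serialize flag shows whether shared memory bank conflicts occur in any kernel. The flag is not specific to shared memory, though: in general it also reflects use of atomics and constant memory, so a nonzero value in a kernel that uses either of those does not by itself prove a bank conflict.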
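To see why a shared memory tile can serialize a warp, it may help to model the bank mapping on the host. The sketch below is only an illustration, not code from the NVIDIA sample: max_conflict_degree is a hypothetical helper, and it assumes 4-byte words, a row-major tile, and successive words assigned to successive banks (bank = word index modulo the number of banks). The bank count is a parameter because it depends on the hardware generation (16 on compute capability 1.x, 32 on later devices).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not part of the CUDA toolkit).
// Thread t of a warp reads tile[t][col] from a row-major shared memory tile
// with `row_stride` 4-byte words per row. Returns the largest number of
// threads that land in the same bank for any column, i.e. the worst-case
// serialization factor under the assumed mapping bank = word % num_banks.
int max_conflict_degree(int row_stride, int warp_size, int num_banks) {
    int worst = 0;
    for (int col = 0; col < warp_size; ++col) {
        std::vector<int> hits(num_banks, 0);  // accesses per bank for this column
        for (int t = 0; t < warp_size; ++t) {
            int word = t * row_stride + col;  // linear word index of tile[t][col]
            worst = std::max(worst, ++hits[word % num_banks]);
        }
    }
    return worst;
}
```

Under these assumptions, max_conflict_degree(32, 32, 32) describes a column read of a 32 x 32 tile, where every thread hits the same bank, while max_conflict_degree(33, 32, 32) describes the same tile padded by one word per row, where each thread falls in a different bank. Padding the tile width by one element is the usual remedy in transpose kernels.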

NVIDIA CUDA Example for Matrix Transpose

http://docs.nvidia.com/cuda/samples/6_Advanced/transpose/doc/MatrixTranspose.pdf