Sunday, October 28, 2012

CUDA Shared Memory broadcast

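Bank conflict: multiple addresses in one request map to the same memory bank

The conflicting accesses are serialized
The hardware splits the request into as many separate conflict-free requests as necessary
Exception: if all threads access the same address, the value is broadcast and nothing is serialized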

However, recent large improvements in CUBLAS and CUFFT performance were achieved by avoiding shared memory in favor of registers -- so try to use registers whenever possible.

If all threads read from the same shared memory address then a broadcast mechanism is automatically invoked and serialization is avoided. Shared memory broadcasts are an excellent and high-performance way to get data to many threads simultaneously.

It is worthwhile trying to exploit this feature whenever you use shared memory.

---- Rob Farber, "CUDA, Supercomputing for the Masses: Part 5"
www.drdobbs.com/parallel/cuda-supercomputing-for-the-masses-part/208801731
