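Bank conflict: multiple different addresses in a single request map to the same shared memory bank.
The accesses are then serialized.
The hardware splits the request into as many separate conflict-free requests as necessary.
Exception: if all threads access the same address, the value is broadcast and no conflict occurs.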
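The rules above can be made concrete with a small host-side model. The Python sketch below assumes 32 banks of 4-byte words, which matches NVIDIA GPUs of compute capability 2.x and later. The Dr. Dobb's article quoted below dates from the earlier 16-bank generation, where conflicts were resolved per half-warp. The model also assumes that any lanes reading the same word are served together. This generalizes the all-threads-same-address broadcast to what newer hardware provides; older hardware was more restrictive. The function names are hypothetical. The model only counts how many conflict-free requests a warp's read is split into and does not simulate timing.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: 32 banks (compute capability 2.x and later)
WORD_BYTES = 4   # assumption: each bank is 4 bytes wide

def split_into_conflict_free_requests(byte_addrs):
    """Model how one warp-wide shared-memory read is split up.

    byte_addrs[lane] is the byte offset that lane reads. Returns a list of
    rounds; each round is a list of lanes that can be served together.
    Lanes reading the same word are grouped (broadcast), so only distinct
    words that fall in the same bank force extra rounds.
    """
    # bank -> word -> lanes reading that word (dicts keep insertion order)
    banks = defaultdict(lambda: defaultdict(list))
    for lane, addr in enumerate(byte_addrs):
        word = addr // WORD_BYTES
        banks[word % NUM_BANKS][word].append(lane)
    rounds = []
    for words in banks.values():
        # The r-th distinct word of each bank is served in round r.
        for r, lanes in enumerate(words.values()):
            if r == len(rounds):
                rounds.append([])
            rounds[r].extend(lanes)
    return rounds

def float_stride(stride):
    """Byte offsets when lane i reads the float at index i * stride."""
    return [i * stride * WORD_BYTES for i in range(32)]

for stride in (1, 2, 32, 33):
    print(stride, len(split_into_conflict_free_requests(float_stride(stride))))
print("same", len(split_into_conflict_free_requests([0] * 32)))
```

With these assumptions, stride 1 needs one request and stride 2 gives a two-way conflict. Stride 32 serializes the whole warp into 32 requests. Padding the stride to 33, the common trick of adding one unused column to a shared array, spreads the lanes back across all 32 banks. The all-same-address case needs a single request.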
However, recent large improvements in CUBLAS and CUFFT performance were achieved by avoiding shared memory in favor of registers -- so try to use registers whenever possible.
If all threads read from the same shared memory address then a broadcast mechanism is automatically invoked and serialization is avoided. Shared memory broadcasts are an excellent and high-performance way to get data to many threads simultaneously.
It is worthwhile trying to exploit this feature whenever you use shared memory.
Source: Rob Farber, "CUDA, Supercomputing for the masses: Part 5", Dr. Dobb's
www.drdobbs.com/parallel/cuda-supercomputing-for-the-masses-part/208801731
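One common place to exploit the broadcast is the inner loop of a tiled matrix multiply. There, every lane of a warp that works on the same output row needs the same element of the A tile. The Python sketch below counts serialized shared-memory transactions for three warp-wide read patterns on a hypothetical 32x32 tile of floats stored row-major. It assumes 32 banks of 4-byte words and treats lanes that read the same word as one broadcast, which is the behavior of compute capability 2.x and later. The tile names `As` and `Bs` in the comments, the helper `transactions`, and the `row`/`k` values are illustrative only.

```python
NUM_BANKS = 32  # assumption: 32 banks, each 4 bytes wide
TILE = 32       # hypothetical 32x32 tile of floats, row-major

def transactions(word_indices):
    """Serialized transactions for one warp-wide shared-memory read.

    Lanes that read the same 32-bit word share a broadcast, so only
    distinct words are counted; the busiest bank sets the total.
    """
    per_bank = [0] * NUM_BANKS
    for word in set(word_indices):
        per_bank[word % NUM_BANKS] += 1
    return max(per_bank)

row, k = 5, 7  # arbitrary output row and inner-loop step

# Every lane reads As[row][k]: one word, delivered by broadcast.
a_row = [row * TILE + k] * 32
# Lane tx reads Bs[k][tx]: 32 consecutive words, one per bank.
b_row = [k * TILE + tx for tx in range(32)]
# Lane tx reads As[tx][k] (walking down a column): 32 words, all in bank k.
a_col = [tx * TILE + k for tx in range(32)]

print(transactions(a_row), transactions(b_row), transactions(a_col))
```

The broadcast read and the consecutive read each cost a single transaction, while the column walk is a 32-way conflict. This is one reason such kernels are often arranged so that the operand shared by the whole warp is read by broadcast rather than by stride.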