Causes every thread in the block to wait until they have reached this point. Wor...

xrd · on March 16, 2024

Any CUDA primer you recommend in particular? I had this same question.

winwang · on March 16, 2024

Here's an article on syncing in CUDA via cooperative groups: https://developer.nvidia.com/blog/cooperative-groups/

There's also explicit warp synchronization, i.e. __syncwarp(). More on warp primitives here: https://developer.nvidia.com/blog/using-cuda-warp-level-prim...
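To make the two levels concrete, here's a minimal sketch (not taken from either article) of a block sum that uses __syncthreads() while several warps cooperate through shared memory, then switches to a warp-level primitive for the last 32 values. The names blockSum, runBlockSum, and kBlock are made up for illustration, and it assumes a power-of-two block size of at least 32.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kBlock = 256;  // assumed block size: power of two, >= 32

// Hypothetical kernel: each block sums kBlock floats into out[blockIdx.x].
__global__ void blockSum(const float* in, float* out) {
    __shared__ float buf[kBlock];
    unsigned tid = threadIdx.x;
    buf[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();  // block-wide barrier: all of buf is written past here

    // Tree reduction while more than one warp is involved.
    // The barrier sits outside the `if` so every thread in the block reaches it.
    for (unsigned s = blockDim.x / 2; s >= 32; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }

    // buf[0..31] now holds 32 partial sums; only the first warp continues.
    // __shfl_down_sync synchronizes the lanes named in the mask itself,
    // so no separate __syncwarp() is needed for this register-only exchange.
    if (tid < 32) {
        float v = buf[tid];
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_down_sync(0xffffffffu, v, offset);
        if (tid == 0) out[blockIdx.x] = v;
    }
}

// Host helper (hypothetical): sums kBlock ones on the GPU and returns the result.
float runBlockSum() {
    float h_in[kBlock];
    for (int i = 0; i < kBlock; ++i) h_in[i] = 1.0f;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    blockSum<<<1, kBlock>>>(d_in, d_out);

    float h_out = 0.0f;
    // Blocking copy on the default stream: waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main() {
    printf("block sum = %f\n", runBlockSum());
    return 0;
}
```

Error checking is left out to keep it short. If the last step went through shared memory instead of shuffles, you'd want __syncwarp() between the reads and writes, which is the case the warp primitives post walks through.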

cavisne · on March 16, 2024

Probably https://www.youtube.com/watch?v=nOxKexn3iBo (or just skimming the attached colab).

xrd · on March 16, 2024

This is terrific, thanks!