another day another banger
Thanks 😃
You sir are the 3b1b of GPU programming!
Thanks a lot, glad you found it useful 😃
@@0mean1sigma Keep it up! I remember watching 3b1b back when it was about as popular as your videos are now!
Yes yes yes more GPU programming videos!! Fantastic!
Memory coalescence is one of the magic tricks that make GPU software lightning fast. When I first experienced this ~4x speedup for a one-line change it blew me away. Unfortunately, for many GPU kernels the optimization mostly ends here, at the global memory bandwidth limit.
Only special cases like matrix multiply or n-body can get another 10x from shared/local memory, and beyond that there are still warp-level operations through inline assembly. Looking forward to the next episode!
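For anyone wondering why a one-line indexing change can matter so much, here is a rough CPU-side model, not real GPU code. It counts how many memory segments one warp touches when its threads read consecutive floats versus floats a full row apart. The 128-byte segment size, the function names, and the 1024-wide row are assumptions picked for illustration; real hardware adds caches and smaller sectors, so the measured speedup (like the ~4x above) will not match this ratio.

```python
WARP_SIZE = 32        # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128   # assumed size of one global-memory segment
FLOAT_BYTES = 4       # size of a 32-bit float


def segments_touched(indices):
    """Count the distinct segments covered by a warp's float indices."""
    return len({(i * FLOAT_BYTES) // SEGMENT_BYTES for i in indices})


def coalesced_indices(warp=0):
    """Thread t reads element base + t: neighbours read neighbours."""
    base = warp * WARP_SIZE
    return [base + t for t in range(WARP_SIZE)]


def strided_indices(row_width, warp=0):
    """Thread t reads element (base + t) * row_width: one full row apart."""
    base = warp * WARP_SIZE
    return [(base + t) * row_width for t in range(WARP_SIZE)]


if __name__ == "__main__":
    n = 1024  # hypothetical row width of a row-major matrix
    print("coalesced:", segments_touched(coalesced_indices()))
    print("strided:  ", segments_touched(strided_indices(n)))
```

In this model the coalesced warp fits in a single segment, while the strided warp spreads over one segment per thread, which is the extra memory traffic the one-line change removes.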
Thanks a lot 😀…. Next video is on tiling! I’m more excited to share that one. When I first learned tiling (~3 years ago), it was confusing and took me a long time to get the hang of it. I’ve always felt that HPC concepts go well with animations so I’m trying to do that using this channel.
Super
High quality content. Thank you.
Great video!
Glad you found it useful 😃
Thank you
nice video
I’m super curious why, in this code, you never use any of the function parameters and you use variables that aren’t declared in the function.
There's a small typo. I forgot to change the A inside the function to d_A. Thanks a lot for catching that, I completely missed that while writing the animation code. However, in the code repo, it's correct.