Very informative. Thank you!
Glad it was helpful!
good explanation, very clear
Thank you for the nice comment! Glad you find the videos useful!
Thank you so much
What would be the advantage of these methods vs FlashAttention? FlashAttention speeds up the computation and is an exact computation, while most of these methods are approximations. I would like, if possible, to see a video explaining other attention types such as PagedAttention and FlashAttention. Great content :)
Thank you for the suggestion! You're absolutely right. In this video, I focused on purely algorithmic approaches, not hardware-based solutions like FlashAttention. FlashAttention is an IO-aware exact attention algorithm that uses tiling to reduce memory reads/writes between GPU memory levels, which results in significant speedup without sacrificing model quality.
I appreciate your input and will definitely consider making a video to explain FlashAttention!
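In the meantime, here is a minimal NumPy sketch of the tiling and online-softmax idea behind FlashAttention. It only illustrates the math; the real speedup comes from fusing these loops into a single GPU kernel so the full N x N score matrix never touches slow HBM. The block size and single-head shapes below are illustrative assumptions:
```python
import numpy as np

def tiled_attention(Q, K, V, B=64):
    # Process queries and keys/values in B-sized tiles, keeping running
    # softmax statistics so the full score matrix is never materialized.
    N, d = Q.shape
    O = np.zeros((N, d))
    for i in range(0, N, B):                 # loop over query tiles
        Qi = Q[i:i+B]
        m = np.full(Qi.shape[0], -np.inf)    # running row-wise max
        l = np.zeros(Qi.shape[0])            # running softmax denominator
        Oi = np.zeros((Qi.shape[0], d))      # unnormalized partial output
        for j in range(0, N, B):             # loop over key/value tiles
            S = Qi @ K[j:j+B].T / np.sqrt(d) # scores for this tile only
            m_new = np.maximum(m, S.max(axis=1))
            P = np.exp(S - m_new[:, None])
            scale = np.exp(m - m_new)        # rescale earlier partial results
            l = scale * l + P.sum(axis=1)
            Oi = scale[:, None] * Oi + P @ V[j:j+B]
            m = m_new
        O[i:i+B] = Oi / l[:, None]
    return O

# Sanity check: matches standard softmax attention up to float error.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((128, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
ref = np.exp(S - S.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), ref)
```
The assert at the end confirms the point from the video: unlike the approximate methods, this tiled formulation is mathematically identical to standard attention.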
Thanks for the suggestion! I made a new video on FlashAttention:
FlashAttention: Accelerate LLM training
th-cam.com/video/LKwyHWYEIMQ/w-d-xo.html
I would love to hear your comments, and let me know if you have any other suggestions!
You should include axial attention and axial position embeddings; they're simple yet work great on images and video.
Thanks for the suggestion, yes, I agree. I briefly described axial attention in the vision transformer series:
th-cam.com/video/bavfa_Rr2f4/w-d-xo.html?si=0SB9Yc_0SasafhJN
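For anyone curious, below is a minimal PyTorch sketch of the row/column factorization behind axial attention: instead of full attention over all H*W positions (cost (H*W)^2), it attends along rows and then columns (cost H*W*(H+W)). The use of nn.MultiheadAttention and the shapes are my own illustrative assumptions, and axial position embeddings are omitted for brevity:
```python
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, H, W, C)
        B, H, W, C = x.shape
        rows = x.reshape(B * H, W, C)          # each row is a sequence of W tokens
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(B, H, W, C)
        cols = x.permute(0, 2, 1, 3).reshape(B * W, H, C)  # columns as sequences
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(B, W, H, C).permute(0, 2, 1, 3)

x = torch.randn(2, 16, 16, 64)                 # batch of 16x16 feature maps
print(AxialAttention(64)(x).shape)             # torch.Size([2, 16, 16, 64])
```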
@PyMLstudio That's awesome, thank you!