Thanks for sharing this, Sayak Paul. As always, amazing work in covering everything in so much detail.
Thanks for sharing!
Thanks for sharing! It was very interesting!
What a good presentation!
Amazing! Congratulations.
This is fantastic! Congratulations Sayak da! 🎉
Thanks for sharing
I have a doubt: on the slide at 19:41 (From Self-Attention to Cross-Attention), at the bottom, shouldn't we group Q and K for Text and V for Image?
Cross-attention means you want to find the relevance between inputs from different objects, so Q and K must come from different sources, while K and V must come from the same source, because K and V are representations of the same object with different scopes.
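To make that concrete, here is a minimal cross-attention sketch in PyTorch (shapes, names, and the image/text roles are illustrative assumptions, not from the talk): the queries are projected from the image tokens, while the keys and values are both projected from the same text tokens.

```python
# Minimal cross-attention sketch (illustrative; all names and sizes are made up).
# Q comes from one source (image tokens); K and V come from the other (text tokens).
import torch

def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    q = image_tokens @ w_q                                    # (B, N_img, d)
    k = text_tokens @ w_k                                     # (B, N_txt, d)
    v = text_tokens @ w_v                                     # (B, N_txt, d)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)   # (B, N_img, N_txt)
    attn = scores.softmax(dim=-1)       # each image token attends over text tokens
    return attn @ v                     # (B, N_img, d): text info routed to image tokens

# Usage with made-up dimensions
B, N_img, N_txt, d_img, d_txt, d = 1, 4, 7, 16, 32, 8
image_tokens = torch.randn(B, N_img, d_img)
text_tokens = torch.randn(B, N_txt, d_txt)
w_q = torch.randn(d_img, d)
w_k = torch.randn(d_txt, d)
w_v = torch.randn(d_txt, d)
print(cross_attention(image_tokens, text_tokens, w_q, w_k, w_v).shape)  # torch.Size([1, 4, 8])
```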
Congrats sir
Congrats!
Is there a small mistake at 53:09?
Shouldn't it be 0.2 + 0.05 + 0.04 = 0.29 instead of 0.11?
1:43:16
Sayak is the backbone of half the global ML community