Visual Guide to Transformer Neural Networks - (Episode 2) Multi-Head & Self-Attention

Hedu AI by Batool Haider

Views: 161,184

Published on 28 May 2024
Visual Guide to Transformer Neural Networks (Series) - Step by Step Intuitive Explanation
Episode 0 - [OPTIONAL] The Neuroscience of "Attention"
• The Neuroscience of “A...
Episode 1 - Position Embeddings
• Visual Guide to Transf...
Episode 2 - Multi-Head & Self-Attention
• Visual Guide to Transf...
Episode 3 - Decoder’s Masked Attention
• Visual Guide to Transf...
This video series explains the math, as well as the intuition behind the Transformer Neural Networks that were first introduced by the “Attention is All You Need” paper.
--------------------------------------------------------------
References and Other Great Resources
--------------------------------------------------------------
Attention is All You Need
arxiv.org/abs/1706.03762
Jay Alammar - The Illustrated Transformer
jalammar.github.io/illustrated...
The A.I Hacker - Illustrated Guide to Transformers Neural Networks: A step by step explanation
jalammar.github.io/illustrated...
Amirhossein Kazemnejad Blog Post - Transformer Architecture: The Positional Encoding
kazemnejad.com/blog/transform...
Yannic Kilcher YouTube Video - Attention is All You Need
www.youtube.com/watch?v=iDulh...

Comments • 612

@HeduAI 3 years ago ⁺³⁴
*CORRECTIONS*
A big shoutout to the following awesome viewers for these 2 corrections:
1. @Henry Wang and @Holger Urbanek - At (10:28), "dk" is the hidden dimension of the Key matrix, not the sequence length. In the original paper (Attention Is All You Need), the model dimension is 512 and dk is 64 per head (512 split across 8 heads).
2. @JU PING NG - The result of concatenation at (14:58) is supposed to be 7 x 9 instead of 21 x 3 (that is, the z matrices are concatenated horizontally, not vertically). With this we can apply nn.Linear(9, 5) to get the final 7 x 5 shape.
Here are the timestamps associated with the concepts covered in this video:
0:00 - Recaps of Part 0 and 1
0:56 - Difference between Simple and Self-Attention
3:11 - Multi-Head Attention Layer - Query, Key and Value matrices
11:44 - Intuition for Multi-Head Attention Layer with Examples
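The shape correction in item 2 above can be checked with a small dependency-free sketch. It assumes three heads that each produce a 7 x 3 matrix z (the head count is inferred from 9 / 3, not stated in the comment), uses random placeholder values, and stands in for nn.Linear(9, 5) with a plain 9 x 5 matrix multiply without a bias term. The helper names are hypothetical.

```python
import random

SEQ_LEN, HEAD_DIM, NUM_HEADS, OUT_DIM = 7, 3, 3, 5  # assumed toy sizes


def random_matrix(rows, cols):
    """Placeholder values; real z matrices come out of the attention heads."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def shape(m):
    return (len(m), len(m[0]))


def concat_horizontal(matrices):
    """Join matrices side by side: row i of the result chains row i of each input."""
    return [sum((m[i] for m in matrices), []) for i in range(len(matrices[0]))]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


heads = [random_matrix(SEQ_LEN, HEAD_DIM) for _ in range(NUM_HEADS)]  # three 7 x 3 matrices
concatenated = concat_horizontal(heads)                # 7 x 9, not 21 x 3
w_out = random_matrix(NUM_HEADS * HEAD_DIM, OUT_DIM)   # plays the role of the Linear(9, 5) weight
output = matmul(concatenated, w_out)

print(shape(concatenated))  # (7, 9)
print(shape(output))        # (7, 5)
```

Stacking the same three matrices vertically would give 21 x 3, which a 9-input linear layer could not consume; the horizontal layout keeps one row per token.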
@amortalbeing 2 years ago ⁺²
Where's the first video?
@HeduAI 1 year ago ⁺⁴
@@amortalbeing Episode 0 can be found here - th-cam.com/video/48gBPL7aHJY/w-d-xo.html
@amortalbeing 1 year ago
@@HeduAI thanks a lot really appreciate it:)
@omkiranmalepati1645 1 year ago
Awesome...So dk value is 3?
@jasonwheeler2986 1 year ago ⁺¹
@@omkiranmalepati1645 d_k = embedding dimensions // number of heads
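The rule in the reply above can be written as a one-line helper. This is a minimal sketch: the function name `head_dim` is hypothetical, and the example values (model dimension 512, 8 heads) are the ones used in the "Attention Is All You Need" paper rather than the toy sizes from the video.

```python
def head_dim(d_model: int, num_heads: int) -> int:
    """Per-head key/query dimension d_k when d_model is split evenly across heads."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    return d_model // num_heads


print(head_dim(512, 8))  # 64
```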
@thegigasurgeon 1 year ago ⁺¹⁵⁵
Need to say this out loud: I saw Yannic Kilcher's video, read tonnes of materials on the internet, went through at least 7 playlists, and this is the first time I really understood the inner mechanism of Q, K and V vectors in transformers. You did a great job here
@HeduAI 1 year ago ⁺⁸
This made my day :,)
@afsalmuhammed4239 10 months ago ⁺¹
True
@exciton007 8 months ago ⁺¹
Very intuitive explanation!
@EducationPersonal 7 months ago ⁺¹
Totally agree with this comment
@VitorMach 6 months ago ⁺¹
Yes, no other video actually explains what the actual inputs for these are
@nitroknocker14 2 years ago ⁺²⁰¹
All 3 parts have been the best presentation I've ever seen of Transformers. Your step-by-step visualizations have filled in so many gaps left by other videos and blog posts. Thank you very much for creating this series.
@HeduAI 2 years ago ⁺⁹
This comment made my day :,) Thanks!
@bryanbaek75 2 years ago
Me, too!
@lessw2020 2 years ago ⁺¹
Definitely agree. These videos really crystallize a lot of knowledge, thanks for making this series!
@Charmente2014 2 years ago
ش
@devstuff2576 2 years ago
@@HeduAI absolutely awesome. You are the best.
@nurjafri 3 years ago ⁺⁷¹
Damn. This is exactly what a developer coming from other backgrounds needs.
Simple analogies for a rapid understanding.
Thanks a ton.
Keep uploadinggggggggggg plss
@Xeneon341 3 years ago ⁺¹
Agreed, very well done. You do a very good job of explaining difficult concepts to a non-industry developer (fyi I'm an accountant) without assuming a lot of prior knowledge. I look forward to your next video on masked decoders!!!
@HeduAI 3 years ago ⁺⁴
@@Xeneon341 Oh nice! Glad you enjoyed these videos! :)
@ML-ok9nf 7 months ago ⁺⁶
Absolutely underrated, hands down one of the best explanations I've found on the internet
@rohtashbeniwal9202 1 year ago ⁺⁴
This channel needs more love (the way she explains is out of the box). I can say this because I have 4 years of experience in data science; she did a lot of hard work to get so much clarity in concepts (love from India)
@HeduAI 1 year ago ⁺¹
Thank you Rohtash! You made my day! :) धन्यवाद
@mrkshsbwiwow3734 3 days ago
This is the best explanation of transformers on YouTube.
@adscript4713 1 month ago ⁺¹
As someone NOT in the field reading the Attention paper, after having watched DOZENS of videos on the topic this is the FIRST explanation that laid it out in an intuitive manner without leaving anything out. I don't know your background, but you are definitely a great teacher. Thank you.
@HeduAI 1 month ago
So glad to hear this :)
@chaitanyachhibba255 3 years ago ⁺¹⁰
Were you the one who wrote transformers in the first place? Because no one explained it like you did. This is undoubtedly the best info I have seen. I hope you keep posting more videos. Thanks a lot.
@HeduAI 3 years ago ⁺¹
This comment made my day! :) Thank you.
@EducationPersonal 7 months ago ⁺¹
This is one of the best Transformer videos on YouTube. I hope YouTube always recommends this Value (V), aka video, as a first Key (K), aka Video Title, when someone uses the Query (Q) as "Transformer"!! 😄
@HeduAI 7 months ago
😄
@HuyLe-nn5ft 9 months ago ⁺⁵
The important detail that sets you apart from the other videos and websites is that not only did you provide the model's architecture with numerous formulas, but you also demonstrated them with vectors and matrices and successfully walked us through each complicated and trivial concept. You really did a good job!
@rohanvaidya3238 3 years ago ⁺¹⁰
Best explanation ever on Transformers !!!
@malekkamoua5968 2 years ago ⁺¹¹
I've been stuck for so long trying to get the Transformer Neural Networks and this is by far the best explanation! The examples are so fun, making it easier to comprehend. Thank you so much for your effort!
@HeduAI 9 months ago
Cheers!
@forresthu6204 2 years ago ⁺³
Self-attention is a villain that has struck me for a long time. Your presentation has helped me to better understand this genius idea.
@rishiraj8225 9 days ago
Coming back after a year, just to revise the basic concepts. It is still the best video on YT. Thanks Hedu AI
@krishnakumarprathipati7186 3 years ago
The MOST MOST MOST MOST ..........................useful and THE BEST video ever on Multi head attention........Thanks a lot for your work
@HeduAI 3 years ago
So glad you liked it! :)
@shubheshswain5480 3 years ago ⁺¹
I went through many videos from Coursera, YouTube, and some online blogs, but none explained the Query, Key, and Values so clearly. You made my day.
@HeduAI 3 years ago
Glad to hear this Shubhesh :)
@rayxi5334 1 year ago ⁺¹
Better than the best Berkeley professor! Amazing!
@MGMG-li6lt 3 years ago ⁺¹⁹
Finally! You delivered me from long nights of searching for good explanations about transformers! It was awesome! I can't wait to see part 3 and beyond!
@HeduAI 3 years ago ⁺¹
Thanks for this great feedback!
@HeduAI 3 years ago ⁺²
“Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D
th-cam.com/video/gJ9kaJsE78k/w-d-xo.html
@sebastiangarciaacosta5468 3 years ago ⁺¹⁵
The best explanation I've ever seen of such a powerful architecture. I'm glad to have found this joy after searching for positional encoding details while implementing a Transformer from scratch today. Valar Morghulis!
@HeduAI 3 years ago ⁺²
Valar Dohaeris my friend ;)
@jackziad 3 years ago ⁺¹⁷
Your videos are so good at getting complex ideas across in an intuitive way. You are like the 3Blue1Brown equivalent for AI. Keep it up and keep producing high-quality video content, at your own pace of course 😋
@HeduAI 3 years ago ⁺⁸
3Blue1Brown is one of my favorite channels! Therefore, you comparing these videos to that channel is one of the best compliments ever. Thank you! :)
@rishiraj8225 10 months ago
@@HeduAI yes.. this is an awesome explanation comparable to 3Blue1Brown.. make more..
@alankarmisra 7 months ago
3 days, 16 different videos, and your video "just made sense". You just earned a subscriber and a life-long well-wisher.
@ja100o 1 year ago ⁺¹
I'm currently reading a book about transformers and was scratching my head over the reason for the multi-headed attention architecture.
Thank you so much for the clearest explanation yet that finally gave me this satisfying 💡-moment
@andybrice2711 1 month ago
This really is an excellent explanation. I had some sense that self-attention layers acted like a table of relationships between tokens, but only now do I have more sense of how the Query, Key, and Value mechanism actually works.
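The "table of relationships between tokens" mentioned above can be made concrete with a minimal single-head sketch of scaled dot-product self-attention, softmax(QKᵀ / √d_k)·V, as defined in the "Attention Is All You Need" paper. The sizes, random weights, and helper names below are hypothetical; a trained model would learn w_q, w_k and w_v.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Turn a row of scores into positive weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over token embeddings x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = matmul(q, k_t)               # one score per (query token, key token) pair
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
SEQ_LEN, D_MODEL, D_K = 4, 6, 3  # assumed toy sizes
x = random_matrix(SEQ_LEN, D_MODEL)
w_q, w_k, w_v = (random_matrix(D_MODEL, D_K) for _ in range(3))
output, weights = self_attention(x, w_q, w_k, w_v)

# Row i shows how strongly token i attends to every token in the sequence.
for row in weights:
    print([round(w, 2) for w in row])
```

The `weights` matrix is the token-by-token table: it has one row and one column per token, and each row sums to 1. The output is each token's weighted mix of the Value rows.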
@wireghost897 10 months ago
Finally a video on transformers that actually makes sense. Not a single lecture video from any of the reputed universities managed to cover the topic with such brilliant clarity.
@dominikburkert2824 3 years ago ⁺¹
best transformer explanation on YouTube!
@HeduAI 3 years ago
So glad to hear this! :D
@kafaayari 2 years ago
I won't say this is the best explanation so far, but this is the only explanation. Others are just repeating the original paper.
@nizamphoenix 7 months ago
Having been a professional in this field for ~5 years, I can say this is by far the best explanation of attention.
Amused as to why this doesn't pop up at the top of YT's recommendations for attention. Probably YT's attention needs some attention to fix its Q, K, Vs
@HeduAI 7 months ago
You made my day :)
@madhu1987ful 1 year ago
Wow. Just wow!! This video needs to be in the topmost position when searching for content on transformers and their explanation
@HeduAI 1 year ago ⁺¹
So glad to see this feedback! :)
@sujithkumar5415 1 year ago
This is quite literally the best attention mechanism video out there guys
@devchoudhary8892 1 year ago ⁺¹
best, best, best explanation on transformer, you are adding so much value to the world.
@zhehanhuang4675 3 years ago ⁺¹
really good intuition of self-attention and multi-head attention
@HeduAI 3 years ago
I am glad to hear that :)
@zhehanhuang4675 3 years ago
@@HeduAI hi, thanks for your reply. When I read some papers, they mentioned "attention map"; is that the same thing as the "attention filter" mentioned in your video?
@persianform 1 year ago
The best explanation of attention models on earth!
@Srednicki123 1 year ago
I'll just repeat what everybody else said: these videos are the best! thank you for the effort
@frankietank8019 8 months ago ⁺¹
Hands down the best video on transformers I have seen! Thank you for taking your time to make this video.
@raunakdey3004 1 year ago
Really love coming back to your videos and getting a recap on multi-layered attention and the transformers! Sometimes I need to make my own specialized attention layers for the dataset in question, and sometimes, I dunno, it just helps to listen to you talk about transformers and attention! Really intuitive, and it helps me break out of some weird loop of algorithm design I might have gotten myself stuck in. So thank you so so much :D
@sowmendas812 1 year ago
This is literally the best explanation for self-attention I have seen anywhere! Really loved the videos!
@Scaryder92 1 year ago
Amazing video, showing how the attention matrix is created and what values it assumes is really awesome. Thanks!
@binhle9475 1 year ago ⁺¹
Your attention to detail and information structuring are just exceptional. The Avatar and GoT references on top were hilarious and make things perfect. You literally made a story out of complex deep learning concept(s). This is just brilliant.
You have such a beautiful mind (if you get the reference :D). Please consider making more videos like this, such a gift is truly precious. May the force be always with you. 🤘
@fernandonoronha5035 2 years ago
I don't have words to describe how much these videos saved me, thank you!
@freaknextdoor9040 3 years ago ⁺¹
Hands down, this series is the best one explaining the essence of transformers I have found online!!
Thanks a lot, you are awesome!!!!
@HeduAI 3 years ago
Cheers! 🙌
@ghostvillage1 1 year ago
Hands down the best series I've found on the web about transformers. Thank you
@maryamkhademi 2 years ago
Thank you for putting so much effort into the visualization and awesome narration of this series. These are by far the best videos to explain transformers. You should do more of these videos. You certainly have a gift!
@HeduAI 1 year ago
Thank you for watching! Yep! Back on it :) Would love to hear which topic/model/algorithm you most want to see on this channel. Will try to cover it in the upcoming videos.
@wolfie6175 2 years ago
This is an absolute gem of a video.
@geetanshkalra8340 2 years ago
This is by far the best video to understand Attention Networks. Awesome work !!
@aaryannakhat1842 2 years ago
Spectacular explanation! This channel is sooo underrated!
@SuilujChannel 1 year ago
thanks for these great videos! The visualizations and extra explanations on details are perfect!
@jonathanlarkin1112 3 years ago ⁺⁶
Excellent series. Looking forward to Part 3!
@HeduAI 3 years ago ⁺¹
“Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D
th-cam.com/video/gJ9kaJsE78k/w-d-xo.html
@oliverhu1025 1 year ago
Probably the best explanation of transformers I’ve found online. Read the paper, watched Yannic’s video, some paper-reading videos and a few others, and the intuition was still missing. This connects the dots, keep up the great work!
@skramturbo8499 1 year ago
I really like the fact that you ask questions within the video. In fact those are the same questions one has when first reading about transformers. Keep up the awesome work!
@srikanthkarapanahalli 1 year ago
Awesome analogy and explanation !
@markpadley890 3 years ago
Outstanding explanation and well delivered, both verbally and with the graphics. I look forward to the next in this series
@HeduAI 3 years ago
“Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D
th-cam.com/video/gJ9kaJsE78k/w-d-xo.html
@oludhe7 1 month ago
Literally the best series on transformers. Even clearer than statquest and luis serrano who also make things very clear
@minruihu 1 year ago
it is impressive, you explain such complicated topics in a vivid and easy way!!!
@cihankatar7310 1 year ago
This is the best explanation of transformers architecture with a lot of basic analogies! Thanks a lot!
@adityaghosh8601 2 years ago
Blown away by your explanation. You are a great teacher.
@McBobX 2 years ago
That is what I've been looking for, for 3 days now! Thanks a lot!
@bhavyaghai1924 11 months ago
Educational + Entertaining. Nice examples and figures. Loved it!
@hubertkanyamahanga2782 8 months ago
I am just speechless, this is unbelievable! Bravo!
@kazeemkz 5 months ago
Spot on analysis. Many thanks for the clear explanation.
@cambostrongbo 3 years ago ⁺⁵
Speaking my kind of language, all made sense when you started referencing avatar
@HeduAI 3 years ago
Hahaha :D Glad to know....
@artukikemty 1 year ago
Thanks for posting, by far this is the most didactic Transformer presentation I've ever seen. AMAZING!
@onthelightway 2 years ago
Incredibly well explained! Thanks a lot
@cw9249 1 year ago
you are amazing. I've watched other videos and read materials but nothing compares to your videos
@robertco7 11 months ago
This is very clear and well thought out, thanks!
@davidlazaro3143 10 months ago
This video is GOLD, it should be everywhere! Thank you so much for doing such an amazing job 😍😍
@abdot604 1 year ago
brilliant explanation, your channel deserves way more ATTENTION.
@jirasakburanathawornsom1911 2 years ago
Hands down the best transformer explanation. Thank you very much!
@Ariel-px7hz 1 year ago
Such a fantastic and detailed yet digestible explanation. As others have said in the comments, other explanations leave so many gaps. Thank you for this gem!
@user-ne2nr2yi1h 5 months ago
The best video I've ever seen for explaining transformers.
@RafidAslam 2 months ago
Thank you so much! This is by far the clearest explanation that I've ever seen on this topic
@SOFTWAREMASTER 9 months ago
Most underrated video about transformers. Going to recommend this to everyone. Thank you
@shivam6565 11 months ago
Finally I understood the concept of query, key and value. Thank you.
@humayounkhan7946 1 year ago ⁺²
That is the best explanation I've seen as a beginner in this field (I have been reading tons of articles and went through a lot of videos). Thanks a lot for your hard work. Because of my renewed understanding thanks to this video, I can now better explain this concept to others by borrowing your analogy. I hope you know that your videos continue to bring compounding value to society even 2 years after posting them. Thanks a lot Hedu!
@HeduAI 1 year ago
Thank you so much! Loved reading your comment.
@charlesgormley9075 1 year ago
I agree, this is great for starting out with transformers
@clintcario6749 1 year ago
These videos are really incredible. Thank you!
@pythondev2631 1 year ago
The best video on multihead attention by far!
@adithyakaravadi8170 1 year ago ⁺¹
You are so good, thank you for breaking down a seemingly scary topic for all of us. The original paper requires a lot of background to understand clearly, and not all have it. I personally felt lost. Such videos help a lot!
@chenlim2165 1 year ago
Bravo! After watching dozens of other explainer videos, I can finally grasp the reason for multi-headed attention. Excellent video. Please make more!
@danielarul2382 1 year ago
One of the best explanations on Attention in my opinion.
@bharanij6130 1 year ago
Hello! This is an incredible explanation of Self Attention! Thank you!
@giridharnr6742 1 year ago
It's one of the best explanations of Transformers. Just mind blowing.
@ariasardari8588 2 years ago ⁺¹
Your ability to convey concepts is quite impressive! Probably the best tutorial video I've ever seen.
From now on, every time I open YouTube, I first check if you have a new video
It was fantastic!
I greatly appreciate it.
@HeduAI 1 year ago
Thanks a lot Aria! Really means a lot :)
@cracksomeface 1 year ago ⁺¹
I'm a grad student currently applying NLP - this is literally the best explanation of self-attention I have ever seen. Thank you so much for a great vid!
@jboyce007 5 months ago
If only I saw your videos earlier. As everyone in the comments says, these are THE BEST videos on the subject matter found anywhere! Thank you so very much for helping us all!
@HeduAI 5 months ago
Cheers! :)
@user-xn8wg6yw7g 2 days ago
Very good explanation. Thanks!
@Andrew6James 3 years ago
Wow. Amazing explanation! You have a gift for explaining quite complex material succinctly.
@HeduAI 1 year ago
Thanks Andrew! Cheers! :D
@henrylouis5143 2 years ago ⁺¹
Brilliant presentation, it's none other than the best I've seen. Great appreciation for your work!!! Crystal clear organization.
@HeduAI 1 year ago
Thanks Henry! Glad you liked it :)
@Abhi-qf7np 2 years ago ⁺¹
You are the best😄😄, This is THE Best explanation I have ever seen on YouTube for Transformer Model, Thank you so much for this video.
@adarshkone9384 10 months ago
have been trying to understand this topic for a long time, glad I found this video now
@kennethm.4998 2 years ago
You have a gift for explanations... Best I've seen anywhere online. Superb.
@newbie8051 9 days ago
Ah this makes everything simple and makes sense
Thanks for the easy to follow explanation !
@hewas321 1 year ago
No way. This video is insane!! The most accurate and excellent explanation of self-attention mechanism. Subscribed to your channel!
@bendarodes61 2 years ago
I've watched many video series about transformers, this is by far the best.
@jasonpeloquin9950 11 months ago
Hands down the best explanation of the use of Query, Key and Value matrices. Great video with an easy example to understand.
@franzanders7762 2 years ago
I can't believe how good this is.
@PratikChatse 2 years ago
Amazing !! loved the explanation! Subscribed
@mariosconstantinou8271 1 year ago
These videos are amazing, thank you so much! Best explanation so far!!
@nicholasabad8361 1 year ago
By far the best explanation of Multi-Head Attention I've ever seen on YouTube! Thanks!
@HeduAI 1 year ago
Glad to hear this :)
@simonren4890 2 years ago
This is the best. It is simple and tight. Please do more papers in the future.
@bochengxiao1352 2 years ago ⁺¹
Thank you so much! It's the best Transformer video ever! Really hope for more on other models.
@HeduAI 1 year ago
Glad to hear that! :) Do let me know if there are certain models that you would like to see covered in future videos.
