Rasa Algorithm Whiteboard - Transformers & Attention 2: Keys, Values, Queries

  • Published on Feb 2, 2025

Comments • 194

  • @pedropereirapt
    @pedropereirapt 4 years ago +145

    I hereby declare this saga to be the best on Attention!
    Thank you so much for sharing the knowledge with such clarity!

    • @paulah1639
      @paulah1639 4 years ago +1

      Hear hear!

    • @Kanasta7
      @Kanasta7 2 years ago +1

      I agree. Your teaching skills are superb! Thank you so much

  • @BAHTYS
    @BAHTYS 3 years ago +44

    Finally! Finally, someone managed to explain it to me in reasonable detail and extremely clear! Thank you!

  • @ahmeterdonmez9195
    @ahmeterdonmez9195 4 months ago +5

    90% of those who talk about this field do not actually know the subject; they just repeat what they have seen somewhere.
    It is not difficult to notice this if you are an experienced learner.
    For the first time, I have come across someone who explains the subject with real knowledge. Thank you endlessly.

  • @itzikgutzcha4779
    @itzikgutzcha4779 3 years ago +25

    Dear Rasa,
    I have watched at least 10 videos explaining this subject, and only now, after seeing your explanation, I think that I finally understand.
    You are a great teacher, thank you!

  • @kdubovetskyi
    @kdubovetskyi 2 years ago +9

    The single explanation which actually shows the motivation behind Q/K/V. Thanks.

  • @cliffordino
    @cliffordino 3 months ago

    I've been looking all day for a clear explanation of the K, V, and Q concepts. Finally found it. Nicely done!

  • @renatoviolin
    @renatoviolin 4 years ago +34

    Incredible explanation! Please keep making these kinds of illustrative explanations. Congrats.

  • @amritbhattarai5083
    @amritbhattarai5083 2 years ago +4

    If only the papers included information about what the queries, keys, and values actually mean! As a beginner in this field, I was having such a difficult time wrapping my head around what they actually mean. I tried convincing myself that I understood that using queries and keys we get values, but today is the day that I actually understood. Hats off, sir. I want to thank you for this great explanation. The world needs teachers like you :)

  • @joliver1981
    @joliver1981 4 years ago +6

    I cannot stress enough how great of a teacher you are. Amazing work. Truly. I feel as though you are literally walking my stupid brain through the subject matter.

    • @RasaHQ
      @RasaHQ  4 years ago +1

      (Vincent here) Kind words. Thanks!

  • @MrChilledstep
    @MrChilledstep 4 years ago +15

    Extremely clear. You are a brilliant teacher. Thank you!

  • @briancase6180
    @briancase6180 1 year ago +3

    I'm pretty impressed. I've suffered through other explanations (including from noted professors), so I've learned some things, for sure. What's *excellent* about your explanations is that they are easier to follow, very well motivated (I see why we want each structure), and bite-sized, so it's not too much at once. Great job.

  • @jithinmukundan9016
    @jithinmukundan9016 1 year ago +1

    Excellent explanation. The only video among dozens I have seen that actually explains the 'why' and 'how' of using query, key and value in attention mechanism. So lucid and concise. Thank you so much.

  • @muhannadobeidat
    @muhannadobeidat 2 years ago

    If you did not understand the explanation here, just rinse and repeat and you eventually will. This is the best explanation of keys, values and queries. Just pay attention!
    Thanks for putting this together!

  • @noahgsolomon
    @noahgsolomon 1 year ago

    This concept has been explained so abstractly in other videos. Props for making it intuitive and going through the process instead of just presenting the final derivation.

  • @saintcodded2918
    @saintcodded2918 1 year ago

    Truly ATTENTION is all you need to understand this piece. Great work👍

  • @GauravBbbb
    @GauravBbbb 2 years ago

    This video is ALL YOU NEED to know to understand the concepts in self attention. Best explanation so far about Q, K, V terms!!!!!!

  • @DanOblinger
    @DanOblinger 2 years ago

    JUST WOW !!! That was **SO** lucid. I understand the details of the math & the deeper intuitions all in one go. AND my brain was not over-heating with the effort.

  • @deudaux
    @deudaux 1 year ago

    The clarity of this video is just on whole 'nother level!

  • @osuregraz
    @osuregraz 3 years ago +1

    This is the best video. I've been searching for days because I didn't understand where the key, query, and value come from (most other videos just talk about how attention works but ignore how the KQVs are generated). This video solves my issue. Thumbs up!

    • @geekyprogrammer4831
      @geekyprogrammer4831 1 year ago

      Yeah, because Q/K/V come from weights, just like the ones we learned about at the beginning with ANNs.

  • @lenhardreuter2254
    @lenhardreuter2254 2 years ago +1

    Best video on the attention layer there is! And the ONLY one that explains it so that it's understood! Awesome work! Thank you!!

  • @ns-teamtv8888
    @ns-teamtv8888 11 months ago

    I teach self-attention at university, and this video helped me a lot in finding a way to explain the concept.
    Thanks a lot for this fantastic work with Rasa and this series of videos!!
    A masterpiece.

  • @qjrmsktso2
    @qjrmsktso2 4 years ago +7

    The best explanation of self-attention I've ever heard!! Thanks a lot

  • @xphn1985
    @xphn1985 2 years ago

    Brilliant lesson! For the first time, I understand where the analogy of key, query, and value comes from. Thank you!

  • @azurewang
    @azurewang 4 years ago

    The way you explain it is as elegant as the design of self-attention

  • @SaxonBerryVideos
    @SaxonBerryVideos 1 year ago

    This is a fantastic video. Super intuitive storyline to unveil the concepts. Really easy to follow. If I knew someone who was trying to learn transformers I would share this with them straight away.

  • @297339003
    @297339003 4 years ago

    I spent 2 days trying to learn attention layers in transformers. Didn't make any sense until I watched this! TYSM!!!!

  • @AI_Life_Journey
    @AI_Life_Journey 3 years ago

    Now the penny has dropped on the Query-Keys-Values trilogy. Thank you, sir

  • @airepublic9864
    @airepublic9864 2 years ago

    Your description developed a deep understanding and knowledge of how attention works.
    Thx for earliest...

  • @lettry5297
    @lettry5297 3 years ago

    I am speechless after this high quality of explanation

  • @Michael-yu9ix
    @Michael-yu9ix 2 years ago

    Phenomenal explanation. I've watched a few other videos that left me more confused, but this finally made me understand what the purpose of the key, value, and query weight matrices is. To be honest, I'm not sure whether some of the youtubers who make videos on this topic actually understand the purpose of the matrices fully.
    This whole series is extremely good and helps so many beginners better understand these models. I cannot thank you enough for making these videos. As you can see from the comments, a lot of people are extremely thankful for such clear, nicely illustrated explanations. Please keep making more videos. We all appreciate it a lot!

  • @binhu8128
    @binhu8128 1 year ago

    Kudos to the instructor. Very clear explanation.

  • @JohnCena12355
    @JohnCena12355 3 years ago

    This is by far the best explanation of Attention I have seen. Thank you!

  • @shantanunath7927
    @shantanunath7927 1 year ago

    The best illustration I have ever seen. It changed my views

  • @Deshwal.mahesh
    @Deshwal.mahesh 4 years ago

    Seriously THE BEST videos I've ever seen on Attention and Self-Attention. Really Loved it

  • @sadeghmohammadi5567
    @sadeghmohammadi5567 3 years ago +1

    You rock! Amazing. I do not know how long it took you to grasp the context this clearly, but it would take me maybe 2-3 weeks of reading many articles and finding the connections!

  • @leoyao1994
    @leoyao1994 4 years ago +1

    This is incredible..... so complicated but so clear.... AMAZING WORK!!

  • @christopherjamesyoung7766
    @christopherjamesyoung7766 3 years ago +1

    Construction of attention from concept to implementation. Excellent job

  • @edvinbeqari7551
    @edvinbeqari7551 8 months ago

    This is the best explanation I have seen thus far. Thank you and subscribed.

  • @markj.carlebach8224
    @markj.carlebach8224 3 months ago

    This is a great explanation of KQV in attention and transformer logic.

  • @hareshwedanayake7427
    @hareshwedanayake7427 7 months ago

    Brilliant video. I was quite lost on this topic before watching this

  • @carlosgruss7289
    @carlosgruss7289 1 year ago

    What a brilliant series of videos. Thank you!

  • @nayanvats3424
    @nayanvats3424 3 years ago +1

    Too good, a simple topic explained with elegance. Cheers!

  • @Rotem_shwartz
    @Rotem_shwartz 1 year ago

    I finally got it !! After so many videos, thank you !!!❤

  • @bikashshrestha1958
    @bikashshrestha1958 3 years ago

    The Best Explanation on Attention PERIOD!!!

  • @justinwhite2725
    @justinwhite2725 3 years ago +1

    3:00 thank you for this. Coming at this from a programming background and not a math background, I'd hear people talk about a 'dot product' and be going 'I have no idea what that means'.
    Yes, I've googled it, and I was never sure I was doing it correctly (I wasn't) before seeing this explanation.
    4:50 thank you! Other explanations were never clear on whether I was multiplying everything to get one number or just multiplying out the array to get a new array.
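
    As a minimal numpy sketch of the distinction this comment is drawing (variable names are illustrative, not from the video): the dot product multiplies elementwise and then sums, collapsing two vectors into a single number, while the elementwise product returns a new array.

      import numpy as np

      a = np.array([1.0, 2.0, 3.0])
      b = np.array([4.0, 5.0, 6.0])

      # Elementwise product: a new array of the same shape.
      elementwise = a * b        # array([ 4., 10., 18.])

      # Dot product: multiply elementwise, then sum -> one scalar.
      dot = np.dot(a, b)         # 32.0, i.e. 4 + 10 + 18
      assert dot == elementwise.sum()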

  • @prashanths8536
    @prashanths8536 3 years ago

    Very good explanation, especially with the vector representation. It turns out I have heard this voice in other video tutorials too, which I appreciate.
    Thank you!

  • @sebastianp4023
    @sebastianp4023 3 years ago

    You, sir, will be cited in my Master's thesis if I end up using attention.

  • @sahar2003
    @sahar2003 4 years ago +1

    best attention explanation on the web! THANK U!

  • @AnnaGrace5
    @AnnaGrace5 1 year ago

    Absolutely amazing. Makes SO MUCH MORE SENSE now. Thank-you :)

  • @xv0047
    @xv0047 2 years ago

    This is an absolute home run. Well done!

  • @yusufani8
    @yusufani8 2 years ago

    I am addicted to this video; whenever I forget the attention mechanism, I remember the picture at 09:07

  • @aliyoussef97
    @aliyoussef97 1 year ago

    The best explanation I have watched so far

  • @alvarozamora2679
    @alvarozamora2679 4 years ago

    I've done quite a lot of work on neural networks, but none on NLP. I was having trouble understanding these attention blocks, but this is super clear. Good stuff.

  • @abc-by1kb
    @abc-by1kb 3 years ago

    OmG I can't believe I didn't discover this earlier. What a great video. Really love your way of explaining things.

  • @eyorokon
    @eyorokon 3 years ago

    By far the best explanation of self-attention. Ty

  • @neurochannels
    @neurochannels 5 months ago +1

    This is amazing.

    • @neurochannels
      @neurochannels 1 month ago

      Just came back to agree with myself, it is not only amazing, but impressive this was out before the ChatGPT craze was here. I keep coming back to this video when I need to remind myself of how everything works.

  • @SyntharaPrime
    @SyntharaPrime 2 years ago

    It is an incredibly wonderful explanation of this subject. Thanks so much. It was a great chance to see this video; really, thanks a lot

  • @abhi8569
    @abhi8569 4 years ago +1

    The best explanation.

  • @irlporygon-z6929
    @irlporygon-z6929 1 year ago

    Oh my god. There are SO many videos on this subject that fail to explain almost anything, probably because the presenter has no comprehension of the topic. I have been looking for this for a couple of weeks now, lol: just a video that states clearly "this is a vector, the output of this operation is a number," et cetera, and explains exactly WHAT everything is instead of giving vague nonsense analogies about what the pieces are. Also, I have to say your handwriting/drafting skills are fantastic.

  • @baranyildirim668
    @baranyildirim668 3 years ago +1

    This really is the best one out there. Thank you so much

  • @ananosnasos5043
    @ananosnasos5043 2 years ago

    Finally, a great and simple explanation. Thank you very much

  • @ishgirwan
    @ishgirwan 3 years ago

    Woah... finally it all makes sense to me. Best explanation of self-attention I have watched. Thanks :)

  • @junowhut7486
    @junowhut7486 3 years ago

    Yup, easily the best explanation of attention. Thank you for sharing!

  • @pranjalchaubey
    @pranjalchaubey 4 years ago

    Coolest explanation of Keys, Values and Queries!

  • @purushotamradadia8175
    @purushotamradadia8175 3 years ago +1

    Greatest explanation of query, key, and value I have ever heard.

  • @yevhendiachenko3703
    @yevhendiachenko3703 3 years ago +1

    Explanation is brilliant!

  • @techaztech2335
    @techaztech2335 3 years ago

    This is a mind-bogglingly awesome explanation...

  • @jaeboumkim1213
    @jaeboumkim1213 4 years ago

    Great!!! It's the best video explaining Attention!

  • @obafemijinadu4726
    @obafemijinadu4726 1 year ago

    Finally! Attention makes all the sense in the world.

  • @AkshatSharma-qx9wh
    @AkshatSharma-qx9wh 3 years ago

    The best explanation!! You are the best... Kudos!

  • @TaylorSparks
    @TaylorSparks 2 years ago

    Great video. Helps explain Q, K, V in attention.

  • @antarikshshreshthi
    @antarikshshreshthi 5 months ago +1

    What a beautiful video!

  • @rayaay3095
    @rayaay3095 3 years ago

    I finally understand the attention mechanism. Thank you so much

  • @fiv1067
    @fiv1067 4 years ago +1

    I wonder how this video has only 4k views. Excellent explanation :(

  • @DeepFindr
    @DeepFindr 3 years ago +1

    Wow! Great videos!

  • @RezoanurRahman
    @RezoanurRahman 2 years ago

    Bro... when you brought in the analogy of query, key, and value, it was like watching a movie and figuring out the plot twist.

  • @SubhamKumar-eg1pw
    @SubhamKumar-eg1pw 4 years ago +1

    Amazing explanation!

  • @krishanudasbaksi9530
    @krishanudasbaksi9530 4 years ago

    What an awesome explanation!!! Thank you very much, Rasa, for making this... :D

  • @theapplecrumble
    @theapplecrumble 3 years ago +1

    Thanks, this is tremendously helpful!

  • @maxwinmax
    @maxwinmax 3 years ago +1

    Perfect explanation. This helped a lot, thanks!

  • @ernestkirstein6233
    @ernestkirstein6233 3 years ago +1

    At 8:37 the unnormalized weights should be equal to (MqV)^T(MkV) where V is the matrix of encoded column vectors, if I'm not mistaken. But (MqV)^T=V^TMq^T so that whole operation is equal to V^T(Mq^TMk)V. Since Mq^TMk becomes a single matrix entirely composed of learned weights, why in the world would we need two separate weight matrices? Am I parsing that wrong, is that an error in the video, or am I having a brain fart?

    • @rabailkamboh8857
      @rabailkamboh8857 2 years ago

      Hey!! Did you get the answer?

    • @ernestkirstein6233
      @ernestkirstein6233 2 years ago

      @@rabailkamboh8857 There are a couple of assumed bias terms in there that I wasn't accounting for. Any weight multiplication should also have a bias term added on, but apparently it's standard practice to leave that out in most notation in ML circles.
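
    A quick numerical check of the identity discussed in this thread, as a numpy sketch (assuming square weight matrices, no bias terms, and random stand-ins for Mq, Mk, and the encoded vectors):

      import numpy as np

      rng = np.random.default_rng(0)
      d, n = 4, 3                     # embedding size, sequence length
      V = rng.normal(size=(d, n))     # encoded vectors as columns
      Mq = rng.normal(size=(d, d))
      Mk = rng.normal(size=(d, d))

      # Unnormalized scores computed with two separate matrices...
      scores_two = (Mq @ V).T @ (Mk @ V)
      # ...match the scores from the single fused matrix Mq^T Mk.
      scores_one = V.T @ (Mq.T @ Mk) @ V
      assert np.allclose(scores_two, scores_one)

    So with square, bias-free projections the two matrices do fuse into one, as the question suggests. Besides the bias terms noted in the reply, in practice Mq and Mk are usually rectangular (d_k x d with d_k < d), which restricts the fused product Mq^T Mk to rank at most d_k.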

  • @felipefrigeri9787
    @felipefrigeri9787 4 years ago

    Well, this got me attent!
    Thank you so much for the explanation, I was very confused about the V, Q, K matrices!

  • @LuisPerez-rr9jc
    @LuisPerez-rr9jc 1 year ago

    That was incredibly helpful, thanks!

  • @sannalun845
    @sannalun845 3 years ago

    I'd like this video 1000 times if I could.

  • @tantzer6113
    @tantzer6113 2 years ago +1

    Excellent! More please!

  • @310gowthamsagar5
    @310gowthamsagar5 1 year ago

    Oh my goodness!! Your explanation is awesome.

  • @maiabboud9728
    @maiabboud9728 3 years ago +1

    Thanks, I finally get why values and keys are different

  • @mohanapalaka
    @mohanapalaka 4 years ago

    Most intuitive explanation I found! Thank you :D

  • @sebastianp4023
    @sebastianp4023 3 years ago

    More likes! This is good stuff! (I mean the quality of the explanation)

  • @tanphan3970
    @tanphan3970 2 years ago +1

    Excellent! A question: why do you apply self-attention twice before doing NER? What really are the differences between the outputs of the first and second self-attention?

  • @katrinb8297
    @katrinb8297 4 years ago +1

    Thank you, your explanation helped a lot!

  • @aashwinsharma1859
    @aashwinsharma1859 2 years ago

    Best video on attention. Thanks a lot

  • @dhoomketu731
    @dhoomketu731 4 years ago

    Brilliant explanation. Loved it.

  • @suchandrabhattacharyya5263
    @suchandrabhattacharyya5263 1 year ago

    What an amazing lecture

  • @rothenbergerrobert
    @rothenbergerrobert 7 months ago

    Incredible explanation. Thank you

  • @TrevorHigbee
    @TrevorHigbee 8 months ago

    This (and video 1) is so genius.

  • @newwaylw
    @newwaylw 3 months ago

    Around 6:05 you mention that "maybe we can learn more patterns" by introducing weights, but you don't say why or how the weights are helpful. One of the reasons: if we stick to the plain dot product without weights, then, using your example, the product between v_x and v_x will always be the highest since they are the same token, and the dot product between any two tokens is pre-determined by the pre-trained word embeddings, which are fixed (e.g., the embedding for 'bank' will not change). By introducing weights, you allow the network to learn and update the inter-token relations from training examples.
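
    A small numpy illustration of the point above (a sketch with made-up unit-norm embeddings; Wq and Wk are hypothetical stand-ins for learned projections): without weights, each token's highest raw score is with itself, and none of the scores can ever change during training.

      import numpy as np

      rng = np.random.default_rng(1)
      E = rng.normal(size=(3, 4))                      # 3 fixed token embeddings
      E /= np.linalg.norm(E, axis=1, keepdims=True)    # unit-norm rows

      # No learned weights: scores are fixed dot products of fixed embeddings.
      scores = E @ E.T
      assert (scores.argmax(axis=1) == np.arange(3)).all()  # self always wins

      # With learned projections, the scores become trainable:
      Wq = rng.normal(size=(4, 4))
      Wk = rng.normal(size=(4, 4))
      learned_scores = (E @ Wq) @ (E @ Wk).T           # changes as Wq, Wk update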

  • @BiranchiNarayanNayak
    @BiranchiNarayanNayak 4 years ago

    Excellent tutorial on Attention

  • @galenw6833
    @galenw6833 1 year ago

    These videos are excellent, and put the original paper and many webpages to shame.
    "These videos are all you need." (See what I did there? :-P)
    'Attention' basically means re-projecting each token from its original semantic embedding (word2vec, GloVe, etc.) into a new basis: its relative similarity in meaning to the other tokens in the sentence (sequence).
    So it should really be called "relative meaning space projection is all you need".
    Not as catchy, but perhaps clearer.
    Query, key, and value refer to the weight matrices applied for each of the times the token embeddings are used (twice in the dot product with itself to obtain the weights, and once for the input embeddings). This allows better re-embeddings (re-projections in terms of the other vectors in the sequence) to be learned.
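
    The commenter's summary, condensed into a minimal single-head self-attention sketch in numpy (dimensions and names are illustrative; the division by sqrt(d) is the standard scaled form from the Transformer paper):

      import numpy as np

      def softmax(z):
          e = np.exp(z - z.max(axis=-1, keepdims=True))
          return e / e.sum(axis=-1, keepdims=True)

      def self_attention(X, Wq, Wk, Wv):
          # X: (n_tokens, d) token embeddings, e.g. from word2vec/GloVe.
          Q, K, V = X @ Wq, X @ Wk, X @ Wv
          w = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # (n, n) similarities
          return w @ V    # each token re-expressed as a mix of the others

      rng = np.random.default_rng(2)
      n, d = 5, 8
      X = rng.normal(size=(n, d))
      Y = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))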

  • @Karolina-vi2wt
    @Karolina-vi2wt 4 years ago

    Thank you so much! There's just one thing I didn't get: at 6:00 you say there are no weights involved in the Attention block. But what about the w's on the left side, the normalised scores? Those are weights, no?

    • @RasaHQ
      @RasaHQ  4 years ago

      (Vincent here)
      I understand the confusion in terminology. Those "weights" are normalization weights; those values are not learned/updated by gradient descent. In later videos we will add layers that are, which is the distinction.

    • @Karolina-vi2wt
      @Karolina-vi2wt 4 years ago

      @@RasaHQ Ooooh yes, got it! Thank you for answering and please enjoy the rest of your Sunday ♥️
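
    The distinction Vincent draws, as a short numpy sketch (illustrative names, not from the video): the normalized scores w are recomputed from the inputs by a softmax and contain no trainable parameters, unlike the projection matrices introduced in the later videos.

      import numpy as np

      def softmax(z):
          e = np.exp(z - z.max(axis=-1, keepdims=True))
          return e / e.sum(axis=-1, keepdims=True)

      rng = np.random.default_rng(3)
      X = rng.normal(size=(4, 6))   # token embeddings

      # "Weights" w in the plain attention block: a deterministic function
      # of the inputs -- nothing here is updated by gradient descent.
      w = softmax(X @ X.T)
      out = w @ X

      # A learned weight matrix (later videos): a parameter that gradient
      # descent does update during training.
      Wq = rng.normal(size=(6, 6))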