The many amazing things about Self-Attention and why they work

  • Premiered 16 Nov 2024

Comments • 20

  • @avb_fj
    @avb_fj  11 months ago +2

    Part 1 - Neural Attention: th-cam.com/video/frosrL1CEhw/w-d-xo.html
    Part 3 - Transformers: th-cam.com/video/0P6-6KhBmZM/w-d-xo.html

  • @ControllerQuickSwaps
    @ControllerQuickSwaps 1 year ago +5

    Keep it up man, you're doing an amazing job. You have incredible production value.
    A small suggestion I'd say is to talk a tiny bit slower when discussing technical details. I totally understand the words you're saying, but since you use very information-rich language my brain can need a few more milliseconds to digest the meaning of each token. Not a huge issue though, you're doing great :)

    • @avb_fj
      @avb_fj  1 year ago +1

      Noted! Thanks for the kind words! 🙌🏼

  • @sharannagarajan4089
    @sharannagarajan4089 9 months ago +1

    Hey, I would love to know more about the way you compare self-attention to a feedforward layer that projects onto a different space. Do let me know what resources I can look at to learn more about it.

  • @soojinlee6191
    @soojinlee6191 3 months ago

    Thank you so much for making videos on transformer! Your explanations are very intuitive, one of the best I've ever watched!

    • @avb_fj
      @avb_fj  3 months ago

      Wow, thanks! Glad you are enjoying the videos!

  • @saleemun8842
    @saleemun8842 11 months ago +1

    It's explained very well and easy to follow. I learned a lot, great work man!

  • @willikappler1401
    @willikappler1401 1 year ago +1

    Wonderful video, very well explained!

  • @shahriarshaon9418
    @shahriarshaon9418 1 year ago +1

    Can't appreciate it enough. Recently I cracked an interview and your videos helped me a lot. I was searching for your email, but couldn't find it. What I was going to request is that you make tutorials; by that I don't mean tutorials from scratch, but rather, do you have any plans to make videos on a capstone-project basis? Anyway, I can see your channel shining, all the best. Thanks again.
    😍

    • @avb_fj
      @avb_fj  1 year ago +3

      That's awesome, dude! Thanks for your kind words and congrats on the interview! I generally don't make tutorials, and I have a long list of video ideas waiting in the backlog (I have a full-time job, so it's hard to find time to make comprehensive tutorials), but I definitely plan to get into that space eventually. If you have a specific project idea you want to see covered, let me know in the comments.

    • @shahriarshaon9418
      @shahriarshaon9418 1 year ago

      @avb_fj Yeah, sure. Actually, I am going to work on reconstructing CT scan images from MRI images and also predicting treatment doses for glioma patients, and I am planning to incorporate transformer-based deep learning models. If you have any suggestions or guidance, they will be highly appreciated. Thanks again and all the best for your upcoming videos; I hope to watch all of them.

  • @sharvani_0779
    @sharvani_0779 7 months ago +2

    Great content and simple explanation!

    • @avb_fj
      @avb_fj  7 months ago

      Glad you liked it!

  • @hieunguyentranchi947
    @hieunguyentranchi947 15 days ago

    WONDERFUL VIDEO!!! Btw, do you have any resources that talk about the analogy between "WX+B" in a perceptron and "softmax(QK)V+X" in transformers, and how transformers are an adaptive learning framework? And what was the talk that you mentioned in this video?

    • @avb_fj
      @avb_fj  15 days ago

      I wish I could find which talk I learnt that from. I remember reading/watching it somewhere during my university days, and it sort of stuck with me. I tried to find the resource back when I was working on this video, but unfortunately I couldn't. There are surprisingly few resources online about this.
      Anyway, here is a great paper that contains mathematical proofs of many important things about attention/transformers:
      arxiv.org/pdf/1912.10077

  • @saisaigraph1631
    @saisaigraph1631 17 days ago

    Bro Great... Thank you...

    • @avb_fj
      @avb_fj  17 days ago

      Glad you liked it!

  • @carloslfu
    @carloslfu 11 months ago +1

    Great content!

  • @Pokemon00158
    @Pokemon00158 1 year ago +1

    I do not really understand: when you say "adaptive", in the sense that a dense/weight layer is "fixed", what does that mean in practice? By the same logic, the dense/weight layer is also "adaptive" when it sees input data, since backpropagation changes the values inside it.

    • @avb_fj
      @avb_fj  1 year ago +1

      Aha, I see the point. Let me clarify.
      The adaptive nature I was referring to applies when we are doing inference on an already trained neural network. Backprop is done only while we are training the network; once it is trained, the weights of dense layers remain fixed and constant for every input.
      In self-attention, the key, query and value projection networks also remain fixed, so each new input goes through the same multiply-add ops to derive K, Q, V… but these then combine to generate a new, input-dependent weight matrix that produces the final output (as shown in the video). Hope that clarifies it.
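
To make the fixed-vs-adaptive distinction from the reply above concrete (and the "WX+B" vs "softmax(QK)V+X" analogy asked about in an earlier comment), here is a minimal NumPy sketch. It is not from the video; the dimensions, variable names and single-head setup are illustrative assumptions. A dense layer applies the same learned W and b to every input at inference time, while self-attention keeps its Q/K/V projections fixed but recomputes the softmax(QKᵀ/√d) mixing matrix from each new input:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d, n = 8, 5          # embedding dimension and sequence length (illustrative)

# Dense layer: W and b are learned during training, then frozen.
W = rng.normal(size=(d, d))
b = rng.normal(size=d)

def dense(x):
    # The same W and b are applied to every input at inference time.
    return x @ W.T + b

# Single-head self-attention: the projection matrices are frozen too...
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def self_attention(X):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # ...but this n x n mixing matrix is recomputed from the input itself,
    # so different inputs are combined with different "weights".
    A = softmax(Q @ K.T / np.sqrt(d))
    return A @ V

X1, X2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
print(dense(X1).shape)            # (5, 8) -- same W, b for any input
print(self_attention(X1).shape)   # (5, 8) -- attention pattern depends on X1
print(self_attention(X2).shape)   # (5, 8) -- a different attention pattern for X2
```

In this sketch, dense(X1) and dense(X2) are produced by exactly the same W and b, whereas self_attention(X1) and self_attention(X2) mix their value vectors with different attention matrices; that is the sense in which the transform adapts to each input even though all learned parameters stay frozen after training. (In a full transformer block there is also the residual term, i.e. softmax(QKᵀ/√d)V + X, as in the comment's notation.)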