Transformers - Part 1 - Self-attention: an introduction

  • Published on 29 Nov 2024

Comments • 18

  • @wangkuanlee3548
    @wangkuanlee3548 3 years ago +8

    Superb explanation. This is the clearest explanation of the concept of weight in self-attention I have ever heard. Thank you so much.

  • @exxzxxe
    @exxzxxe 2 years ago +1

    A first class explanation of self attention- the best on TH-cam.

  • @piyushkumar-wg8cv
    @piyushkumar-wg8cv 1 year ago

    Intuition buildup was amazing, you clearly explained why we need learnable parameters in the first place and how that can help relate similar words. Thanks for the explanation.

  • @andrem82
    @andrem82 2 years ago

    Best explanation of self-attention I've seen so far. This is gold.

  • @kencheligeer3448
    @kencheligeer3448 3 years ago +3

    It is a brilliant explanation of self-attention!!! Thank you.

  • @mar-a-lagofbibug8833
    @mar-a-lagofbibug8833 3 years ago +3

    Thank you for sharing.

  • @jhnflory
    @jhnflory 2 years ago

    Thanks for putting these videos together!

  • @mustafakocakulak5895
    @mustafakocakulak5895 3 years ago +1

    Best explanation ever :) Thank you

  • @prasadkendre149
    @prasadkendre149 2 years ago

    grateful forever

  • @euisasriani_01
    @euisasriani_01 3 years ago

    Thank you for the great explanation. I still don't understand how to obtain Wq and Wk.

  • @kacemichakdi3048
    @kacemichakdi3048 3 years ago +1

    Thank you for your explanation. I just didn't understand how we choose W_k and W_q?

    • @lennartsvensson7636
      @lennartsvensson7636  3 years ago +1

      These matrices contain learnable parameters that can be trained using standard techniques from deep learning.

    • @kacemichakdi3048
      @kacemichakdi3048 3 years ago

      Thank you
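To make the reply above a bit more concrete, here is a minimal sketch (not from the video) of how W_Q and W_K enter the attention-weight computation. The dimensions, variable names, and random initialisation are illustrative assumptions; in a real model the two matrices would be declared as learnable parameters and updated by backpropagation together with the rest of the network, so there is no separate procedure for "choosing" them.

```python
import numpy as np

# Illustrative sizes (assumptions, not taken from the video)
d_model, d_k, n_words = 8, 4, 5
rng = np.random.default_rng(0)

X = rng.normal(size=(n_words, d_model))   # one embedding x_i per word (rows)
W_Q = rng.normal(size=(d_k, d_model))     # query projection; learnable in practice
W_K = rng.normal(size=(d_k, d_model))     # key projection; learnable in practice

Q = X @ W_Q.T                             # q_i = W_Q x_i
K = X @ W_K.T                             # k_j = W_K x_j
scores = Q @ K.T                          # score z_ij = q_i^T k_j

# Row-wise softmax turns the scores into attention weights that sum to 1
scores -= scores.max(axis=1, keepdims=True)                       # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print(weights.shape)                                              # (5, 5)
```

Because W_Q and W_K appear only inside this score computation, gradient descent on whatever loss the full transformer is trained with shapes them automatically.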

  • @murkyPurple123
    @murkyPurple123 3 years ago +1

    Thank you

  • @ahmedb2559
    @ahmedb2559 1 year ago

    Thank you !

  • @po-yupaulchen166
    @po-yupaulchen166 3 years ago

    Great and clear explanation. One question about W_Q and W_K. Since z1 = k1^T * q3 = x1^T * (W_K^T * W_Q) * x3, and W_K and W_Q are trainable matrices, could we just combine them into a single matrix
    W_KQ = W_K^T * W_Q to reduce the number of parameters?

    • @lennartsvensson7636
      @lennartsvensson7636  3 years ago

      What you are suggesting should be possible as long as the matrices are square.
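The suggestion in this thread can be checked numerically. Below is a small sketch (my own, with assumed dimensions) showing that replacing W_K and W_Q by the single matrix W_KQ = W_K^T W_Q leaves the scores unchanged, together with the parameter-count comparison that determines when the combined form is actually smaller.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_k = 8, 4                                  # illustrative sizes (assumptions)
x1, x3 = rng.normal(size=d_model), rng.normal(size=d_model)

W_Q = rng.normal(size=(d_k, d_model))
W_K = rng.normal(size=(d_k, d_model))

# Factored form: z = k1^T q3 with k1 = W_K x1 and q3 = W_Q x3
z_factored = (W_K @ x1) @ (W_Q @ x3)

# Combined form: one d_model x d_model matrix W_KQ = W_K^T W_Q
W_KQ = W_K.T @ W_Q
z_combined = x1 @ W_KQ @ x3

print(np.allclose(z_factored, z_combined))           # True: identical scores

# Parameter counts: 2 * d_k * d_model (factored) vs d_model**2 (combined).
# Combining saves parameters only when d_k > d_model / 2; with a smaller d_k
# the factored form is a low-rank, and therefore cheaper, parameterisation.
print(2 * d_k * d_model, d_model ** 2)               # 64 64 for these sizes
```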

  • @prateekpatel6082
    @prateekpatel6082 10 months ago

    Pretty bad example. Even if we have trainable Wq and Wk, what if there was a new sentence where we had Tom and he? Wq will still make word 9 point to Emma and she.