Self-Attention

  • Published 27 Dec 2024

Comments

  • @saitrinathdubba · 5 months ago

    This is brilliant!! The way you have connected the encoder-decoder attention computation to self-attention is really cool; honestly, I have not come across anything like this in any other blogs/write-ups. I have a doubt, prof: traditionally, to compute e_{tj}, on top of the linear transformation we also applied tanh for non-linearity, right? Here in self-attention we do the linear transformation but don't apply any non-linearity. Can you please suggest why that is? Thank you once again!!

    • @shubhamgattani5357 · 2 months ago

      Softmax is the only non-linearity in the whole set-up (a sketch contrasting the two scoring functions follows below).
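
To make the contrast raised in this thread concrete, here is a minimal sketch: Bahdanau-style additive attention scores e_{tj} = v^T tanh(W s_{t-1} + U h_j) pass the linearly transformed inputs through a tanh, whereas dot-product self-attention builds Q, K, V with purely linear projections and relies on the softmax as the only non-linearity (in a full Transformer block, the position-wise feed-forward layer then adds further non-linearity). The dimension names and shapes below are illustrative assumptions, not the video's exact notation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, d_model, d_k = 5, 8, 8          # sequence length and (assumed) dimensions
X = rng.normal(size=(T, d_model))  # token representations h_1..h_T

# --- Bahdanau-style additive scoring: note the tanh non-linearity ---
W = rng.normal(size=(d_k, d_model))
U = rng.normal(size=(d_k, d_model))
v = rng.normal(size=(d_k,))
s = rng.normal(size=(d_model,))    # decoder state s_{t-1}
e_additive = np.array([v @ np.tanh(W @ s + U @ h_j) for h_j in X])  # e_{tj}
alpha_additive = softmax(e_additive)

# --- Dot-product self-attention: projections are purely linear ---
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))
Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_k)    # linear in the inputs; no tanh anywhere
alpha = softmax(scores, axis=-1)   # softmax is the only non-linearity here
out = alpha @ V                    # weighted sum of values

print(alpha_additive.shape, alpha.shape, out.shape)  # (5,) (5, 5) (5, 8)
```

As the sketch shows, dropping the tanh keeps the score computation a simple (scaled) dot product, which is cheap to batch as a matrix multiply; the attention weights themselves are still non-linear functions of the inputs because of the softmax.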