Transformer Model (2/2): Build a Deep Neural Network (1.25x speed recommended)

  • Published 29 Nov 2024

Comments • 6

  • @temporarychannel4339 • 2 years ago

    I highly appreciate a refined tutorial like this. A lot of stuff in books and blogs is pure garbage. Watch the Attention for RNN Seq2Seq Models videos to understand this one better.

  • @sahhaf1234 • 2 years ago +7

    One thing seems to have gone unmentioned:
    -- in the attention layers, if the output of each head is d-dimensional and there are l heads, the concatenated context vectors will be ld-dimensional;
    -- the dense layers reduce them back to d dimensions, so each dense layer must have ld inputs and d outputs (see the sketch below).
    Otherwise, @5:51 doesn't make sense.
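
    To make the head-concatenation and projection shapes concrete, here is a minimal NumPy sketch. The names (n_heads, d_head, W_o) are illustrative rather than from the video; it assumes the standard multi-head design in which each of l heads emits a d-dimensional context vector per position:

    ```python
    import numpy as np

    seq_len, n_heads, d_head = 10, 8, 64            # l = 8 heads, d = 64 per head

    # Each head produces a (seq_len, d_head) context matrix.
    heads = [np.random.randn(seq_len, d_head) for _ in range(n_heads)]

    # Concatenating along the feature axis gives l*d dims per position.
    concat = np.concatenate(heads, axis=-1)         # (seq_len, n_heads * d_head)

    # The dense layer therefore needs l*d inputs and d outputs.
    W_o = np.random.randn(n_heads * d_head, d_head)
    out = concat @ W_o                              # (seq_len, d_head)

    print(concat.shape, out.shape)                  # (10, 512) (10, 64)
    ```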

  • @phuctranchi7898 • 3 years ago +3

    Makes this hard topic very easy to understand. Thanks a lot.

  • @JoshuaOwoyemi • 3 years ago +2

    Thanks for the video. Really detailed and informative. I'm still not sure how the two input sequences are combined to give the output sequence in the decoder. Can you recommend any material to consult on this?
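
    On the question above: in the standard Transformer decoder, the two sequences meet in the cross-attention sub-layer, where the decoder states supply the queries and the encoder outputs supply the keys and values. A minimal sketch of that step, assuming ordinary scaled dot-product attention (all names here are illustrative):

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    d = 64                                   # model dimension
    enc_len, dec_len = 12, 7
    enc_out = np.random.randn(enc_len, d)    # encoder output sequence
    dec_in  = np.random.randn(dec_len, d)    # decoder-side sequence

    Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))

    Q = dec_in @ Wq                          # queries come from the decoder
    K = enc_out @ Wk                         # keys and values come from the encoder
    V = enc_out @ Wv

    scores = Q @ K.T / np.sqrt(d)            # (dec_len, enc_len) alignment scores
    context = softmax(scores) @ V            # one d-dim vector per decoder position

    print(context.shape)                     # (7, 64)
    ```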

  • @shashwathpunneshetty1260 • 1 year ago

    Great explanation!!

  • @rongwang6142 • 1 year ago

    ❤❤ Great