The Math behind Transformers | Srijit Mukherjee | Computer Vision | Natural Language Processing

  • Published on 21 Feb 2023
  • In this video, I explain, as clearly as possible, the mathematics behind the Transformer, the architecture widely used for natural language processing tasks. Recently, transformers have also been used to solve computer vision problems. The explanation is based on the paper "Attention is all you need" [arxiv.org/abs/...] by Ashish Vaswani et al.
    There are some errors in the video, which are mentioned in the comments as well as on the following page. Many other learning resources are shared on the page below.
    May you find joy in learning about the science of data.

Comments • 26

  • @arshadkazi4559
    @arshadkazi4559 1 year ago +3

    Great Video! Liked the simplicity and honesty in the explanation.

  • @BiprojitNath
    @BiprojitNath 1 year ago +2

    This is amazing. Keep making more such tutorials.

  • @sayantan336
    @sayantan336 1 year ago +4

    Just a point: the embeddings (512-dim vectorized representations) of the input/output words are also learned as a by-product of the training process, i.e. the embedding layers on both the encoder side and the decoder side are part of the learnable parameters of the model.
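
    A minimal sketch of the point above (assuming PyTorch; the vocabulary size is arbitrary, d_model = 512 as in the paper): the embedding tables are ordinary trainable parameters, so they are updated by backpropagation together with the rest of the model.

      # Embedding layers as learnable parameters (PyTorch assumed, toy vocabulary).
      import torch
      import torch.nn as nn

      vocab_size, d_model = 10000, 512                 # d_model = 512 as in the paper
      src_embed = nn.Embedding(vocab_size, d_model)    # encoder-side embedding table
      tgt_embed = nn.Embedding(vocab_size, d_model)    # decoder-side embedding table

      token_ids = torch.tensor([[5, 42, 7]])           # a toy batch of token ids
      vectors = src_embed(token_ids)                   # shape: (1, 3, 512)
      print(vectors.shape, src_embed.weight.requires_grad)  # gradients flow into the table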

  • @annyd3406
    @annyd3406 1 year ago +4

    Please make more videos, you are great, thank you for this!!!
    Make INDIA proud....

    • @mukherjeesrijit
      @mukherjeesrijit  1 year ago

      Thank you! I will try my best.

    • @michaelestrinone2111
      @michaelestrinone2111 1 year ago

      India or America? He is a PhD student at an American(!) university. If he wanted to make India proud, he would stay in India.

  • @sayantan336
    @sayantan336 1 year ago +4

    Well explained ... Thanks Srijit ... Enjoyed quite a lot..

  • @Rahulsircar94
    @Rahulsircar94 4 months ago +1

    lol transformers for Bengalis. Loved it.

  • @harishravi9936
    @harishravi9936 1 year ago +1

    At 55:44, will the mask matrix be lower triangular with all -inf?
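
    For reference, a common way the decoder ("look-ahead") mask is built in code, not taken from the video itself (a minimal PyTorch sketch): the -inf entries sit strictly above the diagonal and 0 everywhere else, and the mask is added to the score matrix before the softmax.

      # Causal mask sketch (PyTorch assumed): -inf strictly above the diagonal, 0 elsewhere.
      import torch

      seq_len = 4
      mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
      print(mask)
      # tensor([[0., -inf, -inf, -inf],
      #         [0.,   0., -inf, -inf],
      #         [0.,   0.,   0., -inf],
      #         [0.,   0.,   0.,   0.]])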

  • @harishnayak976
    @harishnayak976 8 months ago

    What is the equation for the whole process in one mathematical form?
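
    For reference, the core building block from "Attention is all you need" can be written in one line (the full encoder-decoder model is a composition of many such attention and feed-forward blocks, so it does not reduce to a single closed-form expression):

      \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
      \qquad
      \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O},
      \quad
      \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V})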

  • @hari8568
    @hari8568 1 year ago

    At 31:51, if we are finding similarities between two words, shouldn't the QK^T matrix be symmetric? Because the relation score between any two words should be the same.
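
    A toy numeric check of this point (PyTorch assumed, random weights, not taken from the video): because the query and key projections are two different learned matrices, the score matrix QK^T is generally not symmetric, so score(i, j) need not equal score(j, i).

      # QK^T symmetry check (PyTorch assumed, toy sizes).
      import torch

      torch.manual_seed(0)
      X = torch.randn(3, 8)      # 3 tokens with a toy embedding dimension of 8
      W_q = torch.randn(8, 8)    # query projection
      W_k = torch.randn(8, 8)    # key projection (different learned weights)

      scores = (X @ W_q) @ (X @ W_k).T
      print(torch.allclose(scores, scores.T))   # False: the score matrix is asymmetric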

  • @hari8568
    @hari8568 1 year ago

    At 41:51, how exactly do we get Zi? Are we using PCA?
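
    A small sketch of how the z_i are formed in scaled dot-product attention (PyTorch assumed, toy sizes; this follows the formula in the paper rather than the video's notation): each z_i is a softmax-weighted sum of the value vectors, Z = softmax(QK^T / sqrt(d_k)) V, and no PCA is involved.

      # Scaled dot-product attention sketch (PyTorch assumed): rows of Z are z_1 ... z_n.
      import math
      import torch

      torch.manual_seed(0)
      n, d_k = 3, 8
      Q, K, V = torch.randn(n, d_k), torch.randn(n, d_k), torch.randn(n, d_k)

      weights = torch.softmax(Q @ K.T / math.sqrt(d_k), dim=-1)   # attention weights, rows sum to 1
      Z = weights @ V                                             # each row z_i is a weighted sum of value rows
      print(Z.shape)    # torch.Size([3, 8])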

  • @sbera07
    @sbera07 1 year ago

    Can you provide the notes you made in this video?

  • @user-zn8rz3hu2k
    @user-zn8rz3hu2k 1 year ago

    I would like to recommend a fantastic paper by DeepMind that provides comprehensive and detailed explanations regarding this topic.

  • @subhadipsarkar7692
    @subhadipsarkar7692 11 months ago

  • @gasun1274
    @gasun1274 1 year ago +1

    You have a tendency to overly raise your tone while you speak; I do this subconsciously too, but you need to know that it is very annoying.

    • @mukherjeesrijit
      @mukherjeesrijit  1 year ago

      Thank you for your suggestion. I will keep it in mind.

  • @vishruttalekar8626
    @vishruttalekar8626 1 year ago +2

    Disappointing!!

    • @mukherjeesrijit
      @mukherjeesrijit  1 year ago

      Let me know how it can be improved.

    • @emrahe468
      @emrahe468 2 months ago

      @@mukherjeesrijit I'm a bit confused by the diagram in your video at @6:00. The left side seems to show inputs in a foreign language being fed into an encoder, while the right side displays multiple sequences in English. Is this setup discussing a specific type of decoder model like GPT-2, or is it more about an encoder-decoder architecture used for translation? The background of the diagram makes it hard to determine the exact context. With that diagram things got messy for me, so I couldn't stand much of it, sorry.