Transformers for beginners | What are they and how do they work

  • Published on 29 Jun 2023
  • Over the past five years, Transformers, a neural network architecture, have completely transformed state-of-the-art natural language processing.
    *************************************************************************
    For queries: you can comment in the comment section, or you can mail me at aarohisingla1987@gmail.com
    *************************************************************************
    The encoder takes the input sentence and converts it into a series of numbers called vectors, which represent the meaning of the words. These vectors are then passed to the decoder, which generates the translated sentence.
    Now, the magic of the transformer network lies in how it handles attention. Instead of looking at each word one by one, it considers the entire sentence at once. It calculates a similarity score between each word in the input sentence and every other word, giving higher scores to the words that are more important for translation.
    To do this, the transformer network uses a mechanism called self-attention. Self-attention allows the model to weigh the importance of each word in the sentence based on its relevance to other words. By doing this, the model can focus more on the important parts of the sentence and less on the irrelevant ones.
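    The similarity-score idea above can be sketched in a few lines of NumPy. This is an illustrative toy, not the video's code: the projection matrices Wq, Wk, Wv stand in for learned weights and are just random here.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of word vectors X."""
    Q = X @ Wq                                   # queries: what each word looks for
    K = X @ Wk                                   # keys: what each word offers
    V = X @ Wv                                   # values: the content that gets mixed
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # similarity of every word with every other word
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)        # softmax: each row of weights sums to 1
    return w @ V                                 # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                      # 3 "words", embedding size 4
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)              # one new 4-dim vector per word
```

    Each output row is a blend of all the value vectors, weighted by how relevant every other word is to that word.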
    In addition to self-attention, transformer networks also use something called positional encoding. Since the model treats words as individual entities, it doesn't have any inherent understanding of word order. Positional encoding helps the model to understand the sequence of words in a sentence by adding information about their position.
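    As one concrete example of positional information, here is the sinusoidal positional encoding from the original "Attention Is All You Need" paper; note this is one common scheme (learned position embeddings are another), and the description above does not commit to a specific formula.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: one d_model-dim vector per position."""
    pos = np.arange(seq_len)[:, None]            # positions 0 .. seq_len-1
    i = np.arange(d_model)[None, :]              # embedding dimensions 0 .. d_model-1
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])        # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])        # odd dimensions use cosine
    return pe

pe = positional_encoding(seq_len=10, d_model=16)
# each row is simply added to the matching word-embedding vector
```

    Because each position gets a distinct pattern of sines and cosines, two identical words at different positions end up with different input vectors.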
    Once the encoder has calculated the attention scores and combined them with positional encoding, the resulting vectors are passed to the decoder. The decoder uses a similar attention mechanism to generate the translated sentence, one word at a time.
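    The one-word-at-a-time generation is just a loop that feeds each predicted word back in until an end-of-sequence token appears. The `decoder_step` below is a hypothetical stand-in for a real decoder (which would attend over the encoder's vectors and the words generated so far); only the loop structure is the point.

```python
def decoder_step(encoder_vectors, generated):
    """Hypothetical stand-in: a real decoder would attend over encoder_vectors
    and the words generated so far, then predict the next word."""
    i = len(generated)
    return encoder_vectors[i] if i < len(encoder_vectors) else "<eos>"

def greedy_decode(encoder_vectors):
    """Generate the output sequence one word at a time."""
    generated = []
    while True:
        word = decoder_step(encoder_vectors, generated)
        if word == "<eos>":          # end-of-sequence token stops the loop
            break
        generated.append(word)       # feed the new word back in on the next step
    return generated
```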
    Transformers are the model behind GPT, BERT, and T5.
    #transformers #naturallanguageprocessing #nlp
  • Science & Technology

Comments • 104

  • @lyeln
    @lyeln 4 months ago +8

    This is the only video around that REALLY EXPLAINS the transformer! I immensely appreciate your step by step approach and the use of the example. Thank you so much 🙏🙏🙏

  • @user-kx1nm3vw5s
    @user-kx1nm3vw5s 23 days ago

    It's great. I have just one query: what is the input to the masked multi-head attention? It's not clear to me; kindly guide me on it.

  • @user-mv5bo4vf2v
    @user-mv5bo4vf2v 5 months ago

    Hello, and thank you so much. One question: I don't understand where the numbers in the word embedding and positional encoding come from.

  • @mdfarhadhussain
    @mdfarhadhussain 5 months ago +2

    Very nice high level description of Transformer

  • @animexworld6614
    @animexworld6614 2 days ago

    Great Content

  • @BharatK-mm2uy
    @BharatK-mm2uy 2 months ago

    Great Explanation, Thanks

    • @CodeWithAarohi
      @CodeWithAarohi 2 months ago

      Glad it was helpful!

  • @MrPioneer7
    @MrPioneer7 22 days ago

    I had watched 3 or 4 videos about transformers before this tutorial. Finally, this tutorial made me understand the concept of transformers. Thanks for your complete and clear explanations and your illustrative example. Especially, your description of query, key, and value was really helpful.

  • @exoticcoder5365
    @exoticcoder5365 11 months ago

    Very well explained! I could instantly grasp the concept! Thank you, Miss!

    • @CodeWithAarohi
      @CodeWithAarohi 11 months ago

      Glad it was helpful!

  • @PallaviPadav
    @PallaviPadav 1 month ago +1

    I accidentally came across this video; very well explained. You are doing an excellent job.

  • @VishalSingh-wt9yj
    @VishalSingh-wt9yj 5 months ago

    Well explained. Before watching this video I was very confused about how transformers work, but your video helped me a lot.

    • @CodeWithAarohi
      @CodeWithAarohi 5 months ago

      Glad my video is helpful!

  • @aditichawla3253
    @aditichawla3253 5 months ago

    Great explanation! Keep uploading such nice informative content.

  • @MAHI-kj5tg
    @MAHI-kj5tg 7 months ago

    Just amazing explanation 👌

  • @user-oo2co6xb8u
    @user-oo2co6xb8u 8 days ago

    Wow.. you are amazing. Thank you for the clear explanation

  • @soravsingla6574
    @soravsingla6574 7 months ago

    Very well explained

  • @debarpitosinha1162
    @debarpitosinha1162 2 months ago

    Great Explanation mam

  • @bijayalaxmikar6982
    @bijayalaxmikar6982 4 months ago

    Excellent explanation.

  • @servatechtips
    @servatechtips 10 months ago

    This is a fantastic, very good explanation.
    Thank you so much for the good explanation.

    • @CodeWithAarohi
      @CodeWithAarohi 10 months ago +1

      Glad it was helpful!

  • @satishbabu5510
    @satishbabu5510 28 days ago

    Thank you very much for explaining and breaking it down 😀 So far, your explanation is the easiest to understand compared to other channels. Thank you very much for making this video and sharing it with everyone ❤

  • @harshilldaggupati
    @harshilldaggupati 10 months ago +1

    Very well explained, even with such a niche viewer base. Please keep making more of these.

    • @CodeWithAarohi
      @CodeWithAarohi 10 months ago +1

      Thank you, I will

  • @pandusivaprasad4277
    @pandusivaprasad4277 4 months ago

    Excellent explanation, madam... thank you so much.

  • @user-dl4jq2yn1c
    @user-dl4jq2yn1c 26 days ago

    Best video ever, explaining the concepts in a really lucid way, ma'am. Thanks a lot; please keep posting. I subscribed 😊🎉

  • @imranzahoor387
    @imranzahoor387 3 months ago

    Best explanation. I saw multiple videos, but this one provides the clearest picture of the concept. Keep it up.

  • @TheMayankDixit
    @TheMayankDixit 8 months ago

    Nice explanation Ma'am.

  • @user-wh8vy9ol8w
    @user-wh8vy9ol8w 29 days ago

    Can you please let us know the input for masked multi-head attention? You just said 'decoder'. Can you please explain? Thanks.

  • @burerabiya7866
    @burerabiya7866 3 months ago

    Can you please upload the presentation?

  • @vimalshrivastava6586
    @vimalshrivastava6586 11 months ago

    Thanks for making such an informative video. Could you please make a video on the transformer for image classification or image segmentation applications?

    • @CodeWithAarohi
      @CodeWithAarohi 11 months ago

      Will cover that soon

  • @manishnayak9759
    @manishnayak9759 7 months ago

    Thanks, Aarohi 😇

  • @AbdulHaseeb091
    @AbdulHaseeb091 2 months ago

    Ma'am, we are eagerly hoping for a comprehensive Machine Learning and Computer Vision playlist. Your teaching style is unmatched, and I truly wish your channel reaches 100 million subscribers! 🌟

    • @CodeWithAarohi
      @CodeWithAarohi 2 months ago +1

      Thank you so much for your incredibly kind words and support!🙂 Creating a comprehensive Machine Learning and Computer Vision playlist is an excellent idea, and I'll definitely consider it for future content.

  • @soravsingla6574
    @soravsingla6574 7 months ago +1

    Hello Ma’am
    Your AI and Data Science content is consistently impressive! Thanks for making complex concepts so accessible. Keep up the great work! 🚀 #ArtificialIntelligence #DataScience #ImpressiveContent 👏👍

  • @thangarajerode7971
    @thangarajerode7971 10 months ago

    Thanks. Concept explained very well. Could you please add one custom example (e.g. finding similar questions) using Transformers?

  • @afn8370
    @afn8370 9 days ago

    Your video is good and the explanation is excellent; the only negative I felt was the background noise. Please use a better mic with noise cancellation. Thank you once again for this video.

    • @CodeWithAarohi
      @CodeWithAarohi 9 days ago

      Noted! I will take care of the noise :)

  • @_Who_u_are
    @_Who_u_are 16 days ago

    Thank you so much

  • @minalmahala5260
    @minalmahala5260 1 month ago

    Really very nice explanation ma'am!

    • @CodeWithAarohi
      @CodeWithAarohi 1 month ago

      Glad my video is helpful!

  • @vasoyarutvik2897
    @vasoyarutvik2897 7 months ago

    Very Good Video Ma'am, Love from Gujarat, Keep it up

  • @sahaj2805
    @sahaj2805 2 months ago

    The best explanation of transformers that I have found on the internet. Can you please make a detailed, long video on transformers with theory, mathematics, and more examples? I am not clear about the linear and softmax layers, what is done after them, how training happens, and how transformers work on test data. Can you please make a detailed video on this?

    • @CodeWithAarohi
      @CodeWithAarohi 2 months ago +1

      I will try to make it after finishing the pipelined work.

    • @sahaj2805
      @sahaj2805 2 months ago

      @@CodeWithAarohi Thanks will wait for the detailed transformer video :)

  • @_seeker423
    @_seeker423 4 months ago

    Question about query, key, value dimensionality.
    Given that
    the query is a word that is looking for other words to pay attention to, and
    the key is a word that is being looked at by other words,
    shouldn't the query and key be vectors whose size equals the number of input tokens, so that when the query and key are dot-producted, the querying word can be correctly (positionally) matched with the key to get the self-attention value for the word?

    • @CodeWithAarohi
      @CodeWithAarohi 4 months ago +1

      The dimensionality of query, key, and value vectors in transformers is a hyperparameter, not directly tied to the number of input tokens. The dot product operation between query and key vectors allows the model to capture relationships and dependencies between tokens, while positional information is often handled separately through positional embeddings.

  • @akshayanair6074
    @akshayanair6074 10 months ago

    Thank you. The concept has been explained very well. Could you please also explain how these query, key and value vectors are calculated?

    • @CodeWithAarohi
      @CodeWithAarohi 10 months ago

      Sure, Will cover that in a separate video.

  • @_seeker423
    @_seeker423 4 months ago

    Can you also talk about the purpose of the 'feed-forward' layer? It looks like it's only there to add non-linearity. Is that right?

    • @abirahmedsohan3554
      @abirahmedsohan3554 2 months ago

      Yes, you can say that, but maybe also to make the key, query, and value projections trainable.

  • @niluthonte45
    @niluthonte45 7 months ago

    Thank you, ma'am.

  • @sahaj2805
    @sahaj2805 2 months ago

    Can you please make a detailed video explaining the 'Attention Is All You Need' research paper line by line? Thanks in advance :)

  • @nikhilrao20
    @nikhilrao20 6 months ago

    I didn't understand what the input to the masked multi-head self-attention layer in the decoder is. Can you please explain it to me?

    • @CodeWithAarohi
      @CodeWithAarohi 6 months ago +1

      In the Transformer decoder, the masked multi-head self-attention layer takes three inputs: Queries(Q), Keys(K) and Values(V)
      Queries (Q): These are vectors representing the current positions in the sequence. They are used to determine how much attention each position should give to other positions.
      Keys (K): These are vectors representing all positions in the sequence. They are used to calculate the attention scores between the current position (represented by the query) and all other positions.
      Values (V): These are vectors containing information from all positions in the sequence. The values are combined based on the attention scores to produce the output for the current position.
      The masking in the self-attention mechanism ensures that during training, a position cannot attend to future positions, preventing information leakage from the future.
      In short, the masked multi-head self-attention layer helps the decoder focus on relevant parts of the input sequence while generating the output sequence, and the masking ensures it doesn't cheat by looking at future information during training.
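      The "cannot attend to future positions" rule in that reply can be shown with a small NumPy sketch (an illustration, not the video's code): entries above the diagonal of the score matrix are pushed to a large negative value, so they become ~0 after the softmax.

```python
import numpy as np

def causal_attention_weights(scores):
    """Softmax over attention scores with future positions masked out."""
    n = scores.shape[0]
    future = np.triu(np.ones((n, n), dtype=bool), k=1)   # True above the diagonal
    masked = np.where(future, -1e9, scores)              # -1e9 -> ~0 after softmax
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)             # rows sum to 1

w = causal_attention_weights(np.zeros((4, 4)))           # uniform raw scores
# row 0 attends only to position 0; row 3 attends to all four positions
```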

  • @mahmudulhassan6857
    @mahmudulhassan6857 9 หลายเดือนก่อน

    maam can you please make one video of classification using multi-head attention with custom dataset

  • @akramsyed3628
    @akramsyed3628 6 months ago

    Can you please explain 22:07 onward?

  • @sairampenjarla
    @sairampenjarla 4 days ago

    Hi, good explanation, but at the end, when you explained what the input to the decoder's masked multi-head attention would be, you fumbled and didn't explain it clearly. The rest of the video was very good.

    • @CodeWithAarohi
      @CodeWithAarohi 3 days ago

      Thank you for the feedback!

  • @palurikrishnaveni8344
    @palurikrishnaveni8344 11 months ago

    Could you make a video on image classification with a vision transformer, madam?

  • @sukritgarg3175
    @sukritgarg3175 3 months ago

    Great video, ma'am. Could you please clarify what you said at 22:20 once again... I think there was a bit of confusion there.

  • @techgirl6451
    @techgirl6451 7 months ago

    Hello ma'am, is this "transform" concept the same as transformers in NLP?

    • @CodeWithAarohi
      @CodeWithAarohi 7 months ago

      The concept of "transform" in computer vision and "transformers" in natural language processing (NLP) are related but not quite the same.

  • @KavyaDabuli-ei1dr
    @KavyaDabuli-ei1dr 3 months ago

    Can you please make a video on BERT?

  • @user-gf7kx8yk9v
    @user-gf7kx8yk9v 8 months ago

    How can I get the PDFs, ma'am?

  • @kadapallavineshnithinkumar2473
    @kadapallavineshnithinkumar2473 11 months ago

    Could you explain with Python code? That would be more practical. Thanks for sharing your knowledge.

    • @CodeWithAarohi
      @CodeWithAarohi 11 months ago

      Sure, will cover that soon.

  • @saeed577
    @saeed577 3 months ago

    I thought this was about transformers in CV; all the explanations were in NLP.

    • @CodeWithAarohi
      @CodeWithAarohi 3 months ago

      I recommend you understand this video first and then check this one: th-cam.com/video/tkZMj1VKD9s/w-d-xo.html After watching these two videos, you will properly understand the concept of transformers used in computer vision. Transformers in CV are based on the idea of transformers in NLP, so it's easier to understand if you learn them in that order.

  • @Red_Black_splay
    @Red_Black_splay 1 month ago

    Gonna tell my kids this was Optimus Prime.

    • @CodeWithAarohi
      @CodeWithAarohi 1 month ago

      Haha, I love it! Optimus Prime has some serious competition now :)

  • @jagatdada2.021
    @jagatdada2.021 7 months ago

    Please use a mic; the background noise is irritating.

    • @CodeWithAarohi
      @CodeWithAarohi 7 months ago +1

      Noted! Thanks for the feedback.

  • @_Who_u_are
    @_Who_u_are 16 days ago

    Speaking in Hindi would be better.

    • @CodeWithAarohi
      @CodeWithAarohi 12 days ago

      Sorry for the inconvenience.