CS4780 Transformers (additional lecture 2023)

  • Published on Dec 20, 2024

Comments

  • @itachi4alltime
    @itachi4alltime 1 year ago +10

    He's back

  • @saisitaramanapradyumnamall5408
    @saisitaramanapradyumnamall5408 11 months ago

    Thank you for this amazing series of lectures Professor!

  • @hrithiksingh6931
    @hrithiksingh6931 1 year ago

    Thank you so much, Professor, I learnt a lot from your videos. You are one of the best teachers I have seen in my life. Once again, thank you so much.

  • @AakarshNair
    @AakarshNair 5 months ago

    wow, you are a good teacher!

  • @trantandat2699
    @trantandat2699 1 year ago +1

    The best transformer ever!

  • @coolarun3150
    @coolarun3150 1 year ago +1

    crisp and clear!!!

  • @jonathnalee1790
    @jonathnalee1790 1 year ago

    Missed this one cuz of a final; thanks for uploading this!

  • @mimunzar
    @mimunzar 1 year ago

    Thank you a loooot! :)

  • @lotfullahandishmand4973
    @lotfullahandishmand4973 1 year ago

    Was waiting for this, after that machine learning course.

  • @fierydino9402
    @fierydino9402 1 year ago +4

    Professor, thank you a lot! Do you have a plan to upload a lecture on diffusion models too?

  • @vivi412a8nl
    @vivi412a8nl 10 months ago

    I have a question regarding masked multi-head attention around 53:30. If the outputs are generated one by one, then how can the word 'bananas' know about the word 'cherries' (at the time 'bananas' is generated, 'cherries' has not yet been generated) and be modified by it? I.e., why do we have to worry about 'cherries' modifying 'bananas' (i.e., having information about the future) if 'cherries' doesn't even exist at that point?

    • @kilianweinberger698
      @kilianweinberger698 10 months ago

      In some sense it is really all a speed-up. The moment cherry comes along, it could modify bananas. However, you don't want this, because you want to avoid re-computing all the representations of all the words you have already generated. If you do the masked attention, then you are safe, and you can re-use the representation you computed for bananas when cherry didn't even exist. Does this make sense?

    • @vivi412a8nl
      @vivi412a8nl 10 months ago

      @@kilianweinberger698 Thank you Professor that makes a lot of sense, I never thought about the idea of avoiding recalculation. Thank you again for making these great materials available for free.

    • @anas.2k866
      @anas.2k866 5 months ago

      @@vivi412a8nl I think masked self-attention is also there to keep the model from cheating. More concretely, during training, say you have a sentence of 10 tokens; the model outputs 10 vectors, each of which contributes to the loss for that sentence (the sentence loss is the sum of the per-token losses). If the output at token 5 were an average over all the inputs, it would be trivial for the model to predict token 6. You don't want this, because at inference time you won't have access to token 6. You want the model to model human language, not to cheat by looking at future tokens that it won't have during real-time inference.
      Maybe Professor Weinberger can confirm this?
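
A minimal NumPy sketch (not from the lecture; all names, weights, and dimensions below are made up for illustration) of single-head causal masked self-attention, tying together both points in this thread: with the lower-triangular mask, a position can only attend to earlier positions, so the model cannot peek at future tokens during training, and the representation computed for an earlier token (e.g. "bananas") is unchanged when a later token (e.g. "cherries") is appended, which is why it can be cached and re-used.

```python
import numpy as np

def masked_self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention with a causal (lower-triangular) mask.
    X has shape (seq_len, d_model); weight matrices are (d_model, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions j <= i.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

X3 = rng.standard_normal((3, d))                       # sequence ending in "bananas"
X4 = np.vstack([X3, rng.standard_normal((1, d))])      # "cherries" arrives later

out3 = masked_self_attention(X3, Wq, Wk, Wv)
out4 = masked_self_attention(X4, Wq, Wk, Wv)

# The first 3 output rows are identical: the new token cannot modify
# earlier representations, so they can be cached instead of recomputed.
print(np.allclose(out3, out4[:3]))                     # True
```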

  • @suvkaka
    @suvkaka 10 months ago

    @kilianweinberger698 Sir, how do we ensure that adding the positional encoding does not distort the original embedding too much? Or, how is it that the sums of the embeddings and positional encodings of different tokens do not collide?

    • @kilianweinberger698
      @kilianweinberger698 10 months ago +1

      It can change the encoding a little, and lately people have started developing alternatives. However, in general it isn’t really a big problem, because the positional embedding is always exactly the same for every training sequence, so the network can easily learn to remove it.

    • @suvkaka
      @suvkaka 10 months ago

      @@kilianweinberger698 Thank you professor
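
A minimal NumPy sketch (not from the lecture; sequence length and dimensions are made up) of the standard sinusoidal positional encoding from "Attention Is All You Need", illustrating the answer above: it is a fixed function of position only, so the exact same offset is added at position t in every sequence, its entries are bounded in [-1, 1], and the network can learn to account for (or undo) it.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed (not learned) positional encoding; identical for every sequence."""
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]           # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions
    return pe

seq_len, d_model = 16, 64
rng = np.random.default_rng(0)
token_embeddings = rng.standard_normal((seq_len, d_model))

pe = sinusoidal_positional_encoding(seq_len, d_model)
x = token_embeddings + pe                          # simply added, not concatenated

# Every entry of the encoding lies in [-1, 1], so the perturbation of each
# embedding is bounded, and distinct positions get distinct offset patterns.
print(pe.min() >= -1 and pe.max() <= 1)            # True
print(x.shape)                                     # (16, 64)
```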

  • @peterengel8601
    @peterengel8601 1 year ago +2

    Hi professor

  • @mykun8737
    @mykun8737 1 year ago

    Dear Kilian, the lectures in university classrooms can be quite challenging to follow. Could you please create a specialized course on machine learning and deep learning in the form of short video lectures with accompanying presentation slides? If possible, I would love to see these courses published on platforms like Udemy, for example.

    • @kilianweinberger698
      @kilianweinberger698 1 year ago +5

      I did! ecornell.cornell.edu/certificates/technology/machine-learning/
      Note that Cornell does charge tuition for it, but you will also earn an official certificate.

    • @mykun8737
      @mykun8737 1 year ago

      @@kilianweinberger698 I've visited the website you recommended, but I'm struggling to figure out how to learn there. If possible, could you consider moving the videos you've created to a platform like Udemy? Udemy has a large community of computer science students, and you might attract more students there.

  • @goldencircle4331
    @goldencircle4331 1 year ago

    Hi,
    Is there going to be a recording for Vision Transformers?