Attention is all you need; Attentional Neural Network Models | Łukasz Kaiser | Masterclass

  • Published on 26 Nov 2024

Comments • 75

  • @autripat
    @autripat 4 years ago +44

    Starting @ 15:45, in well under 2 minutes, attention explained! Only a true master can do it. Love.

    • @Scranny
      @Scranny 4 years ago +1

      K is a matrix representing the T previously seen words and V is the matrix representing the full dictionary of words of the target language, right? But what are K and V exactly? What values do these matrices hold? Are they learned?

  • @lmaes
    @lmaes 3 years ago +11

    The passion that he transmits is priceless

  • @tylersnard
    @tylersnard 3 years ago +6

    I love how excited he is.

  • @Marcos10PT
    @Marcos10PT 4 years ago +22

    This is the best explanation of attention I have seen so far! And I have been looking :)

    • @ksrajavel
      @ksrajavel 1 year ago +1

      Because he is one of the co-authors of the revolutionary paper that introduced it.

  • @mosicr
    @mosicr 6 years ago +42

    Great lecture. Best explanation of attention in just a few words.

  • @itshgirish
    @itshgirish 5 years ago +7

    Great presentation; he's having fun explaining the bits. Great camera work, too: a moving camera was more fun to watch than a boring still view.

  • @CharlesVanNoland
    @CharlesVanNoland 1 year ago

    I just wish he hadn't stood right in front of what he was trying to show people, but I love his passion for explaining what he's talking about.

  • @Cropinky
    @Cropinky 6 months ago

    very interesting of him to call deep learning a trade :)

  • @FranckDernoncourt
    @FranckDernoncourt 4 years ago +5

    Thanks for sharing! It'd be great if the video could pay more attention to the slides though.

    • @pischool6210
      @pischool6210 4 years ago +8

      Thank you for your comment, Franck! You can download the slides here: picampus-school.com/open-day-2017-presentations-download/

    • @FranckDernoncourt
      @FranckDernoncourt 4 years ago

      @pischool6210 perfect, thanks!

  • @elliotwaite
    @elliotwaite 5 years ago +2

    Great talk, Łukasz.

  • @yacinebenaffane6535
    @yacinebenaffane6535 5 years ago +1

    Nice explanation of position and multi-head ...

  • @igorcherepanov4765
    @igorcherepanov4765 5 years ago +90

    "there is this guy, he never got his bachelor but he wrote most of these papers" - appreciation

    • @threeMetreJim
      @threeMetreJim 5 years ago +7

      Where experience and 'thinking outside the box' can beat education in some cases. He should be getting an 'honorary' bachelor degree, if he hasn't already.

    • @MucciciBandz
      @MucciciBandz 5 years ago +13

      Excuse me? That's fake news! Even his LinkedIn profile says Duke 1998 (yes, it's the same Noam Shazeer from this exact same paper)... "Noam Shazeer is an Engineer at Google. He graduated from Duke in 1998 with a double major in Mathematics and Computer Science"

    • @MrLacker
      @MrLacker 4 years ago +15

      I think he meant that Noam doesn't have a PhD. Noam does have a bachelor's degree, but he started working at Google pretty soon after graduating (literally decades ago) and has contributed to many important Google technologies in his time there. Noam was a Google old-timer back when I started working there in 2005.

  • @kadamparikh8421
    @kadamparikh8421 3 years ago +2

    Great content in this video. Would love if you had the multi-headed devil covered! Though, great video to get the overall view..

  • @ahmedb2559
    @ahmedb2559 1 year ago

    Thank you !

  • @mrvishwjeetkumar
    @mrvishwjeetkumar 6 years ago +2

    Very nice lecture... enjoyed it a lot.

  • @jayantpriyadarshi9266
    @jayantpriyadarshi9266 4 years ago +1

    Great talk. Something very useful.

  • @vast634
    @vast634 4 years ago +10

    They should invent a device that can always tell the time of day when the user wants.

  • @rishabhshirke1175
    @rishabhshirke1175 5 years ago +2

    nothing beats GPT 2 TL;DR summarization trick

  • @nsuryapa1
    @nsuryapa1 5 years ago +2

    Nice explanation!!!!

  • @brandomiranda6703
    @brandomiranda6703 3 years ago +2

    Where is the library he talks about for getting the details of training the DL model "right"?

  • @someone_518
    @someone_518 1 year ago +2

    ChatGPT gave me link to this video)

  • @threeMetreJim
    @threeMetreJim 5 years ago +1

    "He didn't put a trophy into the suitcase because it was too small." is an ambiguous statement. "It" could refer to either the trophy or the suitcase. It seems like the answer is mainly decided on probability from past experience, rather than the intended (ambiguous) meaning, similar to a survey or experiment with too small a sample size. It is also possible that he didn't want to put too small a trophy into the suitcase in case it ended up being jostled about too much and became damaged; that is a less likely, but still possible, explanation, and it would need a thought process, or some further context, to reach that conclusion and clarify the intended meaning. People on the autistic spectrum (HFA / Asperger's) have that same problem when phrasing thoughts (ambiguous meaning), and are often misunderstood because of it. When a statement has two (or more) possible meanings, it's probably unfair to judge the performance of a system in 'getting the answer right', as there isn't a definite correct answer to begin with, just a more likely one.
    A word-for-word translation, with grammatical correction applied, would probably achieve a better result in a case like this. Google Translate seems to somewhat agree.
    Original: He didn't put a trophy into the suitcase because it was too small.
    Google Translate: Er hat keine Trophäe in den Koffer gesteckt, weil er zu klein war.
    Back to English: He did not put a trophy in his suitcase because he was too small.
    Word-for-word translation (incorrect, but probably still understandable if you speak German): er nicht stellen ein Trophäe in das koffer da es was auch klein.
    Google Translate of the word-for-word version back to English (much better but still wrong; where did the 'also' come from?): he does not put a trophy in the suitcase as it is also small.

  • @gilgarad1
    @gilgarad1 6 years ago +1

    Nice lecture. I enjoyed it

  • @nabinchaudhary73
    @nabinchaudhary73 2 years ago +1

    Does the embedding get trained, or do the key, query, and value get trained? I am confused. Please help.

  • @RobertElliotPahel-Short
    @RobertElliotPahel-Short 4 years ago +1

    Math majors / graduate math students: skip to 15:36.

  • @intelligenttrends8935
    @intelligenttrends8935 5 years ago +1

    Here I get it.
    Thank you

  • @HimanshuGhadigaonkar
    @HimanshuGhadigaonkar 4 years ago +1

    Best explanation!!

  • @louerleseigneur4532
    @louerleseigneur4532 3 years ago

    Thanks buddy

  • @TheGodSaw
    @TheGodSaw 7 years ago +13

    Is there a way to get the slides?

    • @pischool6210
      @pischool6210 7 years ago +10

      You can download them here: picampus-school.com/open-day-2017-presentations-download/

    • @khanzorbo
      @khanzorbo 6 years ago +1

      Pi School, I have just checked, and it seems the slides linked to the presentation are for the "tensorflow workshop". Can you please double-check?

    • @pischool6210
      @pischool6210 6 years ago +4

      Dear Vladimir, have a look here: drive.google.com/file/d/0B8BcJC1Y8XqobGNBYVpteDdFOWc/view

  • @pankajtiwari12
    @pankajtiwari12 4 years ago

    great explanation !

  • @KartoffelnSalatMitAlles
    @KartoffelnSalatMitAlles 5 years ago +1

    What model is that at the beginning? Can I somehow get the machine-produced texts which were shown at the beginning of the presentation?

  • @rinkagamine9201
    @rinkagamine9201 6 years ago +1

    Can I somehow get the machine-produced texts which were shown at the beginning of the presentation?

  • @TheAIEpiphany
    @TheAIEpiphany 4 years ago +2

    47:55 "We tried it on images it didn't work so well". 2020, Visual Transformer: am I a joke to you?

    • @souhamghosh8714
      @souhamghosh8714 4 years ago

      In ViT, it is clearly stated that a "small dataset" like ImageNet doesn't show promising results, but a larger dataset like JFT gives amazing results, so this may be a start, but it is far from perfection. Btw, I am not contradicting your statement. 😁 Also, JFT is not an open-source dataset (yet).

    • @TheAIEpiphany
      @TheAIEpiphany 4 years ago

      @souhamghosh8714 True Google folks ^^

    • @souhamghosh8714
      @souhamghosh8714 4 years ago

      “Hi, I am from google, you know what i got, TPUs..more than you can imagine”😂

  • @IExSet
    @IExSet 1 year ago +1

    Strange thing: he mentions the term "attention" before explaining what it is. What is the EXACT meaning of this Query-Key-Value magic??? I suspect speakers just copy other people's thoughts mechanically, without understanding the real meaning of the operations!

  • @homeroni
    @homeroni 5 years ago

    Are the talks he is referring to (as the previous talks) available on YouTube?

    • @pischool6210
      @pischool6210 5 years ago +2

      Hello! Sure. You can find all the Masterclasses from our Open Day here 👉th-cam.com/play/PLU3hjga27ZUiuL8V0CVlidBK27CDxWf-F.html

  • @ramyaneekashyap4356
    @ramyaneekashyap4356 4 years ago

    Is there any way I could get the PPTs for reference?

    • @pischool6210
      @pischool6210 4 years ago +2

      Hi, sure! You can download it here: picampus-school.com/open-day-2017-presentations-download/

    • @ramyaneekashyap4356
      @ramyaneekashyap4356 4 years ago

      @pischool6210 thank you so much!!!!

  • @sajjadayobi688
    @sajjadayobi688 4 years ago +1

    Transformers learned translation without language dependency O_o

  • @kingenking9303
    @kingenking9303 3 years ago

    The video image quality is too poor; you need to fix it.

  • @josy26
    @josy26 5 years ago

    Slides?

    • @SubhamKumar-eg1pw
      @SubhamKumar-eg1pw 5 years ago +2

      drive.google.com/file/d/0B8BcJC1Y8XqobGNBYVpteDdFOWc/view

  • @alexandrogomez5493
    @alexandrogomez5493 1 year ago

    Assignment 6

  • @uhmerikuhn
    @uhmerikuhn 3 years ago +5

    ...comes from Google - Check. ...TensorFlow T-shirt - Check. Most viewers therefore rate this lecture highly - Check.
    This is very hand-wavy throughout with relatively no rigor shown. There are many lectures/presentations online which actually explain the nuts and bolts and wider use cases of Attention mechanisms. Maybe the title of this video should be something else, like "Our group's success with one use case (language translation) of Attention." Frankly, the drive-by treatment of the technical details of language translation case was almost terrible and should have probably been omitted.

    • @georgemaratos1122
      @georgemaratos1122 3 years ago +1

      Which lectures do you like that explain attention mechanisms and their wider use?

  • @ytubeanon
    @ytubeanon 2 months ago

    Is this guy one of the fathers of modern AI? Is he a primary reason for ChatGPT?

  • @ShadowD2C
    @ShadowD2C 7 months ago

    Good video, but his placement and the camera's are suboptimal.

  • @clray123
    @clray123 4 years ago +3

    Most I gather from this talk is that "attention" is a pretty terrible term. Something like "fuzzy lookup" or "matching" or "mapping" would have been much more descriptive, but oh well, which researcher needs to think about terminology before unleashing it on the world.

  • @aojing
    @aojing 6 years ago +11

    Can't believe this guy was one of the authors of the Transformer. He just cannot explain what he was doing!

    • @mauricet910
      @mauricet910 6 years ago

      I thought it was a really insightful talk. I'm preparing a talk about Transformer myself, and this talk was super inspiring :)

    • @haiyangsun8344
      @haiyangsun8344 5 years ago +7

      I also couldn't understand.. The architecture diagram is not very intuitive, and I was expecting some elaborations.. However, the explanation was not clear...

    • @NicholasAmpazis
      @NicholasAmpazis 5 years ago +3

      If you don’t already know something about attention then it’s impossible to follow the presentation. Everything is explained very poorly...

    • @clray123
      @clray123 4 years ago +1

      His communication skills are like a runner who is tripping over his shoelaces.
      Unfortunately, it seems to be quite a common ailment of even "brilliant" coders (or shall I say, scientists) that they can't explain their ideas to others clearly using natural language. It's like they have no model of someone else's knowledge and take so many things for granted that their attempts at "explanation" just sound like gobbledygook to those who expect to be taught something. That's why we have technical writers, teachers, popular science books etc.

    • @clray123
      @clray123 1 year ago

      @Yancy Stevens Yes, to communicate you have to model in your head whoever you are communicating to, what they know, don't know, and foremost what they want to know. Otherwise it's just a fail, no matter how much knowledge you have.