Transformers (how LLMs work) explained visually | DL5

  • Published 18 Dec 2024

Comments • 2.7K

  • @3blue1brown  8 months ago  +1014

    Edit: The finalized version of the next chapter is out th-cam.com/video/eMlx5fFNoYc/w-d-xo.html
    Early feedback on video drafts is always very important to me. Channel supporters always get a view of new videos before their release to help inform final revisions. Join at 3b1b.co/support if you’d like to be part of that early viewing group.

    • @bbrother92  8 months ago  +11

      @3Blue1Brown thanks for explaining these things - it is very hard for a web programmer to understand math

    • @JohnSegrave  8 months ago  +17

      Grant, this is so good! I've worked in ML about 8 years and this is one of the best descriptions I've seen. Very nicely done. Big 👍👍👍

    • @deker0954  8 months ago  +1

      Is this worth understanding?

    • @bbrother92  8 months ago  +2

      @@JohnSegrave sir, could you recommend a video analysis framework or any video description model?

    • @didemyldz1317  8 months ago  +2

      Could you share the name of the model that is used for text-to-speech generation? My teammate and I are working on a Song Translator as a senior design project. This might be very helpful. Thanks in advance :)

  • @iau  8 months ago  +3468

    I graduated from Computer Science in 2017. Back then, the cutting edge of ML was Recurrent Neural Networks, on which I based my thesis. This video (and I'm sure the rest of this series) just allowed me to catch up on years of advancements in so little time.
    I cannot describe how important your teaching style is to the world. I've been reading articles, blogs, and papers on embeddings and these topics for years now, and I never got it quite like I got it today. In less than 30 minutes.
    Imagine a world in which every teacher taught like you. We would save millions and millions of man-hours every hour.
    You truly have something special with this channel and I can only wish more people started imitating you with the same level of quality and care. If only this became the standard. You'd deserve a Nobel Prize for propelling the next thousand Nobel Prizes.

    • @lucascorreaaa  8 months ago  +47

      Second that!

    • @kyo250996  8 months ago  +53

      Same, I did a thesis on word vectorization back in 2017, and no one ever talked about how the whole vector of a word gives rise to meaning and context when you generate phrases.
      Too bad; since no one was interested in ML back then, I leaned into web development and dropped ML :(

    • @iankrasnow5383  8 months ago  +24

      Funny enough, the other 6 videos in this series all came out in 2017, so you probably didn't miss much.

    • @XMysticHerox  8 months ago  +24

      Well transformers were first developed in 2017 so it was the cutting edge exactly when you graduated ^^

    • @rock_sheep4241  8 months ago  +2

      This is explained in layman's terms, but in reality it is more complicated than this

  • @DynestiGTI  8 months ago  +3098

    Grant casually uploading the best video on Transformers on YouTube

    • @drgetwrekt869  8 months ago  +13

      i was expecting froggin electromagnets to be honest :-)

    • @brandonmoore644  8 months ago  +19

      This video was insanely good!

    • @shoam2103  8 months ago  +8

      Even having a basic understanding of what it is, this was still extremely helpful!

    • @yigitpolat  8 months ago  +4

      yeah but it did not talk about transformers in this chapter

    • @stefchristensen47  8 months ago

      I wish I could retweet this post.

  • @billbill1235  8 months ago  +1852

    I was trying to understand ChatGPT through videos and texts on the Internet. I always said: I wish 3b1b would release a video about it; it's the only way for someone inexperienced to understand. And here it is. Thank you very much for your contributions to YouTube!!

    • @lmao8207  8 months ago  +24

      No, even the other videos are kinda meh, even if you're not inexperienced, because they don't go in depth. I feel here people get a nice understanding of the concepts captured by the models instead of just the architecture of the models.

    • @goldeer7129  8 months ago

      It's kind of true, but if I had to recommend a good place to actually understand transformers and even other machine learning things, I would definitely recommend StatQuest; its explanations of what's going on are very clear. But I'm also very excited to see how 3B1B is going to render all that visually, as always.

    • @himalayo  8 months ago  +3

      I was also just looking into transformers due to their extreme takeover in computer vision!

    • @baconheadhair6938  8 months ago  +4

      shoulda just asked chatgpt

    • @ironmancloud9759  8 months ago  +1

      NLP specialization by Andrew covered everything 😅

  • @metapoynter  3 months ago  +13

    The best lecture I have ever seen on the intro to Transformers. These videos complement the book "Build a Large Language Model (From Scratch) - Sebastian Raschka" really well.

  • @haorancheng4870  8 months ago  +38

    I listened to my professor explain the crazy softmax equation for a whole semester, and you explained it so well, including how temperature also plays a role there. Big RESPECT!
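
    For concreteness, here is a minimal numpy sketch of the temperature-scaled softmax mentioned above (the function and variable names are illustrative, not from the video):

        import numpy as np

        def softmax(logits, temperature=1.0):
            # Divide the logits by the temperature before exponentiating:
            # T < 1 sharpens the distribution, T > 1 flattens it.
            z = np.asarray(logits, dtype=float) / temperature
            z -= z.max()              # subtract the max for numerical stability
            exps = np.exp(z)
            return exps / exps.sum()

        logits = [2.0, 1.0, 0.1]
        print(softmax(logits, temperature=0.5))  # mass concentrates on the top logit
        print(softmax(logits, temperature=2.0))  # probabilities move toward uniform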

  • @Silent_Knife  8 months ago  +1247

    The return of the legend! This series is continuing; that is the best surprise on YouTube. Thanks Grant, you have no idea how much the young population of academia is indebted to you.

    • @kikiroy5178  8 months ago  +7

      I'm 26, young engineer. Thinking the same. Well said.

    • @youonlytubeonce  8 months ago  +6

      I liked your comment because I'm sure you're right but don't be ageist! 😊 Us olds love him too!

    • @samad.chouihat4222  8 months ago

      Young and senior alike

    • @robertwilsoniii2048  6 months ago

      And by logic, Grant is indebted to his 2015 era Stanford education. That was a high point in the faculty and curriculum in general there.

  • @tempo511  8 months ago  +834

    The fact that the meaning behind tokens is embedded into this 12,000-dimensional space, and you get relationships in terms of coordinates and directions that exist across topics, is mind-blowing. Like, Japan -> sushi being similar to Germany -> bratwurst is just so darn neat

    • @amarissimus29  8 months ago

      And it makes the absurdly ham-fisted model tampering behind debacles like the Gemini launch look even more absurd. I can hear the troglodytes mobbing in the nth dimension.

    • @dayelu2679  8 months ago  +13

      I came to this realization a long time ago; since then I've wanted to find isomorphic structures of concepts across different disciplines

    • @TheKoekiemonster1234  8 months ago

      @@dayelu2679🤓

    • @stefchristensen47  8 months ago  +36

      You can actually try this out in your nearest large language model, like ChatGPT, CoPilot, Gemini, or Mistral. Just ask it to do vector math on the words. Since there isn't a predefined vector word calculus in English, the LLM defaults to just using a version of its own internal representation, and so it can eke out pretty good results. I was able to duplicate Hitler - Germany + Italy = Mussolini and sushi - Japan + Germany = sausage (or bratwurst, both score highly) in GPT-3.5-Turbo Complete.
      It also figured out sushi - Japan + Lebanon = shawarma; sushi - Japan + Korea = kimchi; Hitler - Germany + Spain = Franco; and Hitler - Germany + Russia = Stalin. (A runnable sketch of this kind of vector arithmetic appears after this thread.)

    • @Flako-dd  8 months ago  +15

      Super disappointed. The German Sushi is called Rollmops.
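
    The vector arithmetic described in @stefchristensen47's comment above can also be tried directly on a pretrained word2vec model; a minimal sketch using the gensim library (the model file name is an assumption, any word2vec-format file works):

        from gensim.models import KeyedVectors

        # Load pretrained word vectors (the file name is illustrative).
        kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

        # most_similar adds the 'positive' vectors, subtracts the 'negative' ones,
        # and returns the nearest words by cosine similarity.
        print(kv.most_similar(positive=["sushi", "Germany"], negative=["Japan"], topn=3))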

  • @nicholaitukanov1162  8 months ago  +440

    I have been working on transformers for the past few years and this is the greatest visualization of the underlying computation that I have seen. Your videos never disappoint!!

    • @brian8507  8 months ago  +3

      So if we "stop" you... then we avoid judgement day? We should meet for coffee

    • @giacomobarattini1130  8 months ago  +18

      ​@@brian8507 "judgement day" 😭

    • @beProsto  8 months ago

      ​@@brian8507 bro's got underlying psychological issues

    • @talharuzgarakkus7768  8 months ago

      I agree with you. Visualization is the perfect way to understand the transformer architecture, specifically the attention mechanism.

    • @jawokenn8766  8 months ago

      @@giacomobarattini1130 it's later than you think

  • @fvsfn  4 months ago  +42

    I am a math teacher and one of my classes is about AI. I am making watching this mini-series a mandatory requirement. This is just what my students need. Thanks for the exceptional quality of the content on your channel.

    • @StacyMcCabe  3 months ago  +1

      What class and grade do you teach?

    • @fvsfn  3 months ago  +5

      It is a master 2 class on the mathematical foundations of AI.

    • @goatknight777  a month ago  +1

      This channel also has a linear algebra course, which is pretty good too!

  • @ananthkamath1995  2 months ago  +4

    Your teaching is an incredible way to stimulate my curiosity

  • @parenchyma  8 months ago  +461

    I don't even know how many times I'm going to rewatch this.

    • @chlodnia  8 months ago

      True

    • @RaoBlackWellizedArman  8 months ago  +12

      3B1B doesn't need to be saved in the Watch Later folder because all his videos are worth watching later.

    • @synthclub  8 months ago  +1

      What will you set your weights 'n' biases to?

    • @oofsper  8 months ago

      same

    • @arthurgames9610  4 months ago

      Me fr

  • @yashizuko  8 months ago  +71

    It's astonishing, amazing that this kind of info and explanation quality is available for free; this is way better than how a university would explain it

    • @lonnybulldozer8426  8 months ago  +1

      Universities are buildings. Buildings can't talk. Therefore, they cannot explain.

  • @lucasamadsen  8 months ago  +23

    2 years ago I started studying transformers, backpropagation and the attention mechanism. Your videos were a cornerstone for my understanding of those concepts!
    And now, partially thanks to you, I can say: "yeah, relatively smooth to understand"

  • @mahdimoradkhani6610  7 months ago  +32

    The genius in what you do is taking complicated concepts and making them easy to digest. That's truly impressive!

  • @xiangzhang5279  8 months ago  +19

    I have always been blown away by how great your visualization is for explaining ML concepts. Thanks a lot!

  • @lewebusl  8 months ago  +227

    This is heaven for visual learners. Animations are correlated smoothly with the intended learning point ...

    • @gorgolyt  8 months ago  +21

      There's no such thing as visual learners. Other than the blind, all humans are visual creatures. It's heaven for anyone who wants to learn.

    • @lewebusl  8 months ago  +8

      @@gorgolyt You are right. Humans get input from 5 senses, but 90 percent of the brain's receptors are directly connected to the optic and auditory nerves. That is where the visual dominates the other senses... For blind people the auditory dominates...

    • @rinkashikachi  6 months ago  +5

      @@lewebusl you said an obvious fact and then made a nonsensical bs conclusion out of it. there are no visual learners and it is a proven scientific fact

    • @HydrogenAlpha  5 months ago

      @@gorgolyt Yeah Veritasium did an excellent video debunking the pop-science nonsense behind this very commonly held misconception / fake science.

  • @PiercingSight  8 months ago  +61

    Straight up the best video on this topic. The idea that the dimensions of the embedding space represent different properties of a token that can be applied across tokens is just SO cool!

    • @JonnySolomon  8 months ago  +1

      i felt that

    • @MagicGonads  8 months ago  +1

      orienting and ordering the space (called the 'latent' space) so that the most significant directions come first is called 'principal component analysis' (useful for giving humans the reins to some degree, since we get to turn those knobs and see something interesting but vaguely predictable happen). (A code sketch of PCA appears after this thread.)

    • @andrewdunbar828  8 months ago

      I agree. I started writing about that in a comment about 2 seconds into the video, before I knew how well he was going to cover it, since it's usually glossed over way too much in other introductions to these topics.
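
    A tiny scikit-learn sketch of the PCA idea mentioned in this thread (the data here is a random stand-in for real embedding vectors):

        import numpy as np
        from sklearn.decomposition import PCA

        # 1,000 hypothetical 50-D embedding vectors (random placeholders).
        X = np.random.default_rng(0).normal(size=(1000, 50))

        # Re-orient the space so the most significant directions come first,
        # then keep the top two as human-inspectable "knobs".
        pca = PCA(n_components=2).fit(X)
        coords = pca.transform(X)   # shape (1000, 2)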

  • @keesdekarper  8 months ago  +194

    This video is gonna blow up. The visualizations will help many people that aren't familiar with NN's or Deep Learning to at least grasp a little bit what is happening under the hood. And with the crazy popularity of LLM's nowadays, this will for sure interest a lot of people

    • @TheScarvig  8 months ago  +3

      as someone who gave a lot of fellow students lessons in STEM classes, I can tell you that the sheer amount of numbers arranged in matrices will immediately shut down the average person's brain...

    • @lesselp  8 months ago

      No, normal people just want to party.

  • @kalashshah6234  8 months ago  +7

    This is absolutely one of the best videos for explaining the workings of LLMs. Love the visualisation and the innate ease with which the concepts were explained.
    Hats off!!

  • @ogginger  8 months ago  +5

    You are such an AMAZING teacher. I feel like you've really given thought to the learner's perception and are kind enough to take the time to address asides and gotchas while you meticulously build components and piece them together, all with a very natural progression that's moving towards "something" (hopefully comprehension). Thank you so much for your time, effort, and the quality of your work.

  • @chase_like_the_bank  8 months ago  +417

    You *must* turn the linguistic vector math bit into a short. -Japan + sushi + Germany = bratwurst is pure gold.

    • @XMysticHerox  8 months ago  +8

      I am slightly offended it did not result in "Fischbrötchen".

    • @marshmellominiapple  8 months ago  +5

      @@XMysticHerox It was trained on English words only.

    • @XMysticHerox  8 months ago  +12

      @@marshmellominiapple ChatGPT supports 95 languages. Not all equally well. But as a German, yes, it works just as well with German as it does with English.

    • @-Meric-  8 months ago  +6

      @@marshmellominiapple Word2Vec and other vector embeddings of words like GloVe or whatever don't care about language. They don't "understand" the meaning of the words; they just eventually find patterns in unstructured data to create the embeddings. It works in any language, and GPT has a ton of other languages in its training data

    • @stefchristensen47  8 months ago  +10

      You can actually try this out in your nearest large language model, like ChatGPT, CoPilot, Gemini, or Mistral. Just ask it to do vector math on the words. Since there isn't a predefined vector word calculus in English, the LLM defaults to just using a version of its own internal representation, and so it can eke out pretty good results. I was able to duplicate Hitler - Germany + Italy = Mussolini and sushi - Japan + Germany = sausage (or bratwurst, both score highly) in GPT-3.5-Turbo Complete.
      It also figured out sushi - Japan + Lebanon = shawarma; sushi - Japan + Korea = kimchi; Hitler - Germany + Spain = Franco; and Hitler - Germany + Russia = Stalin.

  • @Mutual_Information  8 months ago  +459

    Grant shows just how creative you can get with linear algebra. Who would have guessed language (?!) was within its reach?

    • @abrokenmailbox  8 months ago

      Look up "Word2Vec", it's an interestingly explored idea.

    • @Jesin00  8 months ago  +64

      Linear algebra would not be enough, but a nonlinear activation function (even one as simple as max(x, 0)) makes it enough to approximate anything you want just by adding more neurons!

    •  8 months ago  +10

      Given words are descriptors and numbers are just arbitrarily precise adjectives... aka descriptions...

    • @Mutual_Information  8 months ago  +3

      @@Jesin00 Yes, lin alg alone isn't enough.

    • @psychic8872  8 months ago  +1

      Well ML uses linear algebra and he just explains it

  • @shubhamz2464  8 months ago  +91

    This series should continue. I thought it was dead after the 4th video. Lots of love and appreciation for your work

    • @The_Quaalude  a month ago

      These videos probably take a long time to make

  • @jaafars.mahdawi6911  8 months ago  +7

    Man! You never fail to enlighten, entertain, and inspire us, nor do we get enough of your high-quality, yet very digestible, content! Thank you, Grant!

    • @setarehami23  5 months ago

      Shame on Ruhollah Khomeini! He destroyed my country. He is a terrorist; a wolf in sheep's clothing.

  • @claudiazeng5668  6 months ago  +20

    I am a non-AI software engineer and I’ve been watching multiple transformer and LLM talks from OpenAI, Stanford online, NLP PhDs, and even some AI founding researchers. Some with code, some with the encoder-decoder diagram, some with Attention is all you need paper, some with ML histories. Still, visualization helps the best when incorporating everything in mind. It’s just beautiful, and love the way you organize the academic terminologies. Salute to this series 100%!

  • @DaxSudo  8 months ago  +93

    Writing my first academically published paper on AI rn, and I have to say, as an engineer in this space, this is one of the most complete and well-nuanced explanations of these tools. Gold, nay platinum, standard for educational content on this topic for decades to come.

  • @Kargalagan  8 months ago  +80

    I wish I had a friend as passionate as this channel is. It's like finding the family I've always wanted to have

    • @katech6020  8 months ago  +3

      I wish the same thing

    • @sumedh-girish  8 months ago  +8

      become friends already you both

    • @TheXuism  8 months ago  +1

      here we are 3b1bro now

    • @cagataydemirbas7259  8 months ago  +1

      Lets become friends

    • @NishantSingh-zx3cd  7 months ago  +2

      Be that friend to the younger people in your family.

  • @avishshah2186  8 months ago  +61

    You made my day!! This topic was taught at my grad school, and I needed some intuition today, and you have uploaded the video!!! It seems you heard me!! Thanks a ton!! Please upload a video on Vision Transformers, if possible

  • @luca_previ0o  2 months ago  +2

    I love how clean and natural the transition was from "the difference between men and women is almost the same as the one between all kinds of gender-related words" to "the difference between Italy and Germany is almost the same as the one between the vector representations of a certain couple of very powerful, influential and somewhat worldwide-famous moustached people that lived in those countries in the 1940s".
    Being Italian myself, this is utterly hilarious, even more than it would have been otherwise.
    This is a brilliant demonstration of how simple things can be if explained in a very simple way. You hide some details that are tough to explain as they are, building step by step simple analogies that help you toward a robust comprehension of the overall topic. And this, my friends, is teaching at its finest. This man is just perfect for this job. Very good work indeed.

  • @ramanathreyan  2 months ago

    I’ve always sought teachings that succinctly capture the essence of a subject and connect it back to the main point of the story. Often, I’ve felt lost three lectures into a topic, mainly because I couldn’t grasp its core essence. You are the first teacher I’ve encountered who truly accomplishes this. While concepts like math and matrix multiplication are fascinating, understanding their real-world applications-the ‘so what’-is something very few educators have provided throughout my college and graduate studies. I still vividly remember your back-propagation video from several years ago; it has stayed fresh in my mind. I often base my discussion points on it during interviews or conversations with senior engineers. Thank you for everything you’re doing.

  • @punkdigerati  8 months ago  +22

    I appreciate that you explain tokenization correctly and the usefulness of simplifying it. Many explanations skip all that and just state that the tokens are words.

    • @pw7225  8 months ago  +3

      Apart from the fact that tokens CAN actually be longer than a word, too. :) Sub-word token does not mean that tokens must be smaller than a word.

    • @ratvomit874  8 months ago

      There is a related idea here in how Roombas navigate houses. They clearly are forming a map of your house in their memory, but there is no guarantee they see it the same way we do i.e. the different zones they see in your house may not correspond nicely to the actual rooms in the house. In the end, though, it doesn't really matter, as long as the job gets done correctly

  • @jerryanyu8467  8 months ago  +10

    Thank you! You're so late, 3Blue1Brown; it took me 10 hours of videos + blogs last year to understand what a transformer is! This is the long-awaited video! I'm sending this to all my friends.

  • @SidharthSisawesome  8 months ago  +16

    The idea of describing a vector basis as a long list of questions you need to answer is exactly the teaching tool I needed in my kit!! I love that perspective!

  • @skinwalker_  2 days ago

    I cannot explain how appreciative I am of this video, and how the combination of visuals with the explanation shows what goes into AI. I am a software engineer of many years, and this explanation is the best I have ever seen or read. 👍👏

  • @justinjohnson4516  a month ago

    Your ability to add clarity to an incredibly complicated topic, and do it in an efficient way is just incredible. Thank you for your videos.

  • @codediporpal  8 months ago  +22

    18:45 This is the clearest layman explanation of how attention works that I've ever seen. Amazing.

  • @eloyfernandez8668  8 months ago  +8

    The best video explaining the transformer architecture that I've seen so far... and there are really good videos covering this topic. Thank you!!

  • @TheMuffinMan  8 months ago  +102

    I'm a mechanical engineering student, but I code machine learning models for fun. I was telling my girlfriend just last night that your series on dense neural networks is the best for gaining an intuitive understanding of the basic architecture of neural networks. You have no idea what a pleasant surprise it was to wake up to this!

    • @baconheadhair6938  8 months ago

      good man

    • @keesdekarper  8 months ago  +6

      It doesn't have to be just for fun. I was also in mechanical engineering and picked a master's in control theory. And now I get to use deep learning and NNs for intelligent control systems, where you learn a model or a controller by making use of machine learning

  • @bewaterbewater  8 months ago  +5

    This is by far the most organized explanation I've seen about transformers.

  • @Ari_speaks  2 months ago  +3

    Your videos are amazing! Anyone who can distill large, complex subjects into short, fun, visually smart videos has mastered the subject. Thank you for sharing your knowledge with the world 🌎

  • @1bird_d  8 months ago  +218

    I always thought when people in the media say, "NO ONE actually understands how ChatGPT works," they were lying, but no one was ever able to explain it in layman's terms regardless. I feel like this video is exactly the kind of digestible info that people need. Well done.

    • @alexloftus8892  8 months ago  +120

      Machine learning engineer here - plenty of people understand how the architecture of chatGPT works on a high level. When people in the media say that, what they mean is that nobody understands the underlying processing that the parameters are using to go from a list of tokens to a probability distribution over possible next tokens.

    • @kevinscales  8 months ago  +77

      It's not a lie, it's just not very precise. No one can tell you exactly why one model decided the next word is "the" while another decided the next word is "a" and in that sense no one understands how a particular model works. The mechanism for how you train and run the model are understood however.

    • @lolololo-cx4dp  8 months ago  +7

      @@kevinscales yeah, just like any deep ANN

    • @metachirality  8 months ago  +47

      Think of it as the difference between knowing how genetics and DNA and replication works vs. knowing why a specific nucleotide in the human genome is adenine rather than guanine.
      There is an entire field of machine learning research dedicated to understanding how neural nets work beyond the architecture called AI interpretability.

    • @KBRoller  8 months ago  +9

      No one fully understands what the learned parameters mean. Many people understand the process by which they were learned.

  • @connorgoosen2468  8 months ago  +8

    This couldn't have come at a better time for me! I'm very excited for this continuation of the series. Thanks Grant!

  • @owenleynes7086  8 months ago  +11

    this channel is so good at making math interesting, all my friends think im wack for enjoying math videos but its not hard to enjoy when you make them like this

  • @thedermotify  a month ago

    Without doubt, the clearest and most "down to Earth" explanation of embeddings I have come across - amazing work.

  • @jucom756  8 months ago  +1

    The funny thing about encoding in a "very high dimensional space" is that we are encoding vector spaces over the rationals, so this high-dimensional space could just be represented as a subset of the reals (though it is not a very understandable representation, since the UI also processes the representation into a rational approximation).

  • @ai_outline  8 months ago  +50

    We need more Computer Science education like this! Amazing 🔥

    • @examforge  8 months ago  +5

      Honestly, I hope that in the future, AI can produce such great content. This will probably take a couple more years, but I guess it's possible. Even better: you get your own curriculum based on your strengths and weaknesses. For me this would be a combination of fireship and 3blue1brown content...

  • @StephaneDesnault  8 months ago  +5

    Thank you so much for the immense work and talent that goes into your videos!

  • @JustinLe  8 months ago  +5675

    here's to hoping this is not an April fools

    • @anuragpranav  8 months ago  +696

      it is - you would be a fool to not watch this video

    • @tinku-n8n  8 months ago  +104

      It's 2nd April here

    • @TheUnderscore_  8 months ago  +22

      @@anuragpranav Even if you already know the subject? 😂

    • @me0101001000  8 months ago  +99

      @@TheUnderscore_ it's never a bad idea to review what you know

    • @anuragpranav  8 months ago  +72

      @@TheUnderscore_ you are almost certainly limiting what you might know with that approach

  • @looppp  8 months ago  +2

    The word embedding difference example is... incredible
    I never thought about it this way
    Thank you so much for this!

  • @krishivsinghal1566  8 months ago

    I think what makes these videos so good is just how naturally thought-provoking and inspiring they are

  • @MaxGuides  8 months ago  +5

    Amazing work, your simple explanations in other videos in this series really helped me get a better understanding of what my master's classes were covering. Glad to see you're continuing this series! ❤

  • @cone10ceramics  8 months ago  +7

    I know the material of this chapter very well. Still, I watched it in its entirety just for the pleasure of watching a masterful presentation, the restful and authoritative cadence of the voice, and the gorgeous animation. Well done, Grant, yet again.

  • @Astronomer6573  8 months ago  +4

    Your explanation tends to always be the best! Love how you visualise all these.

  • @samelliott1791  2 months ago

    This teaching style with the visuals is so incredible. I cannot describe how thankful I am for them.

  • @timur.shhhhh  25 days ago  +3

    finally this channel has an audio track in another language

  • @RyNiuu  8 months ago  +4

    ok, you read my mind. Of all the channels, I am so glad it's you explaining Transformers.

  • @Skyace13  8 months ago  +21

    So you're telling me computer models can quantify "a few" or "some" based on how close the value is to a given number word, from its usage in the training data?
    I love this. (A toy sketch of the closeness measure appears after this thread.)

    • @andrewdunbar828  8 months ago  +1

      Well, a bit.

    • @XMysticHerox  8 months ago  +7

      Well, it can encode any semantic meaning, really limited only by the number of parameters and the quality of the training data.

    • @gpt-jcommentbot4759  8 months ago  +2

      @@XMysticHerox quantity*
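
    A toy sketch of the "closeness" measure behind this thread: word vectors are compared by cosine similarity (the 3-D vectors below are made up; real embeddings have thousands of dimensions):

        import numpy as np

        def cosine(u, v):
            # Cosine similarity: 1.0 means same direction, ~0.0 means unrelated.
            return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

        # Hypothetical embeddings, invented purely for illustration.
        few  = np.array([0.9, 0.1, 0.0])
        some = np.array([0.8, 0.3, 0.1])
        many = np.array([0.1, 0.9, 0.2])

        print(cosine(few, some))  # high: similar "quantity" words
        print(cosine(few, many))  # lower: further apart in meaning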

  • @ranajakub  8 months ago  +4

    this is the best series from you by far. excited for its revival

  • @hosamtalbi9740  4 days ago

    I've watched YouTube for hours every day since 2005 and you're my 3rd subscription. Simply amazing!

  • @SreenivasNaalla  4 months ago

    Thanks! What institutions charge hefty amounts for is nothing compared to what has been taught on this channel, which is able to explain the concepts visually.

  • @y337  8 months ago  +10

    This guy taught me how to build a neural network from scratch, I was waiting for this video, I even posted a request for it in the subreddit for this channel. I’m very glad this finally exists

  • @CODE7X  8 months ago  +7

    I'm in high school, and I only knew broken pieces of how it works, but you really connected all the pieces together and added the missing ones

  • @LambdaMotivation  8 months ago  +8

    I wish I had you as a teacher. You make math so much more fun than I already knew it to be ❤

  • @TusharPhondge  5 months ago  +1

    I came across your channel by way of exploring/understanding Bayesian statistics and was blown away by your 15-minute visual-method video. Your way of teaching is amazing and I just couldn't stop at one. Saw your TED talk on math and how it can be engaging by way of visual storytelling, where one forgets about the "Where am I going to use this...?" question. I am really glad I ran into your channel... and now on to exploring many visually engaging stories and learning from you. Thank you for all your work!! 🙏🙏

  • @tldyesterday  a month ago

    It's a tradition of mine to come back to this playlist every once in a while. What a piece of art. Thank you!

  • @scolton  8 months ago  +7

    Most exciting part of my week by far

  • @davidm2.johnston684  8 months ago  +8

    Hello 3b1b, I wanted to say a huge thank you for this specific video. This was exactly what I've been needing. Every now and again, I thought to myself, as someone who's been interested in machine learning for my whole adult life, that I should really get a deep understanding of how a transformer works, to the point that I could implement a functional, albeit not efficient, one myself.
    Well, I'm on my way to that, this is at least a great introduction (and knowing your channel I really mean GREAT), and I really wanted to thank you for that!
    I know this is not much, but I'm not in a position to support this channel in a more meaningful way at the moment.
    Anyways, take care, and thanks again!

    • @3blue1brown  8 months ago  +12

      I'm glad you enjoyed. In case somehow you haven't already come across them, I'd recommend the videos Andrej Karpathy does on coding up a GPT. In general, anything he makes is gold.

  • @shaqtaku  8 months ago  +153

    I can't believe Sam Altman has become a billionaire just by multiplying some matrices

    • @Dr.Schnizzle  8 months ago  +42

      You'd be surprised at how many billionaires got there from multiplying some matrices

    • @tiborsaas  8 months ago  +8

      It's too much reduction, he added value on a higher level. But yeah, when you look deep enough, everything stops looking like magic.

    • @FinnishSuperSomebody  8 months ago  +5

      @@tiborsaas And that is a good thing in many cases; it casts away illogical fears when you understand that there is no magic or thinking of any kind behind this. In practice it is just an overhyped guessing machine for what word normally might come after X.

    • @kylev.8248  8 months ago

      @@FinnishSuperSomebody this concept comes from 2017. We should actually be very, very worried and keep a close eye on the progress that AI is making. The amount of progress they have made since the 2017 paper 📝 "Attention Is All You Need" is insane.

    • @TheRevAlokSingh  8 months ago

      He doesn’t own any shares in OpenAI. His money is from before

  • @conanf129  a month ago

    I graduated with a master's in machine learning 10 years ago and wrote a paper on filtering outliers from the results of a genetic algorithms implementation. I have worked as a software developer since then, and now that I am trying to go back to the field, it felt like I had much to catch up on. Your videos make my life easier. Much thanks!

  • @justchary  8 months ago  +2

    The quality of these videos and the depth with which they open up the deeper meaning are simply mind-blowing

  • @actualBIAS  8 months ago  +7

    OH MY GOODNESS
    Your timing is just right! I'm learning about deep neural nets and transformers will be my next topic this week.
    I'M SO EXCITED, I JUST CAN'T HIDE IT!
    I'M ABOUT TO LOSE MY MIND AND I THINK I LIKE IT!

  • @ahmedivy  8 months ago  +5

    Without watching I can say that this is going to be the best transformers video on YT

    • @robertwiebe  8 months ago

      Right you are.

    • @Musthafamum  8 months ago

      It is

  • @viola_case  8 months ago  +39

    Deep learning is back baby!

    • @kevinscales  8 months ago  +5

      A short 6 year 5 month wait!

  • @alyssachen1297  6 months ago

    Blown away by the elegance - both visually and conceptually - in which this extremely complicated topic was taught! I never comment but was moved to express my sincerest gratitude! Thank you for all the time put into these beautiful videos.

  • @deildegast  15 days ago  +1

    Finally an easy-to-grasp explanation of this, and at a speed that is just right. Thanks!

  • @BobbyL2k  8 months ago  +16

    As an ML researcher this is an amazing video ❤. But please allow me to nitpick a little at 21:45.
    It's important to note that while the "un-embedding layer" of a Transformer typically has a different set of weights from the embedding layer, in OpenAI's GPT model each vector for each word in the un-embedding layer is exactly the same vector as the one in the embedding layer.
    This is not the case for Transformer models whose output is in a different domain than the input (e.g., translating to a different language), but the video is specifically talking about GPT. This is the specific implementation detailed in the "Improving Language Understanding by Generative Pre-Training" paper by OpenAI.
    Reusing the weights makes sense here because each vector from the embedding is a sort of "context free" representation of the word, so there is no need to learn another set of weights.
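
    A minimal PyTorch sketch of the weight tying described above (dimensions are scaled down from GPT-3's 50,257 x 12,288; the code is illustrative, not OpenAI's implementation):

        import torch
        import torch.nn as nn

        vocab_size, d_model = 50257, 64   # d_model shrunk from 12,288 for the sketch

        embedding = nn.Embedding(vocab_size, d_model)          # token id -> vector
        lm_head = nn.Linear(d_model, vocab_size, bias=False)   # vector -> logits
        lm_head.weight = embedding.weight                      # tie: reuse the same matrix

        tokens = torch.tensor([[464, 3290]])                   # illustrative token ids
        logits = lm_head(embedding(tokens))
        print(logits.shape)                                    # torch.Size([1, 2, 50257])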

  • @tomasretamalvenegas9294  8 months ago  +6

    CHILE MENTIONED 🇨🇱🇨🇱❤️❤️🇨🇱🇨🇱🇨🇱 COME TO SANTIAGO GRANT!!!

  • @bridgeon7502  8 months ago  +4

    Hang on, I thought this series was done! I'm delighted!

  • @rolinejohnaguilar5272  8 months ago  +1

    It's amazing that this knowledge is free; I really learned a lot from this short session. I will definitely binge-watch your videos.

  • @a_desired_turtle  a month ago  +1

    The first 50 seconds of the video got me super impressed and engaged.

  • @Jackson_Zheng  8 months ago  +14

    YOU DID IT!
    I emailed you about this video idea about 8 months ago and I've been patiently waiting for you to release this since!

    • @enpassant-d3y  8 months ago  +1

      wow, great idea!

    • @melihozcan8676  8 months ago  +5

      YOU DID IT JACKSON! I texted you to email him this idea about 9 months ago. Now the bab- video is there!

  • @Hateusernamearentu  4 months ago  +3

    Things that need clearing up: 1. The embedding matrix and unembedding matrix are only used at the first and last layer. 2. These two are trained in a supervised way; they are not inside the middle-layer processes. The embedding matrix turns text into numbers, and the unembedding does the opposite. I thought these two were unsupervised, because deep learning is unsupervised; it took me long enough to figure out. Also, 3. the embedding matrix is not used for a dot-product calculation, which is not specifically mentioned, so I was confused for a long, long time. The use of the embedding matrix is just "search": like using a "look up" or a "map" to find the column vector for a particular word.

    • @undertheshadow  23 days ago

      Thanks, I was burning my brain cells on how a 12,288-D "embedding vector" can be multiplied with a 12,288x50,257 "embedding matrix" without a huge freakin' chunk of absent data.
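
    To make the shapes concrete, here is a small numpy sketch of both ends of the pipeline (matrix sizes are scaled down from the video's 50,257 x 12,288 so it runs instantly; W_E and W_U are illustrative names):

        import numpy as np

        vocab, dim = 50257, 64   # dim shrunk from 12,288 for the sketch
        rng = np.random.default_rng(0)
        W_E = rng.normal(size=(vocab, dim))   # embedding matrix: one row per token
        W_U = rng.normal(size=(vocab, dim))   # unembedding matrix

        token_id = 1234
        x = W_E[token_id]      # embedding is a row lookup, not a big multiplication
        logits = W_U @ x       # unembedding: one score per token in the vocabulary
        print(x.shape, logits.shape)   # (64,) (50257,)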

  • @dhruvshah3909  8 months ago  +5

    I started my deep learning journey from your original videos on deep learning. They inspired me to work in this field. I am about to start my first internship as a researcher in this field. Thank you 3blue1brown for this.

    • @dhruvshah3909  8 months ago  +3

      Also, this is the best video that I have seen across the many hundreds of videos from when I was stuck in tutorial hell on many of these concepts

    •  8 months ago

      Just in time to be replaced by them >:).

  • @futuresmkt  2 months ago

    The best comprehensive, introductory overview of LLM/AI I have seen! Well done👍🏻

  • @roncho  7 months ago  +1

    You never cease to amaze me. This is a must-watch for any engineer or data scientist. You deserve to be the top YouTube channel. Thank you, brother

  • @S8EdgyVA  2 months ago  +6

    Does nobody find it odd that an understanding of language can be created using matrices in a way that is eerily similar to our own?

    • @katielui131  a month ago  +1

      Think it's fine, because it's just learning information that exists in the thing inherently. So it has no implication on how similar it is to humans, because we have used systems available to us as tools and engineered a way for them to extract information from the things in the world, i.e. language in this case. Even if we reach a point where we get a model that models our way of learning/understanding a language perfectly, I think that just means we have created the perfect model? Which I'm not sure there is even such a thing(!)

  • @jortand  8 months ago  +33

    Dammit, nice April Fools joke; I got fooled into learning something.

  • @newxceo  8 months ago  +153

    Those who watched more than once gather here 😂

    • @ethanmccormick3271  8 months ago  +2

      I'm on my first watch but I'll be back

    • @ThaiNguyen-je4gu  8 months ago  +2

      This is gold

    • @douglaswolfen7820  2 months ago  +2

      Yup, I'll need to watch most of these multiple times. But with 3B1B, the animations are so beautiful and the explanations are so good that I'm always happy to rewatch

    • @Wandfigur  27 days ago

      Yes!

    • @sandeepaleti-w6g  a day ago

      5 times 😂

  • @KCM25NJL  5 months ago

    This, my friend, is visualisation heaven! I personally... like I imagine many people... struggle to conceptualise the inner workings of machine learning processes. But this right here demystifies so, so much in so little time!
    A true benchmark in teaching!

  • @SenseiCC  8 months ago

    This channel is so good!!
    The way such complicated topics are broken down and explained is really of the highest standard.
    Please never stop making videos!

  • @André-b3w  8 months ago  +11

    So sad that so many people think AI picks bits of text and images directly from data and just makes a collage...

  • @minds_and_molecules  8 months ago  +6

    The different sampling has to do with the search algorithm, like beam search, or any search involving top-k or some tally of probabilities for the final score of the output. No temperature will change the fact that the most probable token is the most probable token, so in a greedy search the temperature does not affect the output. This is a very common misconception; I'm a bit disappointed that it was slightly misleading here. (A quick numeric check appears after this thread.)

    • @alfredwindslow1894  8 months ago  +1

      agree, what he said wasn’t logically complete and didn’t really make sense because of it

    • @minds_and_molecules  8 months ago  +1

      To be clear, the rest of the video was great!
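
    A quick numeric check of the point above (the logits are made up): rescaling by any positive temperature changes the probabilities but never the argmax, so greedy decoding is unaffected.

        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        logits = np.array([3.2, 1.1, 0.4])
        for T in (0.2, 1.0, 5.0):
            p = softmax(logits / T)
            print(T, p.round(3), int(p.argmax()))  # argmax stays 0 for every T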

  • @christianquintili  7 months ago  +4

    This video is an act of democracy. Thank you

  • @inperspective  6 months ago

    I do not know you, and have just stumbled upon your lessons. Though you have undoubtedly heard it often, you are a gifted teacher in both context and visual delivery. Thank you for sharing your gift with all of us.

  • @z-beeblebrox  8 months ago  +5

    3blue1brown released a normal video today. So did Numberphile. So did nearly all the channels in my subs. There's no wacky bullshit on the Google homepage. No stupid gimmick feature in Maps. Have we done it? Have we finally killed off the lamest holiday? Is it finally dead?

  • @adnan7698  8 months ago  +4

    First

  • @guillermoe.sanchezguaida3787  8 months ago  +3

    There's no black box, it's just math

    • @josuecharles9087  8 months ago  +3

      The black box here is that the weights can't be individually interpreted. What's the contribution of each weight to the output, given an input? Or how does each weight explain such an output? What's the explanatory power of that weight?

  • @johnpuopolo4413  8 months ago  +1

    You are a genius who teaches extremely well. Thank you for all of your videos and what you give back to the community.