But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning

  • Published May 11, 2024
  • Unpacking how large language models work under the hood
    Early view of the next chapter for patrons: 3b1b.co/early-attention
    Special thanks to these supporters: 3b1b.co/lessons/gpt#thanks
    To contribute edits to the subtitles, visit translate.3blue1brown.com/
    Other recommended resources on the topic:
    Richard Turner's introduction is one of the best starting places:
    arxiv.org/pdf/2304.10557.pdf
    Coding a GPT with Andrej Karpathy
    • Let's build GPT: from ...
    Introduction to self-attention by John Hewitt
    web.stanford.edu/class/cs224n...
    History of language models by Brit Cruise:
    • ChatGPT: 30 Year Histo...
    Paper about examples like the “woman - man” one presented here:
    arxiv.org/pdf/1301.3781.pdf
    ------------------
    Timestamps
    0:00 - Predict, sample, repeat
    3:03 - Inside a transformer
    6:36 - Chapter layout
    7:20 - The premise of Deep Learning
    12:27 - Word embeddings
    18:25 - Embeddings beyond words
    20:22 - Unembedding
    22:22 - Softmax with temperature
    26:03 - Up next
    ------------------
    These animations are largely made using a custom Python library, manim. See the FAQ comments here:
    3b1b.co/faq#manim
    github.com/3b1b/manim
    github.com/ManimCommunity/manim/
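
    For a sense of what a manim scene looks like, here is a minimal sketch against the ManimCommunity fork's API (my own toy example, not taken from the videos repo; assumes pip install manim):

    from manim import Scene, Circle, Create

    class CircleDemo(Scene):
        def construct(self):
            # Draw a circle onto the screen with an animated stroke
            self.play(Create(Circle()))

    Rendering, e.g. with manim -pql scene.py CircleDemo, produces a short animation clip.
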
    All code for specific videos is visible here:
    github.com/3b1b/videos/
    The music is by Vincent Rubinetti.
    www.vincentrubinetti.com
    vincerubinetti.bandcamp.com/a...
    open.spotify.com/album/1dVyjw...
    ------------------
    3blue1brown is a channel about animating math, in all senses of the word animate. If you're reading the bottom of a video description, I'm guessing you're more interested than the average viewer in lessons here. It would mean a lot to me if you chose to stay up to date on new ones, either by subscribing here on YouTube or otherwise following on whichever platform below you check most regularly.
    Mailing list: 3blue1brown.substack.com
    Twitter: / 3blue1brown
    Instagram: / 3blue1brown
    Reddit: / 3blue1brown
    Facebook: / 3blue1brown
    Patreon: / 3blue1brown
    Website: www.3blue1brown.com

Comments • 2.1K

  • @3blue1brown
    @3blue1brown  1 month ago +832

    Edit: The finalized version of the next chapter is out th-cam.com/video/eMlx5fFNoYc/w-d-xo.html
    Early feedback on video drafts is always very important to me. Channel supporters always get a view of new videos before their release to help inform final revisions. Join at 3b1b.co/support if you’d like to be part of that early viewing group.

    • @bbrother92
      @bbrother92 1 month ago +8

      @3Blue1Brown thanks for explaining these things - it is very hard for a web programmer to understand math

    • @JohnSegrave
      @JohnSegrave 1 month ago +15

      Grant, this is so good! I've worked in ML for about 8 years and this is one of the best descriptions I've seen. Very nicely done. Big 👍👍👍

    • @deker0954
      @deker0954 1 month ago +1

      Is this worth understanding?

    • @bbrother92
      @bbrother92 1 month ago +2

      @@JohnSegrave sir, could you recommend a video analysis framework, or any video description model?

    • @didemyldz1317
      @didemyldz1317 1 month ago +2

      Could you share the name of the model that is used for text-to-speech generation? My teammate and I are working on a Song Translator as a senior design project. This might be very helpful. Thanks in advance :)

  • @iau
    @iau 1 month ago +1899

    I graduated from Computer Science in 2017. Back then, the cutting edge of ML was Recurrent Neural Networks, on which I based my thesis. This video (and I'm sure the rest of this series) just allowed me to catch up on years of advancements in so little time.
    I cannot describe how important your teaching style is to the world. I've been reading articles, blogs, and papers on embeddings and these topics for years now, and I never got it quite like I got it today. In less than 30 minutes.
    Imagine a world in which every teacher taught like you. We would save millions and millions of man-hours every hour.
    You truly have something special with this channel, and I can only wish more people started imitating you with the same level of quality and care. If only this became the standard. You'd deserve a Nobel Prize for propelling the next thousand Nobel Prizes.

    • @lucascorreaaa
      @lucascorreaaa 1 month ago +30

      Second that!

    • @kyo250996
      @kyo250996 1 month ago +35

      Same, I did a thesis about word vectorization back in 2017, and no one ever talked about how the whole vector of a word gives rise to meaning and context when you generate phrases.
      Too bad, since no one was interested in ML back then; I leaned into web development and dropped the ML :(

    • @iankrasnow5383
      @iankrasnow5383 1 month ago +14

      Funny enough, the other 6 videos in this series all came out in 2017, so you probably didn't miss much.

    • @XMysticHerox
      @XMysticHerox 1 month ago +18

      Well, transformers were first developed in 2017, so it was the cutting edge exactly when you graduated ^^

    • @rock_sheep4241
      @rock_sheep4241 1 month ago +1

      This is explained in layman's terms, but in reality it is more complicated than this

  • @DynestiGTI
    @DynestiGTI 1 month ago +1894

    Grant casually uploading the best video on Transformers on YouTube

    • @drgetwrekt869
      @drgetwrekt869 1 month ago +6

      i was expecting froggin electromagnets to be honest :-)

    • @brandonmoore644
      @brandonmoore644 1 month ago +12

      This video was insanely good!

    • @shoam2103
      @shoam2103 1 month ago +4

      Even having a basic understanding of what it is, this was still extremely helpful!

    • @yigitpolat
      @yigitpolat 1 month ago +1

      yeah but it did not talk about transformers in this chapter

    • @stefchristensen47
      @stefchristensen47 1 month ago

      I wish I could retweet this post.

  • @tempo511
    @tempo511 1 month ago +546

    The fact that the meaning behind tokens is embedded into this 12,000-dimensional space, and that you get relationships in terms of coordinates and directions that exist across topics, is mind-blowing. Like, Japan -> sushi being similar to Germany -> bratwurst is just so darn neat

    • @nctbeats7091
      @nctbeats7091 1 month ago +21

      I actually went berserk when I saw that part of the video, so friggin cool.

    • @amarissimus29
      @amarissimus29 1 month ago

      And it makes the absurdly ham-fisted model tampering behind debacles like the Gemini launch look even more absurd. I can hear the troglodytes mobbing in the nth dimension.

    • @dayelu2679
      @dayelu2679 1 month ago +5

      I came to this realization a long time ago; ever since, I've wanted to find isomorphic structures of concepts across different disciplines

    • @TheKoekiemonster1234
      @TheKoekiemonster1234 1 month ago

      @@dayelu2679🤓

    • @stefchristensen47
      @stefchristensen47 1 month ago +17

      You can actually try this out in your nearest large language model, like ChatGPT, CoPilot, Gemini, or Mistral. Just ask it to do vector math on the words. Since there isn't a predefined vector word calculus in English, the LLM defaults to just using a version of its own internal representation, and so it can eke out pretty good results. I was able to duplicate Hitler - Germany + Italy = Mussolini and sushi - Japan + Germany = sausage (or bratwurst, both score highly) in GPT-3.5-Turbo Complete.
      It also figured out sushi - Japan + Lebanon = shawarma; sushi - Japan + Korea = kimchi; Hitler - Germany + Spain = Franco; and Hitler - Germany + Russia = Stalin.
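
      For anyone who wants a self-contained demo of this arithmetic outside a chat window, here is a minimal sketch with made-up toy vectors (real embeddings are learned from data and have thousands of dimensions; libraries such as gensim expose the same query as most_similar):

      import numpy as np

      # Toy 4-dimensional "embeddings", invented for illustration only
      vectors = {
          "japan":     np.array([0.9, 0.1, 0.8, 0.0]),
          "germany":   np.array([0.1, 0.9, 0.8, 0.0]),
          "sushi":     np.array([0.9, 0.1, 0.0, 0.9]),
          "bratwurst": np.array([0.1, 0.9, 0.0, 0.9]),
      }

      def closest(query, exclude):
          # Rank candidate words by cosine similarity to the query vector
          def cos(a, b):
              return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
          return max((w for w in vectors if w not in exclude),
                     key=lambda w: cos(vectors[w], query))

      # sushi - japan + germany lands closest to bratwurst
      q = vectors["sushi"] - vectors["japan"] + vectors["germany"]
      print(closest(q, exclude={"sushi", "japan", "germany"}))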

  • @keesdekarper
    @keesdekarper 1 month ago +170

    This video is gonna blow up. The visualizations will help many people who aren't familiar with NNs or deep learning to at least grasp a little bit of what is happening under the hood. And with the crazy popularity of LLMs nowadays, this will for sure interest a lot of people

    • @TheScarvig
      @TheScarvig 1 month ago +1

      As someone who gave a lot of fellow students lessons in STEM field classes, I can tell you that the sheer amount of numbers arranged in matrices will immediately shut down the average person's brain...

    • @lesselp
      @lesselp 1 month ago

      No, normal people just want to party.

  • @billbill1235
    @billbill1235 1 month ago +1493

    I was trying to understand ChatGPT through videos and texts on the internet. I always said: I wish 3b1b would release a video about it; it's the only way for someone inexperienced to understand. And here it is. Thank you very much for your contributions to YouTube!!

    • @lmao8207
      @lmao8207 1 month ago +18

      No, even the other videos are kinda meh, even if you're not inexperienced, because they don't go in depth. I feel that here people get a nice understanding of the concepts captured by the models instead of just the architecture of the models

    • @goldeer7129
      @goldeer7129 1 month ago

      It's kind of true, but if I had to recommend a good place to actually understand transformers and even other machine learning things, I would definitely recommend StatQuest; its level of clearly explaining what's going on is very high. But I'm also very excited to see how 3B1B is going to render all that visually, as always

    • @himalayo
      @himalayo 1 month ago +2

      I was also just looking into transformers due to their extreme takeover in computer vision!

    • @baconheadhair6938
      @baconheadhair6938 1 month ago +3

      shoulda just asked chatgpt

    • @ironmancloud9759
      @ironmancloud9759 1 month ago +1

      The NLP specialization by Andrew covered everything 😅

  • @Silent_Knife
    @Silent_Knife 1 month ago +1143

    The return of the legend! This series is continuing; that is the best surprise on YouTube. Thanks Grant, you have no idea how much the young population of academia is indebted to you.

    • @kikiroy5178
      @kikiroy5178 1 month ago +7

      I'm 26, a young engineer. Thinking the same. Well said.

    • @youonlytubeonce
      @youonlytubeonce 1 month ago +6

      I liked your comment because I'm sure you're right, but don't be ageist! 😊 Us olds love him too!

    • @samad.chouihat4222
      @samad.chouihat4222 1 month ago

      Young and seniors alike

  • @lewebusl
    @lewebusl 1 month ago +72

    This is heaven for visual learners. The animations are correlated smoothly with the intended learning point ...

    • @gorgolyt
      @gorgolyt 1 month ago +5

      There's no such thing as visual learners. Other than the blind, all humans are visual creatures. It's heaven for anyone who wants to learn.

    • @lewebusl
      @lewebusl 1 month ago +1

      @@gorgolyt You are right. Humans get input from 5 senses, but 90 percent of the brain's receptors are directly connected to the optic and auditory nerves. That is where the visual dominates the other senses... For blind people, the auditory dominates...

  • @lucasamadsen
    @lucasamadsen 1 month ago +10

    2 years ago I started studying transformers, backpropagation, and the attention mechanism. Your videos were a cornerstone for my understanding of those concepts!
    And now, partially thanks to you, I can say: “yeah, relatively smooth to understand”

  • @parenchyma
    @parenchyma 1 month ago +345

    I don't even know how many times I'm going to rewatch this.

    • @chlodnia
      @chlodnia 1 month ago

      True

    • @armanahmadian4373
      @armanahmadian4373 1 month ago +6

      3B1B doesn't need to be saved in the Watch Later folder, because all his videos are worth watching later.

    • @synthclub
      @synthclub 1 month ago

      What will you set your weights n biases to?

    • @oofsper
      @oofsper 1 month ago

      same

  • @nicholaitukanov1162
    @nicholaitukanov1162 1 month ago +382

    I have been working on transformers for the past few years and this is the greatest visualization of the underlying computation that I have seen. Your videos never disappoint!!

    • @brian8507
      @brian8507 1 month ago +3

      So if we "stop" you... then we avoid judgement day? We should meet for coffee

    • @giacomobarattini1130
      @giacomobarattini1130 1 month ago +14

      @@brian8507 "judgement day" 😭

    • @beProsto
      @beProsto 1 month ago

      @@brian8507 bro's got underlying psychological issues

    • @talharuzgarakkus7768
      @talharuzgarakkus7768 1 month ago

      I agree with you. Visualization is the perfect way to understand the transformer architecture, specifically the attention mechanism

    • @jawokenn8766
      @jawokenn8766 1 month ago

      @@giacomobarattini1130 it's later than you think

  • @Kargalagan
    @Kargalagan 1 month ago +45

    I wish I had a friend as passionate as this channel is. It's like finding the family I've always wanted to have

    • @katech6020
      @katech6020 1 month ago +2

      I wish the same thing

    • @user-vt4bz2vl6j
      @user-vt4bz2vl6j 1 month ago +7

      become friends already you both

    • @TheXuism
      @TheXuism 29 days ago +1

      here we are 3b1bro now

    • @cagataydemirbas7259
      @cagataydemirbas7259 29 days ago

      Let's become friends

  • @jaafars.mahdawi6911
    @jaafars.mahdawi6911 1 month ago +6

    Man! You never fail to enlighten, entertain, and inspire us, nor do we get enough of your high-quality, yet very digestible, content! Thank you, Grant!

  • @chase_like_the_bank
    @chase_like_the_bank 1 month ago +370

    You *must* turn the linguistic vector math bit into a short. sushi - Japan + Germany = bratwurst is pure gold.

    • @XMysticHerox
      @XMysticHerox 1 month ago +4

      I am slightly offended it did not result in "Fischbrötchen".

    • @marshmellominiapple
      @marshmellominiapple 1 month ago +4

      @@XMysticHerox It was trained on English words only.

    • @XMysticHerox
      @XMysticHerox 1 month ago +7

      @@marshmellominiapple ChatGPT supports 95 languages. Not all equally well. But as a German, yes, it works just as well with German as it does with English.

    • @-Meric-
      @-Meric- 1 month ago +2

      @@marshmellominiapple Word2Vec and other vector embeddings of words like GloVe don't care about language. They don't "understand" the meaning of the words; they just eventually find patterns in unstructured data to create the embeddings. It works in any language, and GPT has a ton of other languages in its training data

    • @stefchristensen47
      @stefchristensen47 1 month ago +9

      You can actually try this out in your nearest large language model, like ChatGPT, CoPilot, Gemini, or Mistral. Just ask it to do vector math on the words. Since there isn't a predefined vector word calculus in English, the LLM defaults to just using a version of its own internal representation, and so it can eke out pretty good results. I was able to duplicate Hitler - Germany + Italy = Mussolini and sushi - Japan + Germany = sausage (or bratwurst, both score highly) in GPT-3.5-Turbo Complete.
      It also figured out sushi - Japan + Lebanon = shawarma; sushi - Japan + Korea = kimchi; Hitler - Germany + Spain = Franco; and Hitler - Germany + Russia = Stalin.

  • @DaxSudo
    @DaxSudo 1 month ago +82

    Writing my first academically published paper on AI right now, and I have to say, as an engineer in this space, this is one of the most complete and well-nuanced explanations of these tools. The gold, nay platinum, standard for educational content on this topic for decades to come.

    • @hyperadapted
      @hyperadapted 1 month ago +3

      Yes. I really hope that he gets some lifetime-achievement, massive-footprint-in-a-good-sense type of award in the MINT Edu field.

  • @voidemptynull
    @voidemptynull 10 days ago +1

    Just brilliant... there is no other video on YouTube that explains concepts in such a clever way

  • @jerryanyu8467
    @jerryanyu8467 1 month ago +8

    Thank you! You're so late, 3Blue1Brown! It took me 10 hours of videos + blogs last year to understand what a transformer is. This is the long-awaited video! I'm sending this to all my friends.

  • @yashizuko
    @yashizuko 1 month ago +51

    It's astonishing, amazing that this kind of info and explanation quality is available for free. This is way better than how a university would explain it

    • @lonnybulldozer8426
      @lonnybulldozer8426 1 month ago +1

      Universities are buildings. Buildings can't talk. Therefore, they cannot explain.

  • @JustinLe
    @JustinLe 1 month ago +5199

    here's to hoping this is not an April Fools' joke

    • @anuragpranav
      @anuragpranav 1 month ago +616

      it is - you would be a fool to not watch this video

    • @tinkuefu09
      @tinkuefu09 1 month ago +91

      It's 2nd April here

    • @TheUnderscore_
      @TheUnderscore_ 1 month ago +19

      @@anuragpranav Even if you already know the subject? 😂

    • @me0101001000
      @me0101001000 1 month ago +79

      @@TheUnderscore_ it's never a bad idea to review what you know

    • @anuragpranav
      @anuragpranav 1 month ago +61

      @@TheUnderscore_ you are almost certainly limiting what you might know with that approach

  • @xiangzhang5279
    @xiangzhang5279 1 month ago +9

    I have always been blown away by how great your visualization is for explaining ML concepts. Thanks a lot!

  • @roncho
    @roncho 4 days ago

    You never cease to amaze me. This is a must-watch for any engineer or data scientist. You deserve to be the top YouTube channel. Thank you brother

  • @Mutual_Information
    @Mutual_Information 1 month ago +435

    Grant shows just how creative you can get with linear algebra. Who would have guessed language (?!) was within its reach?

    • @abrokenmailbox
      @abrokenmailbox 1 month ago

      Look up "Word2Vec", it's an interestingly explored idea.

    • @Jesin00
      @Jesin00 1 month ago +58

      Linear algebra would not be enough, but a nonlinear activation function (even one as simple as max(x, 0)) makes it enough to approximate anything you want just by adding more neurons!
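
      A tiny concrete sketch of that point (my own toy example, numpy only): with purely linear layers you can only ever compose linear maps, but one hidden layer of max(x, 0) units already builds kinks, e.g. |x| from just two neurons:

      import numpy as np

      def relu(x):
          return np.maximum(x, 0.0)

      # One hidden layer: y = W2 @ relu(W1 @ x + b1) + b2
      # Two ReLU neurons suffice to represent |x| exactly:
      #   |x| = relu(x) + relu(-x)
      W1, b1 = np.array([[1.0], [-1.0]]), np.zeros(2)
      W2, b2 = np.array([[1.0, 1.0]]), np.zeros(1)

      for x in (-2.0, -0.5, 0.0, 3.0):
          y = W2 @ relu(W1 @ np.array([x]) + b1) + b2
          print(x, y[0])   # matches abs(x)

      Adding more neurons lets the same construction approximate any continuous function on an interval.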

    •  1 month ago +9

      Given words are descriptors and numbers are just arbitrarily precise adjectives... aka descriptions...

    • @Mutual_Information
      @Mutual_Information 1 month ago +3

      @@Jesin00 Yes, lin alg alone isn't enough.

    • @psychic8872
      @psychic8872 1 month ago +1

      Well ML uses linear algebra and he just explains it

  • @1bird_d
    @1bird_d 1 month ago +192

    I always thought that when people in the media say, "NO ONE actually understands how ChatGPT works," they were lying, but no one was ever able to explain it in layman's terms regardless. I feel like this video is exactly the kind of digestible info that people need. Well done.

    • @alexloftus8892
      @alexloftus8892 1 month ago +108

      Machine learning engineer here - plenty of people understand how the architecture of ChatGPT works on a high level. When people in the media say that, what they mean is that nobody understands the underlying processing that the parameters are using to go from a list of tokens to a probability distribution over possible next tokens.

    • @kevinscales
      @kevinscales 1 month ago +73

      It's not a lie, it's just not very precise. No one can tell you exactly why one model decided the next word is "the" while another decided the next word is "a", and in that sense no one understands how a particular model works. The mechanisms for how you train and run the model are understood, however.

    • @lolololo-cx4dp
      @lolololo-cx4dp 1 month ago +7

      @@kevinscales yeah, just like any deep ANN

    • @metachirality
      @metachirality 1 month ago +45

      Think of it as the difference between knowing how genetics and DNA and replication work vs. knowing why a specific nucleotide in the human genome is adenine rather than guanine.
      There is an entire field of machine learning research dedicated to understanding how neural nets work beyond the architecture, called AI interpretability.

    • @KBRoller
      @KBRoller 1 month ago +9

      No one fully understands what the learned parameters mean. Many people understand the process by which they were learned.

  • @ogginger
    @ogginger 1 month ago +1

    You are such an AMAZING teacher. I feel like you've really given thought to the learner's perception, and you are kind enough to take the time to address asides and gotchas while you meticulously build components and piece them together, all with a very natural progression that's moving towards "something" (hopefully comprehension). Thank you so much for your time, effort, and the quality of your work.

  • @tielessin
    @tielessin 1 month ago

    It's absolutely ridiculous how many aspects of this topic finally clicked for me in this intro video already. This was incredibly well explained, and I'm so thrilled for the next chapters. Thank you very much, Grant!

  • @shubhamz2464
    @shubhamz2464 1 month ago +81

    This series should continue. I thought it was dead after the 4th video. Lots of love and appreciation for your work

  • @PiercingSight
    @PiercingSight 1 month ago +56

    Straight up the best video on this topic. The idea that the dimensions of the embedding space represent different properties of a token that can be applied across tokens is just SO cool!

    • @JonnySolomon
      @JonnySolomon 1 month ago +1

      i felt that

    • @MagicGonads
      @MagicGonads 1 month ago +1

      Orienting and ordering the space (called the 'latent' space) so that the most significant directions come first is called 'principal component analysis' (useful for giving humans the reins to some degree, since we get to turn those knobs and see something interesting but vaguely predictable happen)
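
      For the curious, here is a minimal sketch of that idea (my own toy example, numpy only): PCA via the SVD of centered data, recovering the most significant directions of a stretched point cloud.

      import numpy as np

      rng = np.random.default_rng(0)
      # A 2D point cloud stretched mostly along one direction
      X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0],
                                                [1.0, 0.3]])

      Xc = X - X.mean(axis=0)                    # center the data
      U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
      print(Vt[0])                               # first principal direction
      print(S**2 / len(X))                       # variance along each direction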

    • @andrewdunbar828
      @andrewdunbar828 1 month ago

      I agree. I started writing about that in a comment about 2 seconds into the video, before I knew how well he was going to cover it, since it's usually glossed over way too much in other introductions to these topics.

  • @TheVirgile27
    @TheVirgile27 1 month ago

    I've been following this high-quality channel for years, and I don't know how it continues to improve over time.
    Thank you for your hard work popularizing complex notions and aiding intuition with incredible visual representations.
    Really, I take my hat off to you once more: thank you

  • @gregburlet6485
    @gregburlet6485 7 days ago

    I'm a professional in this field, but there is always so much more to learn and understand. This is so incredibly high-level and detailed at the same time, with so much intuition. Masterfully presented, thank you ❤

  • @TheMuffinMan
    @TheMuffinMan 1 month ago +97

    I'm a mechanical engineering student, but I code machine learning models for fun. I was telling my girlfriend just last night that your series on dense neural networks is the best way to gain an intuitive understanding of the basic architecture of neural networks. You have no idea what a pleasant surprise it was to wake up to this!

    • @baconheadhair6938
      @baconheadhair6938 1 month ago

      good man

    • @keesdekarper
      @keesdekarper 1 month ago +6

      It doesn't have to be just for fun. I was also in mechanical engineering and picked a master's in control theory. Now I get to use deep learning and NNs for intelligent control systems, where you learn a model or a controller by making use of machine learning

  • @avishshah2186
    @avishshah2186 1 month ago +60

    You made my day!! This topic was taught at my grad school, and I needed some intuition today, and you have uploaded the video!!! It seems you heard me!! Thanks a ton!! Please upload a video on Vision Transformers, if possible

  • @kalashshah6234
    @kalashshah6234 1 month ago +3

    This is absolutely one of the best videos for explaining the workings of LLMs. Love the visualisation and the innate ease with which the concepts were explained.
    Hats off!!

  • @haorancheng4870
    @haorancheng4870 1 month ago +3

    I listened to my professor explain the crazy softmax equation for a whole semester, and you explained so well how temperature also plays a role there. Big RESPECT!
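
    The equation itself fits in a few lines. A minimal sketch of softmax with temperature (toy code; the variable names are my own):

    import numpy as np

    def softmax(logits, temperature=1.0):
        # Dividing by T before exponentiating: T < 1 sharpens the
        # distribution, T > 1 flattens it toward uniform
        z = np.asarray(logits) / temperature
        z = z - z.max()          # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    logits = [2.0, 1.0, 0.1]
    for T in (0.5, 1.0, 5.0):
        print(T, softmax(logits, T).round(3))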

  • @codediporpal
    @codediporpal 1 month ago +21

    18:45 This is the clearest layman explanation of how attention works that I've ever seen. Amazing.
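
    For readers who want to peek ahead, the core of attention (covered in the next chapter) is compact enough to sketch in numpy. This is the scaled dot-product formula softmax(Q K^T / sqrt(d)) V from "Attention Is All You Need", with toy shapes and no masking or multiple heads:

    import numpy as np

    def attention(Q, K, V):
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)          # how much each token attends to each other token
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)     # row-wise softmax
        return w @ V                           # weighted mix of value vectors

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)            # (4, 8): one updated vector per token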

  • @SidharthSisawesome
    @SidharthSisawesome 1 month ago +13

    The idea of describing a vector basis as a long list of questions you need to answer is exactly the teaching tool I needed in my kit!! I love that perspective!

  • @bigmig4356
    @bigmig4356 1 month ago

    Ooh, attention! Looking forward to it. Thanks for feeding our attention span with such quality visualisations. From gradients to significant coordinates in concept space, your visual language keeps getting more refined and synthetic. The adventure through backpropagation was rather mind-bending. This series is amazing.

  • @y337
    @y337 19 days ago +7

    This guy taught me how to build a neural network from scratch, I was waiting for this video, I even posted a request for it in the subreddit for this channel. I’m very glad this finally exists

  • @punkdigerati
    @punkdigerati 1 month ago +15

    I appreciate that you explain tokenization correctly and the usefulness of simplifying it. Many explanations skip all that and just state that the tokens are words.

    • @pw7225
      @pw7225 1 month ago +3

      Apart from the fact that tokens CAN actually be longer than a word, too. :) Sub-word token does not mean that tokens must be smaller than a word.
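
      Easy to see for yourself with a real tokenizer. A small sketch using OpenAI's tiktoken library (assumes pip install tiktoken; the exact splits depend on the vocabulary):

      import tiktoken

      enc = tiktoken.get_encoding("cl100k_base")   # vocabulary used by GPT-3.5/4
      ids = enc.encode("Tokenization isn't word-by-word")
      # Decode each id separately to see the individual pieces:
      # some are whole words, some are sub-word chunks with leading spaces
      print([enc.decode([t]) for t in ids])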

    • @ratvomit874
      @ratvomit874 1 month ago

      There is a related idea here in how Roombas navigate houses. They clearly are forming a map of your house in their memory, but there is no guarantee they see it the same way we do i.e. the different zones they see in your house may not correspond nicely to the actual rooms in the house. In the end, though, it doesn't really matter, as long as the job gets done correctly

  • @joaoguerreiro9403
    @joaoguerreiro9403 1 month ago +49

    We need more Computer Science education like this! Amazing 🔥

    • @pythonconsultant
      @pythonconsultant 1 month ago +4

      Honestly, I hope that in the future AI can produce such great content. This will probably take a couple more years, but I guess it's possible. Even better: you get your own curriculum based on your strengths and weaknesses. For me this would be a combination of Fireship and 3blue1brown content...

  • @karlstanley8264
    @karlstanley8264 1 month ago +1

    This is genuinely one of the best pieces of science / tech communication I have ever seen. Well done and thank you!

  • @SenseiCC
    @SenseiCC 1 month ago

    This channel is so good!!
    The way such complicated topics are broken down and explained is really of the highest standard.
    Please never stop making videos!

  • @connorgoosen2468
    @connorgoosen2468 1 month ago +8

    This couldn't have come at a better time for me! I'm very excited for this continuation of the series. Thanks Grant!

  • @eloyfernandez8668
    @eloyfernandez8668 1 month ago +7

    The best video explaining the transformer architecture that I've seen so far... and there are really good videos covering this topic. Thank you!!

  • @Noriyak1
    @Noriyak1 1 month ago

    Your videos are the only ones I watch multiple times to get the hang of it. But the way you visualize and explain it makes it way more enjoyable and interesting. Love your videos ❤

  • @justchary
    @justchary 24 days ago +2

    The quality of these videos, and the depth with which they open up the deeper meaning, is simply mind-blowing

  • @MaxGuides
    @MaxGuides 1 month ago +5

    Amazing work. Your simple explanations in other videos in this series really helped me get a better understanding of what my master's classes were covering. Glad to see you're continuing this series! ❤

  • @Astronomer6573
    @Astronomer6573 1 month ago +4

    Your explanations tend to always be the best! Love how you visualise all of this.

  • @johncox9099
    @johncox9099 1 month ago

    Thank you so much for this video. I have watched many on the topic and they teach you some parts, but you connected all the missing dots in a very effective way. Thanks.

  • @50sKid
    @50sKid 1 month ago

    Seriously amazing content. The way you present things visually is excellently done.

  • @owenleynes7086
    @owenleynes7086 1 month ago +10

    This channel is so good at making math interesting. All my friends think I'm wack for enjoying math videos, but it's not hard to enjoy when you make them like this

  • @StephaneDesnault
    @StephaneDesnault 1 month ago +5

    Thank you so much for the immense work and talent that goes into your videos!

  • @jessylikesithard
    @jessylikesithard 1 month ago +3

    This is by far the most organized explanation I've seen about transformers.

  • @etiennedud
    @etiennedud 15 days ago +1

    The visuals of this video are next level; they really help the comprehension of the subject

  • @RyNiuu
    @RyNiuu 1 month ago +4

    ok, you read my mind. Of all the channels, I am so glad it's you explaining Transformers.

  • @thelambda5900
    @thelambda5900 1 month ago +8

    I wish I had you as a teacher. You make math so much more fun than I've known it to be ❤

  • @mjmassi11
    @mjmassi11 10 days ago

    WOW!!!! I loved this. I learned so much, with just the right amount of mathematical depth for a CS major who last did linear algebra 40 years ago.

  • @pingsunday
    @pingsunday 23 days ago

    The amount of work in this video is huge, like Mount Fuji

  • @user-ew1ic7pr3r
    @user-ew1ic7pr3r 1 month ago +6

    I know the material of this chapter very well. Still, I watched it in its entirety just for the pleasure of watching a masterful presentation, the restful and authoritative cadence of the voice, and the gorgeous animation. Well done, Grant, yet again.

  • @dima13693
    @dima13693 1 month ago +3

    I usually gloss over your videos as they get more technical. But whatever you did this time kept me hooked the whole time.

  • @mariusfurst4898
    @mariusfurst4898 1 month ago +2

    Your teaching skills are beyond compare. The effort you put into your videos clearly shows.

  • @syedhammad3056
    @syedhammad3056 25 days ago

    Just a few weeks back I had trouble understanding what transformers are, and I am blessed to see this video pop up just when I needed it. Thank you for making such amazing videos.

  • @actualBIAS
    @actualBIAS 1 month ago +7

    OH MY GOODNESS
    Your timing is just right! I'm learning about deep neural nets and transformers will be my next topic this week.
    I'M SO EXCITED, I JUST CAN'T HIDE IT!
    I'M ABOUT TO LOSE MY MIND AND I THINK I LIKE IT!

  • @Skyace13
    @Skyace13 1 month ago +20

    So you're telling me computer models can quantify “a few” or “some” based on how close the value is to a given number word, judging from its usage in the training data?
    I love this

    • @andrewdunbar828
      @andrewdunbar828 1 month ago +1

      Well, a bit.

    • @XMysticHerox
      @XMysticHerox 1 month ago +7

      Well, it can encode any semantic meaning, really limited only by the number of parameters and the quality of the training data.

    • @gpt-jcommentbot4759
      @gpt-jcommentbot4759 1 month ago +2

      @@XMysticHerox quantity*

  • @Hablo74
    @Hablo74 1 month ago +1

    Wow!! I have no preparation in math, analysis, programming, machine learning, etc... and I can grab the concept!
    What a MIRACLE you have done here! Good work! 🤯

  • @jytou
    @jytou 1 month ago +1

    Brilliant! Although I already knew pretty much everything presented here, it brings a great visual touch to all the concepts, which helps sink them even deeper into the mind. Terrific job as usual! Looking forward to the next chapters of the series!

  • @ranajakub
    @ranajakub 1 month ago +4

    This is the best series from you by far. Excited for its revival

  • @scolton
    @scolton 1 month ago +7

    Most exciting part of my week by far

  • @jorgeromeu
    @jorgeromeu 1 month ago +1

    Hi Grant, when I first saw your original 5 deep learning videos I was in my second-to-last year of high school. They were my first introduction to ML and deep learning, and they played a part in my choosing to study Computer Science as an undergraduate. Now, five years later, I am working on my master's thesis, where I am using Vision Transformers :)

  • @scottotterson3978
    @scottotterson3978 15 days ago

    The easiest lesson on a hard topic I've ever had. Super clear.

  • @CODE7X
    @CODE7X 1 month ago +6

    I'm in high school, and I only knew broken pieces of how it works, but you really connected all the pieces together and added the missing ones

  • @vikrambhutani
    @vikrambhutani 1 month ago +3

    I love the 3Blue1Brown series... the linear algebra series was really state-of-the-art and recognized globally by AI enthusiasts like myself. Now for hot topics such as Transformers and GenAI, this is really the best explanation by far. It's short and precise, and that's what we want.

  • @alextsun7314
    @alextsun7314 9 days ago

    I don't usually comment on videos, but this is one of the best videos I've seen on transformers, extremely detailed but very easy to understand!

  • @theJesai
    @theJesai 1 month ago

    Thank you so much. As a high school student who's deeply intrigued by LLMs and deep learning, this was so much better than me trying to interpret the "attention is all you need" paper myself (with LLMs to help, ironically) haha.
    This is hands down the best resource on the transformer architecture and deep learning I've ever found - and I've been through a LOT.
    Thank you :)

  • @shaqtaku
    @shaqtaku 1 month ago +76

    I can't believe Sam Altman has become a billionaire just by multiplying some matrices

    • @Dr.Schnizzle
      @Dr.Schnizzle 1 month ago +16

      You'd be surprised at how many billionaires got there from multiplying some matrices

    • @spanishflea634
      @spanishflea634 1 month ago +1

      Also, gets away with calling it "machine learning".

    • @tiborsaas
      @tiborsaas 1 month ago +5

      It's too much of a reduction; he added value on a higher level. But yeah, when you look deep enough, everything stops looking like magic.

    • @user-gw3yb3ki6w
      @user-gw3yb3ki6w 1 month ago +1

      @@tiborsaas And that is a good thing in many cases; it casts away illogical fears when you understand that there is no kind of magic or thinking behind this. In practice it is just an overhyped machine for guessing what word might normally come after X.

    • @kylev.8248
      @kylev.8248 1 month ago

      @@user-gw3yb3ki6w This concept comes from 2017. We should actually be very, very worried and keep a close eye on the progress that AI is making. The amount of progress made since the 2017 paper 📝 “Attention Is All You Need” is insane.

  • @davidm2.johnston684
    @davidm2.johnston684 1 month ago +6

    Hello 3b1b, I wanted to say a huge thank you for this specific video. This was exactly what I've been needing. Every now and again, I thought to myself, as someone who's been interested in machine learning for my whole adult life, that I should really get a deep understanding of how a transformer works, to the point that I could implement a functional, albeit not efficient, one myself.
    Well, I'm on my way to that. This is at least a great introduction (and knowing your channel I really mean GREAT), and I really wanted to thank you for that!
    I know this is not much, but I'm not in a position to support this channel in a more meaningful way at the moment.
    Anyways, take care, and thanks again!

    • @3blue1brown
      @3blue1brown  1 month ago +11

      I'm glad you enjoyed it. In case somehow you haven't already come across them, I'd recommend the videos Andrej Karpathy does on coding up a GPT. In general, anything he makes is gold.

  • @johnpuopolo4413
    @johnpuopolo4413 29 days ago +1

    You are a genius who teaches extremely well. Thank you for all of your videos and what you give back to the community.

  • @viola_case
    @viola_case 1 month ago +40

    Deep learning is back, baby!

    • @kevinscales
      @kevinscales 1 month ago +5

      A short 6 year 5 month wait!

  • @gONSOTE
    @gONSOTE 1 month ago +3

    SANTIAGO DE CHILE MENTIONED!!! 🗣️🔥🔥🔥 WHAT THE HELL IS CLEAN AIR!!?!?!?!? 🗣️🗣️🗣️🔥🔥

  • @BlayneOliver
    @BlayneOliver 1 month ago +1

    The 'embeddings beyond words' segment levelled up my understanding of how machine learning 'thinks'. Thank you!

  • @rolinejohnaguilar5272
    @rolinejohnaguilar5272 1 month ago

    It's amazing that this knowledge is free; I really learned a lot from this short session. I will definitely binge-watch your videos.

  • @ahmedivy
    @ahmedivy 1 month ago +5

    Without watching, I can say that this is going to be the best transformers video on YouTube

    • @robertwiebe
      @robertwiebe 1 month ago

      Right you are.

    • @Musthafamum
      @Musthafamum 1 month ago

      It is

  • @bridgeon7502
    @bridgeon7502 1 month ago +4

    Hang on, I thought this series was done! I'm delighted!

  • @asdads3948
    @asdads3948 1 month ago

    Love that bit about the word embeddings and how each direction in that high-dimensional space carries some semantic meaning to a certain degree. I haven't heard something I found that interesting in years!

  • @eriktruong9856
    @eriktruong9856 1 month ago

    Thank you so much. We were just learning about transformer architecture in our deep learning course. This was really helpful for visualizing word embeddings!

  • @zmaron1
    @zmaron1 1 month ago +4

    The BEST AI video. Highly recommended!

  • @BobbyL2k
    @BobbyL2k 1 month ago +14

    As an ML researcher, this is an amazing video ❤. But please allow me to nitpick a little at 21:45.
    It’s important to note that while the “un-embedding layer” of a Transformer typically has a different set of weights from the embedding layer, in OpenAI’s GPT model each word’s vector in the un-embedding layer is exactly the same vector as the one in the embedding layer.
    This is not the case for Transformer models whose output is in a different domain than the input (e.g., translating to a different language), but since the video is specifically talking about GPT, this is the specific implementation detailed in the “Improving Language Understanding by Generative Pre-Training” paper by OpenAI.
    Reusing the weights makes sense here because each vector from the embedding is a sort of “context-free” representation of the word, so there is no need to learn another set of weights.
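
    In code, that weight sharing is a one-line tie. A minimal PyTorch sketch (the sizes are illustrative, roughly GPT-2 small; not taken from the video):

    import torch.nn as nn

    vocab_size, d_model = 50257, 768

    tok_emb = nn.Embedding(vocab_size, d_model)           # embedding: token id -> vector
    lm_head = nn.Linear(d_model, vocab_size, bias=False)  # unembedding: vector -> logits

    # Tie the weights: both layers now share one (vocab_size, d_model) matrix,
    # so each word's "reading" and "writing" vectors are identical
    lm_head.weight = tok_emb.weight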

  • @ayushranjan2494
    @ayushranjan2494 1 month ago

    The amount of depth and effort invested in this video makes this channel the best channel on YouTube.

  • @garlapatiraviteja2104
    @garlapatiraviteja2104 1 month ago

    I initially learned about transformers through text, but watching this video with its awesome gifs really brought the concept to life! Such a visually captivating way to understand complex topics.

  • @Jackson_Zheng
    @Jackson_Zheng 1 month ago +13

    YOU DID IT!
    I emailed you about this video idea about 8 months ago and I've been patiently waiting for you to release this since!

    • @user-vb8lx8pi6o
      @user-vb8lx8pi6o 1 month ago +1

      wow, great idea!

    • @melihozcan8676
      @melihozcan8676 1 month ago +3

      YOU DID IT JACKSON! I texted you to email him this idea about 9 months ago. Now the bab- video is there!

  • @jortand
    @jortand 1 month ago +26

    Dammit, nice April Fools' joke; I got fooled into learning something.

  • @zoebluberry1315
    @zoebluberry1315 1 month ago

    Again and again, in the pool of a million explanations, yours is the one that makes the topic understandable for everyone who is able to watch and listen. Great work!

  • @shreyasjena474
    @shreyasjena474 1 month ago

    I'm currently working with Transformer models on a regular basis, and still your video helped me look at the underlying ideas in a new way. Thanks as always, Grant!

  • @tomasretamalvenegas9294
    @tomasretamalvenegas9294 28 days ago +4

    CHILE MENTIONED 🇨🇱🇨🇱❤️❤️🇨🇱🇨🇱🇨🇱 COME TO SANTIAGO GRANT!!!

  • @dhruvshah3909
    @dhruvshah3909 1 month ago +5

    I started my deep learning journey from your original videos on deep learning. They inspired me to work in this field. I am about to start my first internship as a researcher in this field. Thank you 3blue1brown for this.

    • @dhruvshah3909
      @dhruvshah3909 1 month ago +3

      Also, this is the best video I have seen across many hundreds of videos, from when I was stuck in tutorial hell on many of these concepts

    •  1 month ago

      Just in time to be replaced by them >:).

  • @pablorodriguez6318
    @pablorodriguez6318 1 month ago +1

    Grant, this is the best detailed explanation I have ever seen. Incredible; thank you for the work you do

  • @looppp
    @looppp 1 month ago

    The word embedding difference example is... incredible
    I never thought about it this way
    Thank you so much for this!

  • @kalin4452
    @kalin4452 1 month ago +6

    Before I clicked on this video, I thought a Transformer was a fictional machine-like species used as toys. Now I know that transformers are something quite different. Thanks Grant.

    • @trevinbeattie4888
      @trevinbeattie4888 1 month ago +2

      They’re more than meets the eye. ;)

  • @hstrinzel
    @hstrinzel 19 days ago

    BRILLIANT! Absolutely BEST visualization I HAVE EVER SEEN! WoW. THANK YOU!

  • @giacomofumagalli8532
    @giacomofumagalli8532 1 month ago

    Gold resource! Great communication skill! Always a pleasure to listen to. Thank you

  • @henryrugg4971
    @henryrugg4971 1 month ago +7

    I'm a simple man. I see 3B1B has released a new video, I click...