Let's build GPT: from scratch, in code, spelled out.

  • Published on 21 May 2024
  • We build a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT (meta :D!). I recommend people watch the earlier makemore videos to get comfortable with the autoregressive language modeling framework and the basics of tensors and PyTorch nn, which we take for granted in this video.
    Links:
    - Google colab for the video: colab.research.google.com/dri...
    - GitHub repo for the video: github.com/karpathy/ng-video-...
    - Playlist of the whole Zero to Hero series so far: • The spelled-out intro ...
    - nanoGPT repo: github.com/karpathy/nanoGPT
    - my website: karpathy.ai
    - my twitter: / karpathy
    - our Discord channel: / discord
    Supplementary links:
    - Attention is All You Need paper: arxiv.org/abs/1706.03762
    - OpenAI GPT-3 paper: arxiv.org/abs/2005.14165
    - OpenAI ChatGPT blog post: openai.com/blog/chatgpt/
    - The GPU I'm training the model on is from Lambda GPU Cloud, which I think is the best and easiest way to spin up an on-demand GPU instance in the cloud that you can ssh to: lambdalabs.com. If you prefer to work in notebooks, I think the easiest path today is Google Colab.
    Suggested exercises:
    - EX1: The n-dimensional tensor mastery challenge: Combine the `Head` and `MultiHeadAttention` into one class that processes all the heads in parallel, treating the heads as another batch dimension (answer is in nanoGPT).
    - EX2: Train the GPT on your own dataset of choice! What other data could be fun to blabber on about? (A fun advanced suggestion if you like: train a GPT to do addition of two numbers, i.e. a+b=c. You may find it helpful to predict the digits of c in reverse order, as the typical addition algorithm (that you're hoping it learns) would proceed right to left too. You may want to modify the data loader to simply serve random problems and skip the generation of train.bin, val.bin. You may want to mask out the loss at the input positions of a+b, which just specify the problem, by using y=-1 in the targets (see CrossEntropyLoss ignore_index, which you would then set to -1). Does your Transformer learn to add? Once you have this, swole doge project: build a calculator clone in GPT, for all of +-*/. Not an easy problem. You may need Chain of Thought traces.)
    - EX3: Find a dataset that is very large, so large that you can't see a gap between train and val loss. Pretrain the transformer on this data, then initialize with that model and finetune it on tiny shakespeare with a smaller number of steps and lower learning rate. Can you obtain a lower validation loss by the use of pretraining?
    - EX4: Read some transformer papers and implement one additional feature or change that people seem to use. Does it improve the performance of your GPT?
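A minimal sketch of the data side of EX2, assuming a character-level vocabulary of digits plus '+', '=' and a '.' end marker, zero-padded operands, and the hypothetical helper names `make_problem` and `get_batch` (none of these come from the video). It serves fresh random problems instead of reading train.bin/val.bin, writes the digits of c in reverse order, and sets the prompt targets to -1 so that `F.cross_entropy(..., ignore_index=-1)` skips them. Random logits stand in for the model's output.

```python
import random

import torch
import torch.nn.functional as F

# Assumed character-level vocabulary: digits, '+', '=', and '.' as an end-of-answer marker
VOCAB = "0123456789+=."
stoi = {ch: i for i, ch in enumerate(VOCAB)}

def make_problem(ndigit=2):
    """Hypothetical helper: one random a+b=c problem, with the digits of c reversed."""
    a = random.randrange(10 ** ndigit)
    b = random.randrange(10 ** ndigit)
    prompt = f"{a:0{ndigit}d}+{b:0{ndigit}d}="  # zero-padded so every problem has the same length
    answer = f"{a + b:0{ndigit + 1}d}"[::-1] + "."  # one extra digit for the carry, least significant first
    return prompt, answer

def get_batch(batch_size=4, ndigit=2):
    """Hypothetical loader: serves fresh random problems instead of reading train.bin/val.bin."""
    xs, ys = [], []
    for _ in range(batch_size):
        prompt, answer = make_problem(ndigit)
        ids = [stoi[ch] for ch in prompt + answer]
        x = torch.tensor(ids[:-1])
        y = torch.tensor(ids[1:])  # targets are the inputs shifted left by one
        y[: len(prompt) - 1] = -1  # positions that only restate a+b contribute no loss
        xs.append(x)
        ys.append(y)
    return torch.stack(xs), torch.stack(ys)

xb, yb = get_batch()
logits = torch.randn(xb.shape[0], xb.shape[1], len(VOCAB))  # stand-in for model(xb)
loss = F.cross_entropy(logits.view(-1, len(VOCAB)), yb.view(-1), ignore_index=-1)
print(xb.shape, loss.item())
```

With this layout the first unmasked target is the least significant digit of c, predicted from the position holding '='; the '.' target teaches the model where the answer stops.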
    Chapters:
    00:00:00 intro: ChatGPT, Transformers, nanoGPT, Shakespeare
    baseline language modeling, code setup
    00:07:52 reading and exploring the data
    00:09:28 tokenization, train/val split
    00:14:27 data loader: batches of chunks of data
    00:22:11 simplest baseline: bigram language model, loss, generation
    00:34:53 training the bigram model
    00:38:00 port our code to a script
    Building the "self-attention"
    00:42:13 version 1: averaging past context with for loops, the weakest form of aggregation
    00:47:11 the trick in self-attention: matrix multiply as weighted aggregation
    00:51:54 version 2: using matrix multiply
    00:54:42 version 3: adding softmax
    00:58:26 minor code cleanup
    01:00:18 positional encoding
    01:02:00 THE CRUX OF THE VIDEO: version 4: self-attention
    01:11:38 note 1: attention as communication
    01:12:46 note 2: attention has no notion of space, operates over sets
    01:13:40 note 3: there is no communication across batch dimension
    01:14:14 note 4: encoder blocks vs. decoder blocks
    01:15:39 note 5: attention vs. self-attention vs. cross-attention
    01:16:56 note 6: "scaled" self-attention. why divide by sqrt(head_size)
    Building the Transformer
    01:19:11 inserting a single self-attention block to our network
    01:21:59 multi-headed self-attention
    01:24:25 feedforward layers of transformer block
    01:26:48 residual connections
    01:32:51 layernorm (and its relationship to our previous batchnorm)
    01:37:49 scaling up the model! creating a few variables. adding dropout
    Notes on Transformer
    01:42:39 encoder vs. decoder vs. both (?) Transformers
    01:46:22 super quick walkthrough of nanoGPT, batched multi-headed self-attention
    01:48:53 back to ChatGPT, GPT-3, pretraining vs. finetuning, RLHF
    01:54:32 conclusions
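The three warm-up versions in the "Building the self-attention" chapters (for loops, matrix multiply, softmax) all compute the same thing: position t becomes the average of positions 0..t, so tokens from the future cannot communicate. Below is a minimal sketch checking that equivalence on random data; the shapes and variable names are illustrative choices, not the video's notebook. In version 4 the all-zero affinity matrix is replaced by data-dependent query-key scores, while the masking and softmax stay the same.

```python
import torch
from torch.nn import functional as F

torch.manual_seed(42)
B, T, C = 2, 5, 3  # batch, time (context length), channels
x = torch.randn(B, T, C)

# Version 1: explicit loops; position t gets the mean of positions 0..t
xbow1 = torch.zeros(B, T, C)
for b in range(B):
    for t in range(T):
        xbow1[b, t] = x[b, : t + 1].mean(dim=0)

# Version 2: the same averaging as one matmul with a row-normalized lower-triangular matrix
wei2 = torch.tril(torch.ones(T, T))
wei2 = wei2 / wei2.sum(dim=1, keepdim=True)
xbow2 = wei2 @ x  # (T, T) @ (B, T, C) broadcasts to (B, T, C)

# Version 3: zero affinities, future positions set to -inf, softmax over each row
tril = torch.tril(torch.ones(T, T))
wei3 = torch.zeros(T, T).masked_fill(tril == 0, float("-inf"))
wei3 = F.softmax(wei3, dim=-1)  # each row becomes uniform over the allowed (past and current) positions
xbow3 = wei3 @ x

print(torch.allclose(xbow1, xbow2, atol=1e-5), torch.allclose(xbow1, xbow3, atol=1e-5))  # True True
```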
    Corrections:
    00:57:00 Oops "tokens from the future cannot communicate", not "past". Sorry! :)
    01:20:05 Oops I should be using the head_size for the normalization, not C
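EX1 and the 01:46:22 chapter describe processing all heads at once by treating the head index as another batch dimension, and the second correction notes that the attention scores should be scaled by head_size rather than C. A minimal sketch combining both, assuming a fused q/k/v projection and omitting dropout; the class and attribute names are illustrative, and nanoGPT's own version may differ in details. The last lines show why the scale matters: the dot product of two unit-variance vectors of length head_size has variance of about head_size, and dividing by sqrt(head_size) brings it back to about 1, so the softmax does not saturate toward one-hot.

```python
import math

import torch
import torch.nn as nn
from torch.nn import functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head causal self-attention with the heads folded into a batch dimension."""

    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        assert n_embd % n_head == 0, "n_embd must split evenly across heads"
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd, bias=False)  # q, k, v for every head in one matmul
        self.proj = nn.Linear(n_embd, n_embd)
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        hs = C // self.n_head  # head_size
        q, k, v = self.qkv(x).split(C, dim=2)
        # (B, T, C) -> (B, n_head, T, hs): heads now sit next to the batch dimension
        q = q.view(B, T, self.n_head, hs).transpose(1, 2)
        k = k.view(B, T, self.n_head, hs).transpose(1, 2)
        v = v.view(B, T, self.n_head, hs).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(hs)  # scale by head_size, not C
        att = att.masked_fill(self.tril[:T, :T] == 0, float("-inf"))  # no attending to future positions
        att = F.softmax(att, dim=-1)
        y = att @ v  # (B, n_head, T, hs)
        y = y.transpose(1, 2).contiguous().view(B, T, C)  # concatenate the heads again
        return self.proj(y)

torch.manual_seed(0)
attn = CausalSelfAttention(n_embd=32, n_head=4, block_size=8)
x = torch.randn(2, 8, 32)
out = attn(x)
print(out.shape)  # torch.Size([2, 8, 32])

# Why sqrt(head_size): raw q.k variance grows with head_size; the scaled version stays near 1
hs = 64
q, k = torch.randn(10_000, hs), torch.randn(10_000, hs)
raw_var = (q * k).sum(-1).var().item()
scaled_var = ((q * k).sum(-1) / math.sqrt(hs)).var().item()
print(raw_var, scaled_var)  # roughly hs and roughly 1
```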
  • Science & Technology
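The description assumes the autoregressive language modeling framework from the makemore videos, and the 00:22:11 chapter starts from a bigram model. A minimal sketch of both ideas, assuming a toy string in place of tiny Shakespeare and a hypothetical `BigramLM` class (not the video's code): each token reads the logits for the next token straight out of an embedding table, and `generate` repeatedly samples a token and appends it to the context.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

torch.manual_seed(1337)

class BigramLM(nn.Module):
    """Hypothetical bigram model: row i of the table holds the logits for the token after token i."""

    def __init__(self, vocab_size):
        super().__init__()
        self.table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.table(idx)  # (B, T, vocab_size)
        loss = None
        if targets is not None:
            B, T, V = logits.shape
            loss = F.cross_entropy(logits.view(B * T, V), targets.view(B * T))
        return logits, loss

    @torch.no_grad()
    def generate(self, idx, max_new_tokens):
        # autoregressive loop: predict the next token, sample it, append it, repeat
        for _ in range(max_new_tokens):
            logits, _ = self(idx)
            probs = F.softmax(logits[:, -1, :], dim=-1)  # only the last position matters
            next_id = torch.multinomial(probs, num_samples=1)  # (B, 1)
            idx = torch.cat([idx, next_id], dim=1)
        return idx

# Toy corpus standing in for tiny Shakespeare; character-level tokenizer
text = "hello world"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

model = BigramLM(len(chars))
opt = torch.optim.AdamW(model.parameters(), lr=1e-1)
xb, yb = data[:-1].unsqueeze(0), data[1:].unsqueeze(0)  # predict each next character
for step in range(200):
    _, loss = model(xb, yb)
    opt.zero_grad(set_to_none=True)
    loss.backward()
    opt.step()

start = torch.zeros((1, 1), dtype=torch.long)  # index 0 is ' ' for this corpus
out = model.generate(start, max_new_tokens=20)
print(loss.item())
print("".join(itos[i] for i in out[0].tolist()))
```

Because this model only ever sees the last token, its samples can only replay character-to-character transitions from the training string; the self-attention blocks in the later chapters are what let a token look further back.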

Comments • 2.3K

  • @fgfanta · 1 year ago · +5517

    Imagine being between your job at Tesla and your job at OpenAI, being a tad bored and, just for fun, dropping on YouTube the best introduction to deep learning and NLP from scratch so far, for free. Amazing people do amazing things even for a hobby.

    • @crimpers5543 · 1 year ago · +146

      He's probably bored at both of those jobs. Once people get to high-level director positions, they are far removed from the trenches of code. Lots of computer scientists have a passion for actually writing and explaining code, not just managing things.

    • @aaronhpa · 1 year ago · +99

      And yet people still say socialism isn't viable, when most of the great stuff on the internet was done for free, without expectation of compensation.

    • @shyvanatop4777 · 1 year ago · +104

      @@aaronhpa The free market developed the skills, but sure, man.

    • @aaronhpa · 1 year ago · +47

      @@shyvanatop4777 Did it? I think the hard work and dedication of all these people did, not the ability to sell it.

    • @jayakrishnankr7501 · 1 year ago · +67

      @@aaronhpa It's all about incentives. Why would you ever do anything if you could get anything without effort? In a fictional utopia, socialism might be viable, but human beings don't work like that. For example, this platform came about because of capitalism. I think achieving a balance between the two would be best: the platform came from capitalism, while the content here is socialism, maybe something like that.

  • @8LFrank · 1 year ago · +3859

    Living in a world where a world-class top guy posts a 2-hour video for free on how to make such cutting-edge stuff. I've barely started this tutorial, but first I just wanted to say thank you, mate!

    • @FobosLee · 1 year ago · +46

      Wait. It’s him! I didn’t understand at first. Thought it was a random IT YouTuber.

    • @DavitBarbakadze · 1 year ago · +7

      How did it go?

    • @Tinjinladakh · 1 year ago · +1

      Hey Jake, what should I do before learning programming? Are all the basic languages the same or different? Should I learn only Python?

    • @ChrisSmith-lk2vq · 1 year ago · +1

      Totally agree!!

    • @atlantic_love · 1 year ago · +12

      "Cutting edge"? The only cutting will be your job. Think before getting your panties all wet. The only people excited for this crap are investors, employers and failed programmers looking for some sort of edge.

  • @jamesfraser7394 · 1 year ago · +740

    Wow! I knew nothing and now I am enlightened! I actually understand how this AI/ML model works now. As a nearly 70-year-old who just started playing with Python, I am a living example of how effective this lecture is. My humble thanks to Andrej Karpathy for allowing me to see into and understand this emerging new world.

    • @user-ks8xf3ie3g · 11 months ago · +64

      Good for you, youngster. I'm 75 and will be doing this kind of thing till I drop... Still running my technology company and doing contract work. Cheers.

    • @mrcharm767 · 11 months ago · +9

      What makes you learn this at the age of 70?

    • @jamesfraser7394 · 11 months ago · +17

      @@mrcharm767 I want to analyze more stocks, the way I would, in a shorter time. ;)

    • @fawzishafei5565 · 11 months ago · +6

      @@mrcharm767 The sky is the limit.....!

    • @fmailscammer · 11 months ago · +9

      I’m always excited to learn new things, hope I’m still learning at 70!

  • @BAIR68 · 5 months ago · +213

    I am a college professor and learning GPT from Andrej. Every time I watch this video, I learn not only the content but also how to deliver any topic effectively. I would vote him the "Best AI teacher on YouTube". Salute to Andrej for his outstanding lectures.

    • @noadsensehere9195 · 4 months ago

      Which university?

    • @bohanwang-nt7qz · 3 months ago

      Hey, I'd like to introduce you to my AI learning tool, Coursnap, designed for YouTube courses! It provides course outlines and shorts, allowing you to grasp the essence of a 1-hour course in just 5 minutes. Give it a try and supercharge your learning efficiency!

    • @ocanehauncanedichieilcane · 20 hours ago

      please don't

  • @softwaredevelopmentwiththo9648 · 1 year ago · +1814

    Thank you for taking the time to create these lectures. I am sure it takes a lot of time and effort to record and cut these. Your effort to level up the community is greatly appreciated. Thanks Andrej.

    • @davidananias8239 · 1 year ago · +7

      Emphasis on appreciation.

    • @photosone7160 · 1 year ago · +3

      ditto 🙂

    • @evanacharya4153 · 1 year ago · +5

      Thank you

    • @MarkTimeMiles · 1 year ago · +3

      🙏 You're 🙏 a 🙏 mensch 🙏 Andrej 🙏💪

    • @DennisXiloj · 1 year ago · +3

      Thank you, for real. You are an awesome person, Andrej.

  • @amazedsaint · 1 year ago · +860

    All other YouTube videos: There is this amazing thing called ChatGPT
    Andrej: Hold my beer 🍺
    Seriously - we really appreciate your time and effort to create this, Andrej. This will do a lot of good for humanity - by making the core concepts accessible to mere mortals.

    • @syedshoaibshafi4027 · 1 year ago

      u can do it more easily using lstm

    • @zuu2051 · 1 year ago · +15

      @@syedshoaibshafi4027 are you really saying that out loud? Dude is still living in 2010 🤣

    • @kevinremmy5812 · 1 year ago

      lit😅

    • @redsnflr · 1 year ago · +1

      Mere mortals with at least basic programming and Python knowledge, but yes.

    • @kemalware4912 · 1 year ago · +1

      🍺

  • @rafaelsouza4575 · 1 year ago · +213

    I was always scared of the Transformer's diagram. Honestly, I never understood how such a schema could make sense until this day, when Andrej enlightened us with his super teaching power. Thank you so much! Andrej, please save the day again by doing one more class about Stable Diffusion!! Please, you are the best!

  • @fslurrehman · 1 year ago · +100

    I knew only Python, math and the definitions of NN, GA, ML and DNN. In 2 hours, this lecture has not only given me an understanding of the GPT model, but also taught me how to read AI papers and turn them into code, how to use PyTorch, and tons of AI definitions. This is the best lecture and practical application on AI. It not only gives you an idea of DNNs, but also gives you code directly from research papers and a final product. Looking forward to more lectures like these. Thanks Andrej Karpathy.

  • @gokublack4832 · 1 year ago · +238

    Wow! Having the ex-lead of ML at Tesla make tutorials on ML is amazing. Thank you for producing these resources!

    • @SzTz100 · 1 year ago · +7

      I know, I couldn't believe it.

    • @VultureGamerPL · 1 year ago · +17

      Can you believe it? God bless this man and I'm not even religious!

    • @cane870 · 1 year ago · +4

      @@VultureGamerPL cringe

    • @lookupverazhou8599 · 1 year ago · +11

      @@cane870 Cope.

    • @learnomics · 19 days ago

      @@VultureGamerPL Not only ex-lead of ML at Tesla. He is also a cofounder of OpenAI.

  • @JainPuneet · 1 year ago · +820

    Andrej, I cannot comprehend how much effort you have put into making these videos. Humanity is thankful to you for making these publicly available and educating us with your wisdom. It's one thing to know the stuff and apply it in a corporate setting, and another to use it instead to educate millions for free. This is one of the best kinds of charity a CS major can do. Kudos to you and thank you so much for doing this.

    • @vicyt007 · 1 year ago · +8

      Making this video is super simple for a specialist like him. It’s like creating a Hello World program for a computer scientist.

    • @JainPuneet · 1 year ago · +30

      @@vicyt007 I beg to differ. I am from the area and I can imagine how much time he must have spent offline to come up with the right abstraction.

    • @vicyt007 · 1 year ago · +1

      @@JainPuneet I agree that it took him some time to make this video, but I don’t believe it was a tough task.

    • @hpmv · 1 year ago · +15

      @@vicyt007 People who have expertise in an area aren't always good teachers. Being able to show others how it works in an organized, easy-to-understand manner is very tricky. On the surface it looks easy, but if you try doing a video like this yourself, chances are you'll find it much harder than you think.

    • @vicyt007 · 1 year ago · +1

      @@hpmv I know it was not an easy task, but at least he knows what he is saying; it’s just a matter of explaining concepts. He was a teacher for a long time, so it’s his job, which he is doing for free here!
      But in my opinion, this video did not target people with zero knowledge of maths / ML / AI / Python, because in that case you must admit that it is quite hard to understand. But it was watched by nearly 2M people, and those people don't have the right skills to understand it. In short, I think that this video targeted skilled people but was watched by anybody. Why not?

  • @user-co4op9ok4b · 10 months ago · +32

    I cannot thank you enough for this material. I've been a spoken language technologist for 20 years and this, plus your micrograd and makemore videos, has given me a graduate-level update in less than 10 hours. Astonishingly well-prepared and presented material. Thank you.

  • @aojiao3662 · 5 months ago · +15

    The clearest, most intuitive and best-explained transformer video I've ever seen. I watched it as if it were a TV show, that's how down-to-earth this video is. Shoutout to the man of legend.

  • @antopolskiy · 1 year ago · +143

    It is difficult to comprehend how lucky we are to have you teaching us. Thank you, Andrej.

  • @meghanaiitb · 1 year ago · +61

    What a feeling! Just finished sitting on this for the weekend, building along and finally understanding Transformers. More than anything, a sense of fulfilment. Thanks Andrej.

  • @rangilanaoermajhi1820 · 1 year ago · +13

    Just went through all of his videos: MLP, gradients and of course backprop :), and finally finished with the transformer model (decoder part). As we all know, Andrej is the hero of deep learning, and we are very much blessed to get this much rich content for free on YouTube, and from a teacher like him. Fascinating stuff from a fascinating contributor in the field of AI 🙏

  • @Grey_197 · 11 months ago · +9

    Broke my back just to finish this video in a single sitting. It's a lot to take in at once; I think I'll have to implement it bit by bit over the span of a day to actually assimilate everything.
    I am very happy with the lecture/tutorial, waiting for more. The time and effort in making this video possible is highly admirable and respectable.
    Thank you Andrej.

  • @yusufsalk1136 · 1 year ago · +530

    The best notification ever.

    • @ninadgandhi9040 · 1 year ago · +3

      Indeed!

    • @TTTrouble · 1 year ago · +4

      Literally took the words out of my mouth. It’s been a while since I’ve instaclicked and watched a 2hr long video. Very much worth it.

    • @andrewm4894 · 1 year ago · +1

      Ohhhj sheeeeet, clear my schedule!

    • @Shaunmcdonogh-shaunsurfing · 1 year ago · +1

      Absolutely agree

    • @ChrisSmith-lk2vq · 1 year ago

      True!!

  • @JoseLopez-ox7sq · 1 year ago · +183

    This is simply fantastic. I think it would be beneficial for people learning to see the actual process of training, the graphs in W&B and how they can try to train something like this.

    • @AndrejKarpathy · 1 year ago · +183

      makes sense, potentially the next video, this one was already getting into 2 hours so I wrapped things up, would rather not go too much over movie length.

    • @jdejota1029 · 1 year ago · +72

      @@AndrejKarpathy Please don't worry about going over movie length, I enjoyed every minute of the video. It's the first time I attended an in-depth class on what's under the hood of a model.

    • @nikitaandriievskyi3448 · 1 year ago · +22

      @@AndrejKarpathy I think people would watch these videos even if they were 10 hours long, so don't worry about making them too long :)

    • @patpearce8221 · 1 year ago · +8

      @@AndrejKarpathy don't listen to these sycophants. Size matters.

  • @thegrumpydeveloper · 1 year ago · +7

    So happy to see Andrej back teaching more. His articles before Tesla were so illuminating and distilled complicated concepts into things we could all learn from. A true art. Amazing to see videos too.

  • @I_am_who_I_am_who_I_am · months ago · +39

    I did something like this in 1993. I took a long text and calculated the probability of one word (I worked with words, not tokens) coming after another by parsing the full text.
    And I successfully created a single-layer perceptron parrot that could spew almost meaningful sentences.
    My professors told me I should not pursue the neural network path because it was practically abandoned. I never trusted them. I'm glad to see neural networks' glorious comeback.
    Thank you Andrej Karpathy for what you have done for our industry and humanity by popularizing this.
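The counting step this comment describes, estimating the probability that one word follows another over a full text, is the word-level version of the video's bigram baseline. This sketch covers only that counting-and-sampling step, not the single-layer perceptron the comment mentions; the toy sentence and the function names `train_word_bigrams`, `next_word_probs` and `babble` are made up for illustration.

```python
import random
from collections import Counter, defaultdict

def train_word_bigrams(text):
    """Count how often each word is followed by each other word."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, word):
    """Normalize the counts for one (known) word into P(next | word)."""
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

def babble(counts, start, n_words, rng):
    """Sample a word sequence, one bigram step at a time."""
    out = [start]
    for _ in range(n_words):
        followers = counts.get(out[-1])  # .get avoids adding unseen words to the defaultdict
        if not followers:  # dead end: this word was never followed by anything
            break
        words, weights = zip(*followers.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

text = "the cat sat on the mat and the cat ran"
counts = train_word_bigrams(text)
print(next_word_probs(counts, "the"))  # {'cat': 0.6666666666666666, 'mat': 0.3333333333333333}
print(babble(counts, "the", 8, random.Random(0)))
```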

  • @zechordlord · 1 year ago · +12

    Thanks so much for making this! I could grasp about 80% of everything with my programming / little bit of university-level machine learning background, and it does not feel like magic anymore. This format of hands-on coding along with the thought process behind it is way better than reading a paper and trying to piece things together.

  • @Marius12358 · 1 year ago · +24

    I'm enjoying this whole series so much, Andrej. It makes me understand neural networks much better than anything so far in my Bachelor's. As an older student with a large incentive to be time-efficient, this has been a godsend. Thank you so much!! :D

  • @ProductivityMo · 1 year ago · +23

    Thank you Andrej! I can't imagine the amount of time and effort it took to put this 2-hour video together! Very, very educational in breaking down how GPT is constructed. Would love to see a follow-up on tuning the model to answer questions on a small scale!

  • @ShihgianLee · 1 year ago · +12

    This lecture answers ALL my questions from the 2017 Attention Is All You Need paper. I was always curious about the code behind the Transformer. This lecture quenched my curiosity with a Colab to tinker with. Thank you so much for your effort and time in creating the lecture to spread the knowledge!

  • @NicholasRenotte · 1 year ago · +84

    This is AMAZING! You're an absolute legend for sharing your knowledge so freely like this, Andrej! I'm finally getting some time to get into transformer architectures, and this is a brilliant deep dive; going to spend the weekend walking through it!! Thank you🙏🏽

    • @varunahlawat9013 · 1 year ago · +1

      Waiting for your take on this too!

    • @eliotharreau7627 · 1 year ago · +1

      Hi Nicholas, I don't understand all this code. I just have one question: is it working?? And is it like ChatGPT? Thnx Bro.

    • @kyriakospelekanos6355 · 1 year ago · +1

      @@eliotharreau7627 This is a demonstration of HOW ChatGPT works

    • @eliotharreau7627 · 1 year ago

      @@kyriakospelekanos6355 I think it is not only how ChatGPT works, but code that can do something LIKE ChatGPT. That's why I'm surprised!!! Thank you anyway.

    • @satoshinakamoto5710 · 1 year ago

      bro can't wait for your video on this!

  • @lkothari · 1 year ago · +7

    This was incredible, Andrej! Really appreciate how you intersperse teaching a concept with coding and building step by step. This is the first of your videos that I have watched and I can't wait to watch all the others.

  • @mmedina · 1 year ago · +3

    Just wanted to thank you for your efforts. The video is great! Clear, concise, and very understandable. The way you start from scratch and, little by little, build every block of the paper is just awesome. Thank you very much!

  • @nazgulizm · 10 months ago · +5

    Thank you for taking the time and effort to share this, Andrej! This is of great help in lifting the veil of abstractions that made it all seem inaccessible, and in opening up that world to the ML/AI uninitiated like me. I don’t understand all of it yet, but I’m now oriented and you’ve given me a lot of threads I can pull on.

  • @curatorsshelf393 · 1 year ago · +5

    Andrej, thank you so much for sharing your knowledge and expertise. I've been following your video series and it has been truly amazing. I remember you saying in one of the interviews that preparing a 1-hour video takes more than 10 hours. I cannot thank you enough for what you are doing!

  • @miladaghajohari2308 · 1 year ago · +4

    These videos are awesome. I have been doing DL research for 3 years, but the way you explain things is so pleasing that I sat through the whole 2 hours. Kudos to you Andrej.

  • @IllIl · 1 year ago · +7

    Dude, thank you so much for this. It was a seriously awesome dive into the implementation with great explanations along the way. I've read/watched a lot of ML content and this has got to be one of the clearest lectures I've come across, even better than the usual famous online uni lectures. Thank you! (And I'll be rewatching it too! :)

  • @lipingxiong1376 · 1 year ago · +27

    Thank you so much for creating such valuable content. A few years ago, I watched your 2016 Stanford computer vision course, which was instrumental in helping me understand backpropagation and other important neural network concepts. Andrew Ng's courses initially led me into the world of machine learning, but I find your videos to be equally educational, focused on fundamental concepts, and presented in a very accessible way. I've also been following your blog and was thrilled to learn about your new YouTube channel. Your dedication to creating these resources is truly appreciated.
    Growing up in rural China, I didn't have many opportunities to learn outside of textbooks. But now, thanks to people like you, I find myself swimming in a sea of knowledge. Thank you for making such a significant impact on my learning journey.
    BTW, I edited this with ChatGPT to make me sound more like a native speaker. :)

    • @eva__4380 · 1 year ago · +3

      Similar experience here. I too watched Stanford's computer vision and NLP courses and a few others a while back. I also did the linear algebra, calculus, probability and stats lectures from MIT OCW to get a strong grasp of the fundamentals. Without YouTube it wouldn't have been possible for me to access such high-quality education.

    • @raghulponnusamy9034 · 7 months ago

      can you please share that link with me @eva__4380

  • @rcuzzy · 1 year ago · +90

    Andrej, I know there are probably a million other things you could be working on or efforts you could put your mind towards, but seriously, thank you for these videos. They are important, they matter, and they are providing many of us with a foundation from which to learn, build, and understand A.I. and how to develop these models further. Thank you again and please keep doing these.

    • @reinhodl7377 · 1 year ago · +3

      Seriously, Andrej is just so very kind in his way of explaining things. His Shakespeare LSTM article way back ("The Unreasonable Effectiveness of Recurrent Neural Networks") was what got me seriously into ML in the first place. And while I've since (professionally) moved to different development work unrelated to ML/AI, this is the exact kind of thing that hooks me back in. Andrej knows people watching this are not idiots and doesn't treat them as such, but at the same time fully understands how opaque even basic AI concepts can be if all you ever really interact with is pre-trained models. There's tons of value in explaining this stuff in such a practical way.

  • @PrakharSrivastav · 1 year ago · +10

    Truly phenomenal to live in an age where we can learn all this for free from experts like you. Thank you so much Andrej for your contribution. What a gift you have given.

  • @nikolaMKD95 · 1 year ago · +92

    Wow. I thought you were going to use the Transformer library, but you essentially built the entire transformer architecture from scratch. Well done!!

    • @gokulakrishnanr8414 · 2 months ago

      Thanks! Yeah, it was a fun challenge building the Transformer from scratch. Glad you're enjoying the video!

  • @karanacharya18 · 1 year ago · +4

    Absolutely amazing lecture. Thank you so much Andrej! I finally understand Attention and Transformers. "Code is the ultimate truth". And the way you set the stage and explain the concepts and the code is brilliant.

  • @rockapedra1130 · 1 year ago · +8

    This is fantastic. I am amazed that Andrej takes so much of his time to impart this incredibly valuable knowledge for free to all and sundry. He is not only a top researcher but also a fantastic communicator. We have gotten used to big corporations hoarding knowledge and talent to become exploitative monopolies but every so often, humanity puts forth a gem like Mr. Karpathy to keep us all from going head first into the gutter. Thank you!!!

  • @muhajerAlSabil1 · 1 year ago · +14

    "Andrej, your willingness to share your knowledge and insights on YouTube is truly inspiring. Your passion for teaching and helping others understand complex concepts is evident in your videos, and it's clear that you have a drive to make a positive impact in the field of AI. Keep up the amazing work, and thank you for making this knowledge accessible to all!" ps this comment was generated using GPT

  • @sr3090 · 1 year ago · +1

    Thank you Andrej for this wonderful session. I'm a tech enthusiast who wanted to understand how GPT works and came across your video. I have always found research papers difficult to comprehend and never understood how they actually get implemented. Your video completely changed that. You are such a good teacher and make things so easy to understand. Your fan club just got a new member!! :)

  • @footfunk510 · 11 months ago

    This was amazing. Thank you, Andrej! I've read about the transformer architecture but watching this code walk-through really helped me understand what this looks like in an applied way. Pulling together code and the paper helped bring the theory and practice together.

  • @coemgeincraobhach236 · 1 year ago · +10

    Day 2 of implementing this down, about one more evening to go I think. Thanks so much for this! I spent so long down the rabbit hole of CNNs that it's really refreshing to try a completely different type of model. No way I could have done it without a lecture of this quality! Legend

  • @pastrop2003 · 1 year ago · +3

    Thank you, Andrej, this is awesome! This is the best hands-on tutorial on the transformer-based language model I've ever come across. It is very gracious of you to share your knowledge and experience.

  • @sampsonleo7475 · 7 days ago

    This is truly a step-by-step tutorial for building a Transformer system. So impressed by the way you teach! Very clear and very easy to follow. You are a highly talented educator!

  • @artukikemty · 1 year ago · +15

    Amazing. Watching these videos, I can still believe in humankind; seeing a guy like Andrej share his knowledge and his time with the rest of the world is something that we do not see every day. Thanks for posting it!

    • @jwalk121 · 1 year ago

      He's a very good teacher, but there are still islands

  • @redfordkobayashi6936 · 1 year ago · +5

    You just know someone has a deep grasp on the subject matter when they start dishing out "build X from scratch" on a regular basis. Thank you Karpathy for sharing your knowledge with the world. You are more than amazing.

  • @rw-kb9qv · 1 year ago · +7

    I think this style of teaching is much better than a lecture with PowerPoint and a whiteboard. This way you can actually see what the code is doing instead of guessing what all the math symbols mean. So thank you very much for this video!

    • @13thbiosphere · 1 year ago

      By 2030 this will be the dominant method of learning..... Varsity more efficient..... Any university failing to embrace this method will crumble

  • @scottsun345 · 1 year ago · +2

    Wow, this video and everything it covered are just amazing! There are no other words except: thank you, Andrej, for all the effort it took to make this! Really looking forward to more of your great ideas and content!

  • @armaankhokhar7651 · 1 year ago · +2

    Your playlist has been instrumental to my learning and incredibly motivating. Please keep posting!

  • @chung-shienwang6248
    @chung-shienwang6248 1 year ago +12

    Couldn't be more grateful. We're literally living in the best of times because of you! Thank you so much.

  • @HazemAzim
    @HazemAzim 1 year ago +3

    Wow... Very comprehensive and smooth. You went through almost every detail in an excellent educational manner. This surely needed a lot of effort. I have seen many videos on transformers, and some of them are really very good at explaining the concepts and the math behind them, but in terms of software implementation, i.e. how transformers work from a code perspective, this is by far the best I have seen. Thank you.

  • @matteofogliata21
    @matteofogliata21 7 months ago

    I've just started approaching the transformer architecture in the last two days, and I think this is by far the best explanation. It's well thought, giving all the hints, intuitions and demonstrations with simple code. Thank you Andrej!

  • @Kirby-Bernard
    @Kirby-Bernard 3 months ago

    We are grateful that talented people like you believe in teaching and helping! This is an amazing video. Clear and precise, it makes a tough topic accessible to a layperson! There is so much to learn here about how to make technical videos.

  • @khalobert1588
    @khalobert1588 1 year ago +40

    I think this man is a singularity, because the world has not seen such a combination of talent and good character. Thanks mate 🙏

  • @ayushsrivastava3879
    @ayushsrivastava3879 1 year ago +3

    Thank you for taking the time to create these lectures.
    I'll be the first to buy if you ever want to do a subscription plan.
    Honestly, I learned so much more from this playlist alone than from any other documentation or blogs combined.
    Working with NLP is now entirely different for me.
    I'll work hard to work with you one day.

  • @aureliencobb199
    @aureliencobb199 1 year ago +8

    You give us these lectures for free, and I do not know how to thank you. Great job explaining neural networks to us so clearly.

  • @realaliarain
    @realaliarain 1 year ago +3

    A bundle of thanks for this one.
    This means so much to us. The community is thankful to you
    for taking the time to actually record these masterpieces.
    Thanks, Andrej

  • @SchultzC
    @SchultzC 1 year ago +4

    From CS231n and RL Pong to this… there is something special about the way you break down and explain things. I have benefited immensely from it, and I’m obviously not the only one. Thank you!

  • @michaeldimattia9015
    @michaeldimattia9015 4 months ago

    MIND = BLOWN! Not only is this incredible content, but the way everything was presented, coded, and explained is so crystal clear, my mind felt comfortable with the complexity. Amazing tutorial, and incredibly inspiring, thanks so much!

  • @petervogt8309
    @petervogt8309 1 year ago +1

    Nothing new in this comment. Just want to say 'thank you!' for this amazing tutorial, ...and all the others! The completeness, the information density and pace, the choice of examples and language.... Everything is *just right* , delivered right from the heart and the mind!! Thank you so much Andrej, for taking your time to educate and inspire all of us.

  • @juxyper
    @juxyper 1 year ago +6

    I have some experience understanding the maths behind all this stuff, but I kind of had problems advancing to creating and training models. These videos are a godsend. Big thanks!

  • @WannabeALU
    @WannabeALU 1 year ago +46

    I don't have words to describe how grateful I am to you and the work you are doing. Thank you!

    • @klauszinser
      @klauszinser 1 year ago +3

      The world has got a very good teacher back. Very appreciated.

    • @RKELERekhaye
      @RKELERekhaye 1 year ago

      Fantastic video, Andrej, you're the best and so nice.😊

  • @nicorauseo5478
    @nicorauseo5478 2 months ago

    Just finished all the lectures so far, from the makemore series to this one, and my knowledge grew drastically. I went from knowing the basics to being really comfortable looking under the hood. Definitely going to use this knowledge now to build useful projects. Thank you so much, and I'm excited to keep learning from you. 🔥🔥🔥

  • @DigitalMirrorComputing
    @DigitalMirrorComputing 16 days ago

    You've inspired me to start my own channel. This is the best video I've seen all year! Thank you for your contributions and for continuing to inspire us!

  • @haleemaramzan
    @haleemaramzan 1 year ago +4

    I built this same thing alongside watching the lecture, and loved it! I'm trying to get better at understanding and coding these concepts, and this was extremely helpful. Thank you so much :)

  • @JamesBradyGames
    @JamesBradyGames 1 year ago +58

    What a wonderful gift to the world. Amazing tutorial. Again. Thank you!

    • @AlexanderEgeler
      @AlexanderEgeler 1 year ago +2

      James! So funny to see your comment here :-) Hope all is well ...

    • @JamesBradyGames
      @JamesBradyGames 1 year ago +1

      @AlexanderEgeler small world! 🙂

  • @clamr6122
    @clamr6122 1 month ago +1

    I've watched a lot of explanations of Transformers and this is easily the best. You are a gifted teacher.

  • @jasonrothfuss1631
    @jasonrothfuss1631 8 months ago

    This video deserves two thumbs up (or more)! I spent a lot of time watching and rewatching parts of this, coding the model "the hard way", and it was totally worth it. Thank you!

  • @AIlysAI
    @AIlysAI 1 year ago +3

    I don't usually comment on videos, but Andrej simplifies these concepts so that they are easy to understand. It just shows how well he grasps transformers and the hundreds of papers he summarized in one video. It comes from years of experience and a beautiful mind!

  • @aistamp
    @aistamp 1 year ago +4

    Welcome to YouTube in 2023, where one of the top AI researchers is just casually making videos explaining in detail how to build some of the best ML models. Seriously though, these videos are amazing!

  • @abhisekpanigrahi-qx3dg
    @abhisekpanigrahi-qx3dg 15 days ago

    The explanation of such difficult concepts is so simple! You deserve a lot of attention to your channel.

  • @jcmorlando
    @jcmorlando 1 year ago +1

    Simply amazing, thank you Andrej! Hands down the best resource I've consumed to understand how a Transformer is built, and get understanding of how it technically relates to GPT and ChatGPT. I feel like I'm taking my first step into real cutting-edge ML :)

  • @DJ-lo8qj
    @DJ-lo8qj 1 year ago +4

    The students at Stanford who had Andrej as a professor are incredibly lucky; he’s an excellent teacher, breaking down complex topics with high precision and fluidity.

  • @GPTBot1123
    @GPTBot1123 1 year ago +45

    I've watched this 3 times and I only understand about 80% of it 😂, a testament to how great Andrej is at explaining these models. I'm not a programmer by trade, so a lot of this is totally foreign to me.

    • @TheNewton
      @TheNewton 23 days ago

      Yeah, there is some good explanation and build-up in this video, but some of it gets really dense really quickly, and then it goes back to feeling like reading an inscrutable math research paper.

  • @travelwithoutmoving5422
    @travelwithoutmoving5422 1 year ago +1

    Thanks so much for your time. Your contribution is invaluable, and the way you explain things in small steps and great detail is unique, so precious when dealing with topics as complex as neural networks, especially for non-native English speakers like me. Can't wait for your next videos. Big hug.

  • @johnini
    @johnini 1 year ago +1

    Sir, you have all our respect!
    You are a legend!! And anyone who has had the chance to share a beer or coffee with you is a really lucky person!!
    Great video, mega clear, and I hope to see more soon about fine-tuning and further steps of training in the future :)

  • @fooger
    @fooger 1 year ago +14

    As always, a fantastic video and sharing... It would be really cool if you made a part II on this, showing how we could use PPO/RL to do the fine-tuning for some basic interactive flow. It doesn't have to be like ChatGPT (Q/A). Thank you so much, Andrej, for such an amazing video!

  • @mlock1000
    @mlock1000 3 months ago +3

    I only just noticed that this is set up in a perfect 2 column layout so a person can have the script/notebook they are working on side by side with yours and not have to jump around at all. And it's clean and clutter free. That is some classy action, my deepest respect and gratitude.

    • @Milark
      @Milark 16 days ago

      now that's a level of detail I hadn't noticed

  • @user-vb2zw3gg2x
    @user-vb2zw3gg2x 8 months ago

    Thank you for making such a logically smooth tutorial! It helps to see why we use such a structure. It's also cool that you explain almost everything that appears in the model, even though some of it might be classic in the field. Very nice job, bravo!

  • @alexandrechikhaoui659
    @alexandrechikhaoui659 1 year ago +3

    Amazing content, I was on a quest for this! I'm really grateful for your time and expertise. Thank you, Sir!

  • @ArunKumar-iz8bi
    @ArunKumar-iz8bi 1 year ago +12

    Thanks a lot, Andrej, for making such good videos explaining the core concepts of neural nets. It would be really helpful if you could make a tutorial/video on the entire workflow and the structured thought process you would follow to train a neural network end to end (to arrive at the final model to be used in production). I mean: given a problem statement, how would you train a neural network to solve it, how do you design the experiments to choose the right set of hyperparameters, and so on. A hands-on tutorial video demonstrating this process would definitely help a lot of practitioners trying to use neural networks to solve interesting problems.

  • @chrisw4562
    @chrisw4562 3 months ago

    Thank you so much Andrej for your generosity, spending your valuable time on these lectures. This is absolutely amazing.

  • @RemKim
    @RemKim 1 year ago +1

    I suggest watching this video multiple times in order to understand how transformers work. This is by far the best hands on explanation + example.

  • @8eck
    @8eck 1 year ago +5

    Reward model and reinforcement learning using that reward model would be super cool to learn. Thank you for the current lecture!

  • @starbuck5043
    @starbuck5043 1 year ago +48

    We live in a time where we can get free lessons on hot topics from one of the best engineers in the business. This is amazing. Thanks, Andrej !

    • @MikeCairns1
      @MikeCairns1 1 year ago +1

      Send the man some ☕

  • @gonzalocordova5934
    @gonzalocordova5934 1 year ago +1

    Without a doubt the best video I've seen on transformers. Simply THANK YOU for your talent and humility teaching random people

  • @yigalkassel8456
    @yigalkassel8456 1 month ago

    It's probably the best YouTube video I've ever seen.
    Such down-to-earth explanations, yet going really deep into the cutting-edge tech of transformers.
    WOW
    Thank you for that!

  • @tamilselvan9942
    @tamilselvan9942 10 months ago +3

    This is "insane amount of knowledge packed in a video of 2 hours". Hats Off Man!!

  • @thedark3612
    @thedark3612 1 year ago +3

    Please keep doing what you are doing! You are an absolute gem of an educator

  • @iantaggart3064
    @iantaggart3064 4 months ago +2

    The first ten minutes alone taught me more than a quick google search could. You're good at this.

  • @senatorpoopypants7182
    @senatorpoopypants7182 11 months ago

    By far the best video on deep learning applications I've come across. For someone brand new to the space, I'm shocked at how much I'm following along with such advanced ideas. Thank you so much for putting this out there. It has been tremendously helpful.

  • @christianhetling3793
    @christianhetling3793 1 year ago +4

    Hey Andrej, I greatly appreciate you making these videos. Next semester I am taking the course Machine Learning for NLP. I think these kinds of implementation videos are incredible for learning a subject deeply.

  • @kelele4266
    @kelele4266 1 year ago +2

    A follow-up video on the fine-tuning stage would be priceless indeed!! I've heard multiple NLP friends say that the key thing that enabled ChatGPT was the curated dataset internal to OpenAI. Super curious to hear what people think. I'd imagine that it was the dataset + fine-tuning (much more so than pre-training, since it's a much smaller model vs. GPT-3, and most models use some kind of Transformer architecture). Thank you so much, Andrej!

  • @aruncjohn
    @aruncjohn 11 months ago

    Thank you, Andrej!...for sharing such an insightful video lecture on GPT architecture! Your clear explanations and in-depth analysis really helped me grasp the intricacies of this fascinating topic. I appreciate your expertise and the effort you put into making complex concepts accessible to the wider audience. Looking forward to more enlightening content from you!

  • @tranhuyhoang8610
    @tranhuyhoang8610 9 months ago

    Thanks Andrej for the great video. I like how you manage to explain those technical terms in ML using very simple language. It takes a huge amount of experience and knowledge to do so.

  • @mcnica89
    @mcnica89 1 year ago +9

    Just finished watching this (at 2x speed). I love how hands-on this is... every other tutorial I have seen always has a step where they say "it's roughly like this...", but this one really shows you what is actually needed to make it work. Looking forward to trying this on some fun problems!

  • @1gogo76
    @1gogo76 11 months ago +4

    Andrej is pure genius wrapped in a humble person 🙌

  • @purpledragon9413
    @purpledragon9413 5 months ago

    Thank you for illustrating GPT and neural nets in such a clear and easy-to-understand way! Also, thank you to the community of Andrej's channel for the great positive energy!

  • @gennarofarina94
    @gennarofarina94 4 months ago +2

    This explanation is a masterpiece.
    You seem to have a lot of fun too by unveiling concepts (like cross attention) 👏

  • @JohnVanderbeck
    @JohnVanderbeck 1 year ago +5

    ChatGPT feels like more than just a large language model to me. It seems to have, or at least projects, an understanding of concepts that I wouldn't expect a pure language model to have.

  • @linkin543210
    @linkin543210 1 year ago +4

    Andrej is single-handedly putting the open in OpenAI