This has been super informative (or transformative?), thank you! I think I will need to re-watch many parts to completely understand, but it's great that it's all here.
Your tutorial finally made me understand what self-attention is. Amazing tutorial and thank you for making these videos! Just as a suggestion, using C as the channel dimension can be confusing to follow when cross-referencing the pytorch documentation on the cross entropy function. There, C is used as the # of classes. As I was reimplementing your bigram model on my custom dataset with a vocab size larger than the channel size, I ran into IndexErrors. It's worth re-emphasizing that the last dimension expected by the cross entropy function is not the channel size but the # of classes we're trying to predict i.e. the vocab size.
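To illustrate the point above, here is a minimal sketch (toy shapes, not the lecture's exact code) showing that the last dimension handed to F.cross_entropy must be the number of classes, i.e. the vocab size:

```python
import torch
import torch.nn.functional as F

B, T = 4, 8                 # batch and block size (illustrative)
vocab_size = 65             # number of classes the model predicts over
logits = torch.randn(B, T, vocab_size)          # model output
targets = torch.randint(0, vocab_size, (B, T))  # next-token ids

# F.cross_entropy expects (N, C) where C = number of classes (vocab_size),
# not the embedding/channel width, so flatten batch and time together:
loss = F.cross_entropy(logits.view(B * T, vocab_size), targets.view(B * T))
print(loss.item())  # roughly -ln(1/65) ≈ 4.17 for random logits
```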
1:41:03 Just for reference: this training took 3 hours, 5 minutes on a 2020 M1 MacBook Air. You can use the "mps" device instead of cuda or cpu. It's not great, but it's not that bad if you are just trying stuff out. Thank you for your great videos!
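For anyone wanting to try the same thing, a small sketch of how the device can be picked (the nn.Linear is just a stand-in for illustration; in practice you'd move the lecture's model and batches the same way):

```python
import torch
import torch.nn as nn

# Prefer CUDA, then Apple's Metal backend ("mps"), then fall back to CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print(f"using device: {device}")

# Everything else is unchanged: move the model and each batch with .to(device).
model = nn.Linear(32, 65).to(device)   # stand-in for the GPT module
x = torch.randn(4, 32, device=device)
logits = model(x)                      # runs on the selected backend
```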
Amazing work. I was struggling to understand transformers at both theoretical and practical level but thanks to the brilliant lecture, I struggle no more.
I started this gem in the morning which is one of the best things I did today. Huge respect to you Andrej as there are very few people like you who provide such a valuable content to the people all around the world for free. Thank you very much
ChatGPT feels like more than just a large language model to me. It seems to have, or at least projects, an understanding of concepts that I wouldn't expect a pure language model to have.
Wow, this video and everything it covered are just amazing! There are no other words except, thank you, Andrej, for all the efforts it took to make this! Really look forward to more of your great ideas and contents!
Consider pairing this lecture with Andrej's more recent lecture from Stanford's CS25 course on transformers. For those of you who got to this video through the makemore series, that lecture fills in some gaps when jumping from makemore to transformers, covering the what and why of the attention and transformer architecture: th-cam.com/video/XfpMkf4rD6E/w-d-xo.htmlsi=RuOEaN-VBGCI96pm. Thank you Andrej for an incredible series of videos! As a fellow computer scientist (with limited exposure to AI), this series has brought me back to grad school/undergrad, re-discovering the passion and joy of learning, thinking about concepts in off time, and getting hands on with the exercises. I've been recommending these to my friends and team.
Andrej you are a blessing for all people who want to learn more about AI. Hands down the best explanations, clear instructions, and a genuine manner. Hope I can thank you in person one day!
MIND = BLOWN! Not only is this incredible content, but the way everything was presented, coded, and explained is so crystal clear, my mind felt comfortable with the complexity. Amazing tutorial, and incredibly inspiring, thanks so much!
Andrej - some constructive criticism, I think the lectures would benefit a lot from a higher quality microphone pointed to your mouth. The clicking and typing sounds seem a bit unprofessional and can be distracting to the listener. Again just constructive criticism, thanks for taking the time to share your knowledge and insights with the world
Imagine being between your job at Tesla and your job at OpenAI, being a tad bored and, just for fun, dropping on YouTube the best introduction to deep learning and NLP from scratch so far, for free. Amazing people do amazing things even as a hobby.
He's probably bored at both of those jobs. Once people get to high-level director positions, they are far removed from the trenches of code. Lots of computer scientists have a passion for actually writing and explaining code, not just managing things.
And yet people still say socialism isn't viable, when most of the great stuff on the internet was done for free, without expectation of compensation.
@@aaronhpa free market developed the skills but sure man
@@shyvanatop4777 did it? I think hard work and dedication by all this people did and not the ability of selling it.
@@aaronhpa, it's all about incentives. Why would you ever do anything if you could get anything without effort? In a fictional utopia, socialism might be viable, but human beings don't work like that. For example, this platform came about because of capitalism. I think achieving a balance between the two would be best: the platform came from capitalism, the content here is socialism, maybe something like that.
Living in a world where a world-class top guy posts a 2-hour video for free on how to make such cutting-edge stuff. I've barely started this tutorial, but first I just wanted to say thank you, mate!
Wait. It's him! I didn't understand at first. Thought it was a random IT YouTuber.
How did it go?
Hey Jake, what should I do before learning programming? Are all basic languages the same or different? Should I learn only Python?
Totally agree!!
"Cutting edge"? The only cutting will be your job. Think before getting your panties all wet. The only people excited for this crap are investors, employers and failed programmers looking for some sort of edge.
Thank you for taking the time to create these lectures. I am sure it takes a lot of time and effort to record and cut these. Your effort to level up the community is greatly appreciated. Thanks Andrej.
Emphasis on appreciation.
ditto 🙂
Thank you
🙏 You're 🙏 a 🙏 mensch 🙏 Andrej 🙏💪
Thank you! for real. You are an awesome person Andrej.
I am a college professor, and I am learning GPT from Andrej. Every time I watch this video, I not only learn the content but also how to deliver any topic effectively. I would vote him the "Best AI teacher on YouTube". Salute to Andrej for his outstanding lectures.
which university?
Hey, I'd like to introduce you to my AI learning tool, Coursnap, designed for youtube courses! It provides course outlines and shorts, allowing you to grasp the essence of 1-hour in just 5 minutes. Give it a try and supercharge your learning efficiency!
please don't
@@bohaning Damn, YouTube integrated your SaaS into YouTube natively
This lecture answers ALL my questions from the 2017 Attention Is All You Need paper. I was always curious about the code behind the Transformer. This lecture quenched my curiosity with a Colab to tinker with. Thank you so much for your effort and time in creating the lecture to spread the knowledge!
It is difficult to comprehend how lucky we are to have you teaching us. Thank you, Andrej.
What a feeling ! Just finished sitting on this for the weekend, building along and finally understanding Transformers. More than anything, a sense of fulfilment. Thanks Andrej.
Andrej, I cannot comprehend how much effort you have put into making these videos. Humanity is thankful to you for making these publicly available and educating us with your wisdom. It is one thing to know the material and apply it in a corporate setting, and another to use it to educate millions for free. This is one of the best kinds of charity a CS major can do. Kudos to you and thank you so much for doing this.
Making this video is super simple for a specialist like him. It’s like creating a Hello World program for a computer scientist.
@@vicyt007 I beg to differ. I work in this area, and I can imagine how much time he must have spent offline to come up with the right abstraction.
@@JainPuneet I agree that it took him some time to make this video, but I don’t believe it was a tough task.
@@vicyt007 People who have expertise in an area aren't always good teachers. Being able to show others how something works in an organized, easy-to-understand manner is very tricky. On the surface it looks easy, but if you try making a video like this yourself, chances are you'll find it much harder than you think.
@@hpmv I know it was not an easy task, but at least he knows what he is talking about; it's just a matter of explaining concepts. He was a teacher for a long time, so this is his job, which he is doing for free here!
But in my opinion, this video did not target people with zero knowledge of maths / ML / AI / Python, because in that case you must admit it is quite hard to understand. Yet it was watched by nearly 2M people, most of whom don't have the background to follow it. Briefly, I think this video targeted skilled people but was watched by everybody. Why not?
I knew only Python, math, and the definitions of NN, GA, ML, and DNN. In 2 hours, this lecture has not only given me an understanding of the GPT model, but also taught me how to read AI papers and turn them into code, how to use PyTorch, and tons of AI definitions. This is the best lecture and practical application on AI, because it not only gives you an idea of DNNs but also gives you code directly from research papers and a final product. Looking forward to more lectures like these. Thanks Andrej Karpathy.
The clearest, most intuitive, and best explained transformer video I've ever seen. Watched it as if it were a TV show; that's how down-to-earth this video is. Shoutout to the man, the legend.
This is AMAZING! You're an absolute legend for sharing your knowledge so freely like this, Andrej! I'm finally getting some time to get into transformer architectures, and this is a brilliant deep dive; going to spend the weekend walking through it!! Thank you🙏🏽
Waiting for your take on this too!
Hi Nicholas, I don't understand all this code. I just have one question: is it working?? And is it like ChatGPT? Thanks bro.
@@eliotharreau7627 This is a demonstration of HOW ChatGPT works
@@kyriakospelekanos6355 I think it is not only how ChatGPT works, but also code that can do something LIKE ChatGPT. That's why I'm surprised!!! Thank you anyway.
bro can't wait for your video on this!
Wow! I knew nothing and now I am enlightened! I actually understand how this AI/ML model works now. As a near-70-year-old who just started playing with Python, I am a living example of how effective this lecture is. My humble thanks to Andrej Karpathy for allowing me to see into and understand this emerging new world.
Good for you youngster. 75 and will be doing this kind of thing till I drop ... Still run my technology company and doing contract work. Cheers.
What made you learn this at the age of 70?
@@mrcharm767 Want to analyze more stocks , the way I would, in a shorter time. ;)
@@mrcharm767 The sky is the limit.....!
I’m always excited to learn new things, hope I’m still learning at 70!
Wow! Having the ex-lead of ML at Tesla make tutorials on ML is amazing. Thank you for producing these resources!
I know, I couldn't believe it.
Can you believe it? God bless this man and I'm not even religious!
@@VultureGamerPL cringe
@@cane870 Cope.
@@VultureGamerPL Not only the ex-lead of ML at Tesla. He is also a cofounder of OpenAI.
I cannot thank you enough for this material. I've been a spoken language technologist for 20 years, and this plus your micrograd and makemore videos has given me a graduate-level update in less than 10 hours. Astonishingly well-prepared and presented material. Thank you.
Thanks again for the great lecture. I was able to follow line by line and train it on Lambda Labs with light effort. Hope to buy you a coffee for all this hard work. Off to the next 4hr GPT-2 repro 🧠🏋
I was always scared of Transformer's diagram. Honestly, I never understood how such schema could make sense until this day when Andrej enlightened us with his super teaching power. Thank you so much! Andrej, please save the day again by doing one more class about Stable Diffusion!! Please, you are the best!
This is simply fantastic. I think it would be beneficial for people learning to see the actual process of training, the graphs in W&B and how they can try to train something like this.
makes sense, potentially the next video, this one was already getting into 2 hours so I wrapped things up, would rather not go too much over movie length.
@@AndrejKarpathy Please don't worry about going over movie length, I enjoyed every minute of the video. It's the first time I've attended an in-depth class on what's under the hood of a model.
@@AndrejKarpathy I think people would watch these videos even if they were 10 hours long, so don't worry about making them too long :)
@@AndrejKarpathy don't listen to these sycophants. Size matters.
Andrej, I know there are probably a million other things you could be working on or efforts you could put your mind towards, but seriously, thank you for these videos. They are important, they matter, and they are providing many of us with a foundation from which to learn, build, and understand A.I., and how to develop these models further. Thank you again and please keep doing these.
Seriously, Andrej is just so very kind in his way of explaining things. His shakespeare LSTM article way back ("The Unreasonable Effectiveness of Recurrent Neural Networks") was what got me seriously into ML in the first place. And while I've since (professionally) moved to different development work unrelated to ML/AI, this is the exact kind of thing that hooks me back in. Andrej knows people watching this are not idiots and doesn't treat them as such, but at the same time fully understands how opaque even basic AI concepts can be if all you ever really interact with is pre-trained models. There's tons of value in explaining this stuff in such a practical way.
Thanks for this well explained and wonderful series! Hope you will cover quantization for people with low-power GPUs.
I suggest watching this video multiple times in order to understand how transformers work. This is by far the best hands on explanation + example.
All other YouTube videos: There is this amazing thing called ChatGPT
Andrej: Hold my beer 🍺
Seriously - we really appreciate your time and effort to create this Andrej. This will do a lot of good for humanity - by making the core concepts accessible to mere mortals.
You can do it more easily using an LSTM
@@syedshoaibshafi4027 Are you really saying that out loud? Dude is still living in 2010 🤣
lit😅
Mere mortals with at least basic programming and python knowledge, but yes.
🍺
"Andrej , your willingness to share your knowledge and insights on TH-cam is truly inspiring. Your passion for teaching and helping others understand complex concepts is evident in your videos, and it's clear that you have a drive to make a positive impact in the field of AI. Keep up the amazing work, and thank you for making this knowledge accessible to all!" ps this comment was generated using GPT
I think this style of teaching is much better than a lecture with powerpoint and whiteboard. This way you can actually see what the code is doing instead of guessing what all the math symbols mean. So thank you very much for this video!
By 2030 this will be the dominant method of learning... vastly more efficient... Any university failing to embrace this method will crumble
I did something like this in 1993. I took a long text and calculated the probability of one word (I worked with words, not tokens) following another by parsing the full text.
And I successfully created a single layer perceptron parrot which can spew almost meaningful sentences.
My professors told me I should not pursue the neural network path because it's practically abandoned. I never trusted them. I'm glad to see neural networks' glorious comeback.
Thank you Andrej Karpathy for what you have done for our industry and humanity by popularizing this.
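For readers curious what that looks like, a toy sketch of the word-level "parrot" idea (illustrative text and code, not the original 1993 program):

```python
import random
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat ran to the mat"
words = text.split()

# Count how often each word follows each other word (word-level bigrams).
follows = defaultdict(Counter)
for w1, w2 in zip(words, words[1:]):
    follows[w1][w2] += 1

# "Parrot" a sentence by repeatedly sampling a likely next word.
w, out = "the", ["the"]
for _ in range(8):
    nxt = follows.get(w)
    if not nxt:
        break
    w = random.choices(list(nxt), weights=list(nxt.values()))[0]
    out.append(w)
print(" ".join(out))
```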
I only just noticed that this is set up in a perfect 2 column layout so a person can have the script/notebook they are working on side by side with yours and not have to jump around at all. And it's clean and clutter free. That is some classy action, my deepest respect and gratitude.
now that's a level of detail I hadn't noticed
What is that? I didn't get it.
These videos are awesome. I have been doing DL research for 3 years, but the way you explain things is so pleasing that I sat through the whole 2 hours. Kudos to you Andrej.
Can't be more grateful. We're literally living in the best of times because of you! Thank you so much
Day 2 of implementing this, about one more evening to go I think. Thanks so much for this! I spent so long down the rabbit hole of CNNs that it's really refreshing to try a completely different type of model. No way I could have done it without a lecture of this quality! Legend
- 00:00 🤖 ChatGPT is a system that allows interaction with an AI for text-based tasks.
- 02:18 🧠 The Transformer neural network from the "Attention is All You Need" paper is the basis for ChatGPT.
- 05:46 📊 NanoGPT is a repository for training Transformers on text data.
- 07:23 🏗 Building a Transformer-based language model with NanoGPT starts with character-level training on a dataset.
- 10:11 💡 Tokenizing involves converting raw text to sequences of integers, with different methods like character-level or subword tokenizers.
- 13:36 📏 Training a Transformer involves working with chunks of data, not the entire dataset, to predict sequences.
- 18:43 ⏩ Transformers process multiple text chunks independently as batches for efficiency in training.
- 22:59 🧠 Explaining the creation of a token embedding table.
- 24:09 🎯 Predicting the next character based on individual token identity.
- 25:19 💡 Using negative log likelihood loss (cross entropy) to measure prediction quality.
- 26:44 🔄 Reshaping logits for appropriate input to cross entropy function.
- 28:22 💻 Training the model using the optimizer Adam with a larger batch size.
- 31:21 🏗 Generating tokens from the model by sampling via softmax probabilities.
- 34:38 🛠 Training loop includes evaluation of loss and parameter updates.
- 41:23 📉 Using `torch.no_grad()` for efficient memory usage during evaluation.
- 45:59 🧮 Tokens are averaged out to create a one-dimensional vector for efficient processing
- 47:22 🔢 Matrix multiplication can efficiently perform aggregations instead of averages
- 50:27 🔀 Manipulating elements in a multiplying matrix allows for incremental averaging based on 'ones' and 'zeros'
- 54:51 🔄 Introduction of softmax helps in setting interaction strengths and affinities between tokens
- 58:27 🧠 Weighted aggregation of past elements using matrix multiplication aids in self-attention block development
- 01:02:07 🔂 Self-attention involves emitting query and key vectors to determine token affinities and weighted aggregations (a runnable sketch follows this outline)
- 01:05:13 🎭 Implementing a single head of self-attention involves computing queries and keys and performing dot products for weighted aggregations.
- 01:10:10 🧠 Self-attention mechanism aggregates information using key, query, and value vectors.
- 01:11:46 🛠 Attention is a communication mechanism between nodes in a directed graph.
- 01:12:56 🔍 Attention operates over a set of vectors without positional information, requiring external encoding.
- 01:13:53 💬 Attention mechanisms facilitate data-dependent weighted sum aggregation.
- 01:15:46 🤝 Self-attention involves keys, queries, and values from the same source, while cross-attention brings in external sources.
- 01:17:50 🧮 Scaling the attention values is crucial for network optimization by controlling variance.
- 01:21:27 💡 Implementing multi-head attention involves running self-attention in parallel and concatenating results for improved communication channels.
- 01:26:36 ⚙ Integrating communication and computation in Transformer blocks enhances network performance.
- 01:28:29 🔄 Residual connections aid in optimizing deep networks by facilitating gradient flow and easier training.
- 01:32:16 🧠 Adjusting Channel sizes in the feed forward network can affect validation loss and lead to potential overfitting.
- 01:32:58 🔧 Layer Norm in deep neural networks helps optimize performance, similar to batch normalization but normalizes rows instead of columns.
- 01:35:19 📐 Implementing Layer Norm in a Transformer involves reshuffling layer norms in pre-norm formulation for better results.
- 01:37:12 📈 Scaling up a neural network model by adjusting hyperparameters like batch size, block size, and learning rate can greatly improve validation loss.
- 01:39:30 🔒 Using Dropout as a regularization technique helps prevent overfitting when scaling up models significantly.
- 01:51:21 🌐 ChatGPT undergoes pre-training on internet data followed by fine-tuning to become a question-answering assistant by aligning model responses with human preferences.
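To make the self-attention items above (01:02:07 through 01:17:50) concrete, here is a minimal runnable sketch of one masked, scaled attention head in the spirit of the lecture's code; the sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(1337)
B, T, C = 4, 8, 32        # batch, time, channels (toy sizes)
head_size = 16
x = torch.randn(B, T, C)

key   = nn.Linear(C, head_size, bias=False)
query = nn.Linear(C, head_size, bias=False)
value = nn.Linear(C, head_size, bias=False)

k, q, v = key(x), query(x), value(x)             # each (B, T, head_size)
wei = q @ k.transpose(-2, -1) * head_size**-0.5  # scaled affinities, (B, T, T)
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float('-inf'))  # decoder mask: no peeking ahead
wei = F.softmax(wei, dim=-1)                     # each row sums to 1
out = wei @ v                                    # weighted aggregation, (B, T, head_size)
print(out.shape)  # torch.Size([4, 8, 16])
```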
The hero we need, but do not deserve. Thank you.
Broke my back just to finish this video in a single sitting. It's a lot to take in at once; I think I'll have to implement it bit by bit over a few days to actually assimilate everything.
I am very happy with the lecture/tutorial and waiting for more. The time and effort that made this video possible are highly admirable and respectable.
Thank you Andrej.
I'm enjoying this whole series so much Andrej. It has made me understand neural networks much better than anything so far in my Bachelor's. As an older student with a large incentive to be time-efficient, this has been a godsend. Thank you so much!! :D
This was incredible Andrej! Really appreciate how you intersperse teaching a concept with coding and building step-by-step. This is the first of your videos that I have watched and I can't wait to watch all the others.
This is fantastic. I am amazed that Andrej takes so much of his time to impart this incredibly valuable knowledge for free to all and sundry. He is not only a top researcher but also a fantastic communicator. We have gotten used to big corporations hoarding knowledge and talent to become exploitative monopolies but every so often, humanity puts forth a gem like Mr. Karpathy to keep us all from going head first into the gutter. Thank you!!!
Just gone through all of his videos - MLP, gradients, and of course the backprop :) - and finally finishing with the transformer model (decoder part). As we all know, Andrej is a hero of deep learning, and we are very much blessed to get this much rich content for free on YouTube, and from a teacher like him. Fascinating stuff from a fascinating contributor to the field of AI 🙏
The first ten minutes alone taught me more than a quick google search could. You're good at this.
I think this man is a singularity, because the world has not seen such a combination of talent and good character. Thanks mate 🙏
slime
Truly phenomenal to live in an age where we can learn all this for free from experts like you. Thank you so much Andrej for your contribution. What a gift you have given.
Thanks so much for making this! I could grasp about 80% of everything with my programming/little bit of university-level machine learning background, but it does not feel like magic anymore. This format of hands-on coding along with the thought process behind it is way better than reading a paper and trying to piece things together.
This is "insane amount of knowledge packed in a video of 2 hours". Hats Off Man!!
I have read this paper and its variants so many times over and yet this is BY FAR the best most comprehensive tutorial on it I have ever experienced. I applaud Andrej for really nailing all of the different components in a very structured way such that it doesn't overwhelm like it did/does for most people who pound their head at it. I will be recommending this video to anyone and everyone - not just practitioners of NLP, ML, or data science.
Giving us these lectures for free... I do not know how to thank you. Great job explaining NNs to us so clearly.
I don't have words to describe how grateful I am to you and the work you are doing. Thank you!
The world has got a very good teacher back. Very appreciated.
Fantastic video Andrej, you're the best and so nice.😊
Amazing. Watching these videos, I can still believe in humankind; seeing a guy like Andrej sharing his knowledge and his time with the rest of the world is something that we do not see every day. Thanks for posting it!
He's a very good teacher, but there are still islands
I've watched a lot of explanations of Transformers and this is easily the best. You are a gifted teacher.
How to watch this video:
Watch the video once. Watch it again. Watch it a few more times. Then watch 1:01:00 - 1:11:00 about 20 times, melting your brain trying to keep track of tensor dimensions. This is a *dense* video - amazing how much detail is packed into 2 hours... thanks for this Andrej!
I have some experience in understanding the maths behind all this stuff but I kind of had problems with advancing to creating and training models, these videos are a godsend. Big thanks
So happy to see Andrej back teaching more. His articles before Tesla were so illuminating and distilled complicated concepts into things we could all learn from. A true art. Amazing to see videos too.
Thank you so much for creating such valuable content. A few years ago, I watched your 2016 Stanford computer vision course, which was instrumental in helping me understand backpropagation and other important neural network concepts. Andrew Ng's courses initially led me into the world of machine learning, but I find your videos to be equally educational, focused on fundamental concepts, and presented in a very accessible way. I've also been following your blog and was thrilled to learn about your new TH-cam channel. Your dedication to creating these resources is truly appreciated.
Growing up in rural China, I didn't have many opportunities to learn outside of textbooks. But now, thanks to people like you, I find myself swimming in a sea of knowledge. Thank you for making such a significant impact on my learning journey.
BTW, I edited this with ChatGPT to make me sound more like a native speaker. :)
Similar experience here. I too watched Stanford's computer vision and NLP and a few other courses a while back. I also did the lectures on linear algebra, calc, probability, stats, etc. from MIT OCW to get a strong grasp of the fundamentals. Without YouTube it wouldn't be possible for me to have access to such high-quality education.
Can you please share that link @eva__4380?
Thank you for taking the time and effort to share this, Andrej! This is of great help in lifting the veil of abstractions that made it all seem inaccessible, and in opening up that world to the ML/AI uninitiated like me. I don't understand all of it yet, but I'm now oriented and you've given me a lot of threads I can pull on.
The calmness with which you deliver these complex topics makes the whole thing more intuitive and easy to understand. Kudos to you Andrej!
From CS231n and RL Pong to this… there is something special about the way you break down and explain things. I have benefited immensely from it and I'm obviously not the only one. Thank You!
Andrej, thank you so much for sharing your knowledge and expertise. I've been following your video series and it has been truly amazing. I remember you saying in one of your interviews that preparing a 1-hour video takes more than 10 hours. I cannot thank you enough for what you are doing!
Wow. I thought you were going to use the Transformers library, but you essentially built the entire transformer architecture from scratch. Well done!!
Thanks! Yeah, it was a fun challenge building the Transformer from scratch. Glad you're enjoying the video!
Welcome to YouTube in 2023, where one of the top AI researchers is just casually making videos explaining in detail how to build some of the best ML models. Seriously though, these videos are amazing!
Dude, thank you so much for this. It was a seriously awesome dive into the implementation with great explanations along the way. I've read/watched a lot of ML content and this has got to be one of the clearest lectures I've come across - even better than the usual famous online uni lectures. Thank you! (And I'll be rewatching it too! :)
I've watched this 3 times and I only understand about 80% of it 😂 - a testament to how great Andrej is at explaining these models. I'm not a programmer by trade, so a lot of this is totally foreign to me.
Yeah, there's some good explanation and build-up in this video, but some of it gets really dense really quickly, and then it goes back to feeling like reading an inscrutable math research paper.
I am speechless - your talent for communicating complex ideas in detail in an understandable way, and re-explaining them with metaphors, is incredible. I feel like I'm genuinely learning more watching these videos and coding along than I ever had in uni. Thank you so much for creating this content!
Thank you Andrej for this wonderful session. I am a tech enthusiast who wanted to understand how GPT works and came across your video. I have always found the research papers difficult to comprehend and never understood how they actually get implemented. Your video completely changed that. You are such a good teacher and make things so easy to understand. Your fan club just got a new member!! :)
Absolutely amazing lecture. Thank you so much Andrej! I finally understand Attention and Transformers. "Code is the ultimate truth". And the way to set the stage and explain the concepts and the code is brilliant.
The students at Stanford who had Andrej as a professor are incredibly lucky; he’s an excellent teacher, breaking down complex topics with high precision and fluidity.
Thank you Andrej! I can't imagine the amount of time and effort it took to put this 2 hour video together! Very very educational in breaking down how GPT is constructed. Would love to see a follow-up on tuning the model to answer questions on small scale!
This is truly a step-by-step tutorial for building a Transformer system. So impressed by the way you teach! Very clear and very easy to follow. You are a highly talented educator!
The best notification ever.
Indeed!
Literally took the words out of my mouth. It’s been a while since I’ve instaclicked and watched a 2hr long video. Very much worth it.
Ohhhj sheeeeet, clear my schedule!
Absolutely agree
True!!
I built this same thing alongside watching the lecture, and loved it! I'm trying to get better at understanding and coding these concepts, and this was extremely helpful. Thank you so much :)
Reward model and reinforcement learning using that reward model would be super cool to learn. Thank you for the current lecture!
We are grateful that talented people like you believe in teaching and helping! This is an amazing video. Clear, precise, and it brings a tough topic within reach of a layperson! So much to learn about how to make technical videos.
I'm gonna say this is the best Transformer tutorial in the world, not just one of the best. Easy and straightforward to understand, with detailed step-by-step explanations, even for people with very limited ML context. This video is such a treasure.
Just finished watching this (at 2x speed). I love how hands-on this is... every other tutorial I have seen always has a step where they say "it's roughly like this...", but this one really shows you what is actually needed to make it work. Looking forward to trying this on some fun problems!
Hey Andrej, I greatly appreciate you making these videos. Next semester I am taking the course Machine Learning for NLP. I think these kinds of implementation videos are incredible for learning a subject deeply.
Andrej is pure genius wrapped in a humble person 🙌
Just the little time you take to explain a trick at 47:30 shows what a great teacher you are. Thanks a lot for this video!
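For anyone who hasn't reached that point yet: assuming the trick referenced is the lower-triangular matrix-multiply aggregation covered around that timestamp (an assumption on my part), a minimal sketch looks like this:

```python
import torch

# Sketch of the lower-triangular averaging trick: one matrix multiply
# aggregates, for each position t, all tokens up to and including t.
T, C = 4, 2                              # toy sequence length / channels
wei = torch.tril(torch.ones(T, T))       # (T, T) lower-triangular ones
wei = wei / wei.sum(1, keepdim=True)     # normalize rows -> uniform weights
x = torch.randn(T, C)                    # toy token embeddings
xbow = wei @ x                           # row t is the mean of x[0..t]
```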
This explanation is a masterpiece.
You seem to have a lot of fun too by unveiling concepts (like cross attention) 👏
I found an intuitive explanation for Query/Key/Value in Batool Haider's video, which said that Q x K.T / |Q||K| is basically computing the cosine similarity between Q and K, which is higher if the vectors point in the same or a similar direction, and that is what yields the "affinity". This Q, K.T product then becomes a mask over V that tells us which values to focus on, which is why Q x K.T x V yielding high values for correct predictions becomes the target for the neural network, and it learns to do just that. And because it (indirectly) pushes the vectors for C towards similarity, strongly connected items end up "closer" together in the embedding space. If this intuition is incorrect, I'm happy to hear how, so I can learn.
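For comparison, here is a minimal sketch of a single self-attention head as built in the lecture. One caveat to the intuition above: the lecture scales the Q·Kᵀ scores by sqrt(head_size) rather than dividing by |Q||K|, so the affinities are scaled dot products, not exact cosine similarities.

```python
import torch
import torch.nn.functional as F

# One self-attention head, following the structure from the lecture.
B, T, C, head_size = 1, 8, 32, 16
x = torch.randn(B, T, C)                            # toy input tokens
key = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

k, q, v = key(x), query(x), value(x)                # each (B, T, head_size)
wei = q @ k.transpose(-2, -1) * head_size**-0.5     # (B, T, T) scaled affinities
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float('-inf'))     # causal mask: no peeking ahead
wei = F.softmax(wei, dim=-1)                        # rows sum to 1
out = wei @ v                                       # (B, T, head_size)
```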
Thanks for posting this lesson so freely on the internet, Andrej!
Man, all this AI educational content on TH-cam recently makes me want to get back into doing AI experiments
Well, I have watched all your videos... I think it's time for more 😆
As always, a fantastic video, and thanks for sharing... It would be really cool if you made a part II on this, showing how we could use PPO/RL to do the fine-tuning part of some basic interactive flow. It doesn't have to be like ChatGPT (Q/A). Thank you so much Andrej for such an amazing video
Grateful for the times we are living in and the easy access to information that we can enjoy. Thanks for sharing your knowledge, much appreciated!
Nothing new in this comment. Just want to say 'thank you!' for this amazing tutorial, ...and all the others! The completeness, the information density and pace, the choice of examples and language... Everything is *just right*, delivered right from the heart and the mind!! Thank you so much Andrej, for taking your time to educate and inspire all of us.
What a wonderful gift to the world. Amazing tutorial. Again. Thank you!
James! So funny to see your comment here :-) Hope all is well ...
@@AlexanderEgeler small world! 🙂
Thanks a lot Andrej for making such good videos that explain the core concepts of neural nets. It would be really helpful if you could make a tutorial/video on the entire workflow and the structured thought process you would follow to train a neural network end to end (to arrive at the final model to be used in production). I mean, given a problem statement, how would you train a neural network to solve it, how do you design the experiments to choose the right set of hyperparameters, and so on. A hands-on tutorial video demonstrating this process would definitely help a lot of practitioners trying to use neural networks to solve interesting problems.
To my great surprise, I understood most of this, at least at a conceptual level.
[Probably helps that I watched Stanford EE263 and MIT Gilbert Strang Linear Algebra videos already 🙂]
Thanks very much for this, Andrej!
This has been super informative (or transformative?), thank you!
I think I will need to re-watch many parts to completely understand, but it's great that it's all here.
Try watching his other makemore videos first; then understanding this video will be much easier.
He is by far the best teacher for neural networks, AI, and ML in general. I highly appreciate your effort, Andrej 🙂
anyone from Harkirat Singh's video?
Yeh.🙋
Yes
Yeh
Yes, it's me
😅😅
Your tutorial finally made me understand what self-attention is. Amazing tutorial and thank you for making these videos!
Just as a suggestion: using C as the channel dimension can be confusing when cross-referencing the PyTorch documentation on the cross-entropy function, where C is used as the number of classes. As I was reimplementing your bigram model on my custom dataset, with a vocab size larger than the channel size, I ran into IndexErrors. It's worth re-emphasizing that the last dimension expected by the cross-entropy function is not the channel size but the number of classes we're trying to predict, i.e. the vocab size.
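To illustrate the point above, a minimal sketch (with a toy vocab size, chosen here just for the example) of the reshape the lecture does before calling the loss:

```python
import torch
import torch.nn.functional as F

# F.cross_entropy expects logits shaped (N, num_classes), so the
# (B, T, vocab_size) logits are flattened first; the class dimension
# must be the vocab size, not the embedding/channel size.
B, T, vocab_size = 4, 8, 65
logits = torch.randn(B, T, vocab_size)            # model output
targets = torch.randint(0, vocab_size, (B, T))    # next-token indices
loss = F.cross_entropy(logits.view(B * T, vocab_size), targets.view(B * T))
```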
1:41:03 Just for reference: this training took 3 hours, 5 minutes on a 2020 M1 MacBook Air. You can use the "mps" device instead of cuda or cpu. (A small device-selection sketch follows this thread.)
It's not great, but it's not that bad if you are just trying stuff out.
Thank you for your great videos!
I have an M2 Pro; the CPU was much faster for me.
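For reference, one way to write the device selection discussed in this thread (a sketch; MPS requires PyTorch 1.12+ on Apple Silicon, and as the reply above notes, the plain CPU can still be faster for some workloads, so it's worth benchmarking both):

```python
import torch

# Pick the best available backend, falling back to CPU.
if torch.cuda.is_available():
    device = 'cuda'
elif torch.backends.mps.is_available():
    device = 'mps'
else:
    device = 'cpu'
print(f'using device: {device}')
# Tensors and modules then move over with .to(device), e.g. x = x.to(device)
```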
Amazing work. I was struggling to understand transformers at both theoretical and practical level but thanks to the brilliant lecture, I struggle no more.
I started this gem in the morning, and it was one of the best things I did today. Huge respect to you Andrej; there are very few people like you who provide such valuable content to people all around the world for free. Thank you very much
ChatGPT feels like more than just a large language model to me. It seems to have, or at least projects, an understanding of concepts that I wouldn't expect a pure language model to have.
It would be incredibly cool to see a very simple implementation of the second fine-tuning phase! Good lessons in RL to be had for sure :)
Andrej is single-handedly putting the open in OpenAI
The explanation of such difficult concepts is so simple! Your channel deserves a lot of attention.
Wow, this video and everything it covered are just amazing! There are no other words except thank you, Andrej, for all the effort it took to make this! Really looking forward to more of your great ideas and content!
Consider pairing this lecture with Andrej's more recent lecture from Stanford's CS25 course on transformers. For those of you who got to this video through the makemore series, this lecture fills in some of the gaps when jumping from makemore to transformers, covering the what and why of the attention and transformer architecture:
th-cam.com/video/XfpMkf4rD6E/w-d-xo.htmlsi=RuOEaN-VBGCI96pm.
Thank you Andrej for an incredible series of videos! As a fellow computer scientist (with limited exposure to AI), this series has brought me back to grad school/undergrad, re-discovering the passion and joy of learning, thinking about concepts in my off time, and getting hands-on with the exercises. I've been recommending these to my friends and team.
Can someone please tell me how necessary it is to watch Andrej's previous videos on neural networks? (I have never used PyTorch before)
Who came here after Harkirat's video?
Andrej, you are a blessing for all people who want to learn more about AI. Hands down the best explanations, clear instructions, and genuine delivery. Hope I can thank you in person one day!
MIND = BLOWN! Not only is this incredible content, but the way everything was presented, coded, and explained is so crystal clear, my mind felt comfortable with the complexity. Amazing tutorial, and incredibly inspiring, thanks so much!
That's crazy 🐢 That's actually crazy 🐢 That's messed up 🐢
Andrej - some constructive criticism: I think the lectures would benefit a lot from a higher-quality microphone pointed at your mouth. The clicking and typing sounds seem a bit unprofessional and can be distracting to the listener.
Again, just constructive criticism. Thanks for taking the time to share your knowledge and insights with the world
Vedal-sama brought me here
Lol I was hoping I'd find these comments here