Dude, I keep accidentally running into your content while learning this material. The other day I was firing off weirdly specific Google searches while trying to build intuition on how self-attention works, and I found a year-old comment you wrote on Reddit that nailed what I was having trouble with. Just bought your book's MEAP; you've been doing an amazing job, keep it up!
Whoa, what a small world. Glad you are finding this useful, and thanks for picking up a copy of my book!
Can you share the self-attention Reddit link?
@joneskin1432 - Would you mind sharing which version of the MEAP you got? eBook or print? And could you share the link? Many thanks!
You are a computer engineer and still believe in accidents.
Wake up.
Dear Sebastian,
I hope you are doing well. I am writing to express my deepest gratitude for your incredible effort and dedication to teaching on the online platform. Your generosity in sharing your knowledge for free has made a profound impact on so many of us.
Your classes have been a beacon of light in these challenging times, providing not only education but also inspiration and hope. The clarity with which you explain complex topics and your unwavering patience in addressing our questions have been truly remarkable.
Thank you for your time, energy, and passion for teaching. You've made a significant difference in my learning journey, and I am immensely grateful for the knowledge and wisdom you've imparted.
Wishing you all the best in your future endeavors. 😊
Warm regards,
Hari
Thanks so much for this very kind message, Hari. This is very nice of you, and it's very motivating to hear this!
Sebastian, I want to sincerely thank you for providing such good material. I cannot express my gratitude enough! I admire your desire to share this content with such clarity and human touch! Thanks a lot!
Thanks for the kind words!
23 mins in. This is, by far, the best tutorial I have seen on building LLMs from scratch. I have followed you for a while, Sebastian, for all the great contributions you have made over the years, but you have outdone yourself once again. Well done, man, and thank you.
96 mins in. Still awesome.
@@devtest8078 Hah, thanks so much!
👏👏👏
This is a gem for me as an MSc AI student. Thank you for making this.
Your deep learning series got me through STAT 453 at UW-Madison, and now this workshop has been the perfect transition into LLMs! Great video, Sebastian!
Wow, small world, and I am glad to hear that this video was useful as well!
Just finished the book; extremely pedagogical and valuable. Great job as always, Sebastian!
Thanks for the feedback! Glad you got lots out of it!
Thank you for such an amazing book, such an invaluable source for a beginner like me!
I watched the 4-hour lecture by Karpathy and initially thought your content could hardly impress me further. However, I find myself saying "wow" reading through every single chapter of your book.
I am super glad to hear that the book was worth your while!
Mr. Sebastian, I found your channel yesterday; so grateful to you for such top-notch education.
Thank you. I recently got your book, and this stuff is invaluable. There's so much material out there, and it's not all organized in a way that's easy to digest. Your books and videos are great!
Glad to hear that the organization makes it accessible! That’s usually the trickiest part!
Thanks a lot, Sebastian! Coding from the ground up made most concepts crystal clear for me.
Nice, I am very glad to hear this!
Just finished the video; thank you very much for the detailed explanation. Next step is reading your book 🙂
@SebastianRaschka - I just bought the book (How to Build an LLM from Scratch). Thank you for all your great effort! :) I look forward to your new content soon. :)
I hope you are enjoying the book! Happy reading!
What a time to be alive haha, love your book.
Indeed, great book
Which book are we talking about here? Can anyone give me the name, please? 🙂
@@deepaksingh9318 Build a Large Language Model (From Scratch), from Manning
@@AhmedMostafa-r2u thanks ☺️
Make more videos, professor! Your knowledge is enlightening me a lot!
Super helpful. Thanks for sharing.
looking forward to more such videos on LLMs.
Keep it up!!
Sebastian, I like your deep content. We appreciate the time you put into this.
Thank you for such an awesome contribution towards democratizing LLM research.
Thank you for putting this together. One of the best talks on the technicals.
Thank you for developing the watermark Python package. I became aware of your work because of how amazing watermark was and wanted to find out what else the author is up to!
Small world 😊
I have read so many of your educational materials, and they have been so useful that I feel like you are one of my close friends.
Glad my materials are so useful that you keep coming back to them!
I have been following your blog for a very long time. I have already purchased your new LLM book, and I have also purchased your machine learning books. Please upload more content like this.
Thanks for the kind support!
Thank you very much for giving a short and sweet (I have patience for week-long workshops too :D) overview of building an LLM, pre-training it, and fine-tuning it.
Looking to explore deeper via the detailed code base of your book.
🙏
Glad this was useful! Ha, yeah, a week-long workshop would be interesting, but with a full-time job, it would be a bit tough to carve out the time to record it 😅
@@SebastianRaschka Completely agree with you. If only my job's workshops were as useful as these ones. ;)
The benefit of these videos is that even though they're hours long, I can always pause and revisit them when I have time.
@@SHAMIKII Thanks for the kind compliment!
"Thank you! I love your work, Sebastian. 😊
I hope my small token of appreciation will motivate you further to create more content like this.
By the way, I already own most of your books. My favorite is your recent one - Build a Large Language Model (from Scratch)." 📚
Wow, thanks so much for the kind support!
Great job! Just bought your book!
Thanks, happy reading and coding!
I am reading your book from Manning's library; loving it!
Thanks! Happy to hear this!
Your book was already a great read and practice.
Glad to hear that you got lots out of my book!
Amazing, Sebastian 👏 Thank you so much. I also read your book and found it insightful. Will you be making some content on how we could give the LLM a UI like ChatGPT's?
That's an interesting idea, but since I don't enjoy web development very much, I don't have any fixed plans for that yet.
Incredible! Thanks for sharing this great resource.
I think it is exactly what I was waiting for 😍
Happy coding!
Great explanation as usual. Thanks for sharing.
This is absolutely amazing. On Wisconsin!
always look forward to your content. 👍
1:22:40 You are right, Sebastian; for me it did not have the peak that you got here. BTW, thanks a lot for this tutorial and your "Introduction to Deep Learning and Generative Modeling" course as well.
Outstanding, Doc, this is wunderbar... thank you 🤙
Just finished watching the entire video. Amazing! But could you also make a video providing an in-depth understanding of tokenizers? I'm struggling with their implementation, especially when modifying the vocabulary for different languages.
I've also watched your STAT 453 lectures, which helped me understand GANs and ML models in detail. Thanks a lot. ♥
Great suggestion. I was actually doing that (extending the vocab of a tokenizer and adjusting the embedding layer and output layer of an LLM accordingly) for a little side project. Hope to find the time to put together a tutorial on that sometime.
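In case it helps in the meantime, the core of it is resizing the embedding and output layers while keeping the pretrained rows. A rough sketch (the attribute names `tok_emb` and `out_head` are placeholders for whatever your model uses; the new rows keep their default random initialization):

```python
import torch.nn as nn

def extend_vocab(model, old_vocab_size, num_new_tokens, emb_dim):
    new_size = old_vocab_size + num_new_tokens

    # Larger embedding layer; copy over the pretrained rows
    new_emb = nn.Embedding(new_size, emb_dim)
    new_emb.weight.data[:old_vocab_size] = model.tok_emb.weight.data
    model.tok_emb = new_emb

    # Larger output head; copy over the pretrained rows
    new_head = nn.Linear(emb_dim, new_size, bias=False)
    new_head.weight.data[:old_vocab_size] = model.out_head.weight.data
    model.out_head = new_head
    return model
```

A common trick after resizing is to initialize each new row to the mean of the existing embeddings so the new tokens start in a sensible region of the embedding space.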
@@SebastianRaschka Thanks for considering! Really looking forward to it.
Also check out Karpathy's 2-hour video on building a tokenizer: th-cam.com/video/zduSFxRajkE/w-d-xo.html
@@brenok Oh. Thanks a lot. I completely forgot to check Andrej's channel. Thanks for the reference.
Great video so far. I just watched the data prep portion. I am pretty interested in embedding models, so I wish you would have gone into that a bit. I understand why it was cut, though. Do you have any videos that explain that part? Thanks again!
Thank you for this❤ Such a detailed explanation!
Love your work!
Thanks for the tutorial, Sebastian! Quick question. Why is LayerNorm before attention and before feedforward instead of after attention + residual connection and feedforward + residual connection? I understand there is a final norm as well, but why before? Thanks!
Good question. There are actually different variants called Pre-LayerNorm and Post-LayerNorm. I summarized it in the section "(3) On Layer Normalization in the Transformer Architecture" here: magazine.sebastianraschka.com/p/understanding-large-language-models
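To make the difference concrete, here is a minimal PyTorch sketch of the two orderings (the `attn` and `ff` sub-modules are placeholders for any attention and feedforward implementation):

```python
import torch.nn as nn

class PreLNBlock(nn.Module):
    # GPT-2-style: normalize the *input* of each sub-layer
    def __init__(self, emb_dim, attn, ff):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(emb_dim), nn.LayerNorm(emb_dim)
        self.attn, self.ff = attn, ff

    def forward(self, x):
        x = x + self.attn(self.norm1(x))  # norm *before* attention
        x = x + self.ff(self.norm2(x))    # norm *before* feedforward
        return x

class PostLNBlock(nn.Module):
    # Original "Attention Is All You Need" ordering
    def __init__(self, emb_dim, attn, ff):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(emb_dim), nn.LayerNorm(emb_dim)
        self.attn, self.ff = attn, ff

    def forward(self, x):
        x = self.norm1(x + self.attn(x))  # norm *after* attention + residual
        x = self.norm2(x + self.ff(x))    # norm *after* feedforward + residual
        return x
```

Pre-LN (used in GPT-2 and most modern LLMs) tends to give more stable gradients early in training, which is why it usually works without the careful learning-rate warmup that Post-LN needs.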
@@SebastianRaschka Thank you so much! Another quick question, but on a different tangent. Where does one compare a new model (say I built a new kind of model, like a transformer or an RNN) and want to test/evaluate it to see how it compares with the existing benchmarks for transformers or LSTMs, so I can publish it? Is there a website where I can test this new model on some standard SOTA datasets they host? Sorry for the ill phrasing. I guess what I want to ask is: is there any website where you submit your model and they test it for you on standard NLP tasks? So all you have to do is input your model, and the output is the evaluation scores on NLP tasks, which you can then publish (if better)? Again, sorry for the long question, but I have been trying to find its answer for a while now.
@@neeravkaushal Good question, I think it can be a bit tricky to get non-standard models in there, but there's tatsu-lab.github.io/alpaca_eval/ and huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
@@SebastianRaschka Thank you so much. Very helpful. :-)
Thanks for this. Really appreciated.
Very good job. This is simple text-based model building. If there are complex mathematical equations, graphs, and tables related to articles on complex mathematical problems, how can I prepare the model?
That's a good question. It would require a lot of extra work. Probably a book (or at least a workshop) in itself. To understand the general process, I can recommend the Qwen2.5-Math report (arxiv.org/pdf/2409.12122) which outlines how the researchers took a text model (here: Qwen 2) and finetuned it for math.
Great Video, thanks for putting in the time!
Great, keep it coming; hope to use it.
Excellent video and book! Maybe a sequel about LLM inference, like KV cache and other acceleration schemes?
Yeah, this would be a good topic for another book one day…
Wow what a blessing 🎉
Hi Sebastian, this was amazing; thank you for making this video!
Quick question. I would like to build an LLM for my reading notes and blog posts. I would like to ask it questions, and the LLM should go into the dataset and find the answer.
If I were to follow these steps, would I be able to do that?
Thanks!
There would be two general approaches: (1) Finetune the model on your dataset or (2) build a RAG application around the model. RAG is a system that feeds a model with chunks from the dataset during inference. I have a brief outline here: github.com/rasbt/RAGs
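As a toy illustration of that second approach (this is a hypothetical sketch, not the code from that repo; `embed_fn` and `llm_generate` stand in for whatever embedding model and LLM you choose):

```python
import numpy as np

def retrieve(query, chunks, chunk_embs, embed_fn, k=3):
    # Rank the stored chunks by cosine similarity to the query embedding
    q = embed_fn(query)
    sims = (chunk_embs @ q) / (np.linalg.norm(chunk_embs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

def rag_answer(query, chunks, chunk_embs, embed_fn, llm_generate):
    # Feed the top-ranked chunks to the model as context at inference time
    context = "\n\n".join(retrieve(query, chunks, chunk_embs, embed_fn))
    prompt = (f"Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    return llm_generate(prompt)
```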
A year ago, I really wished there was a video like this! Congrats on finishing the project (book) ahead of schedule and distilling a year's work into a 3-hour video 😂
Thanks! Working on the book has been intense but also a lot of fun :). The workshop covers only like 10% (otherwise it would be 30 rather than 3 hours) but I hope it’s useful!
Thanks for this workshop. Did you finish the book or is it still under development?
I finished the last chapter a few months ago, and it's now been laid out and sent to the printer as of last week, which means the print version should be available soon :)
Thanks for creating such an amazing video!!! Just one quick question, I failed to open the Studio in the Lightning Studio. Any idea? Your response is much appreciated.
Thanks for letting me know. Was there any particular error or issue you were getting? Or, if you don’t mind, could you describe the problem in a bit more detail?
@@SebastianRaschka - Hi, thank you for your prompt reply. Kindly see the error message here. It pops up when I hit the "Open in Studio" button. Thanks in advance!
@@thehard-coder9398 Huh, that's a weird one, I will ask my colleagues to see what's up. Thanks!
@@SebastianRaschka - Thanks! I look forward to your response. :)
@@thehard-coder9398 We tried to reproduce this issue but couldn't. Could you give it another try?
Dropping heat as usual
Awesome 👏🏻
(Even though awesome is an understatement…)
Thanks!
many thanks!
I'm now 2:00 into this video and I think I'm going to enjoy it! He seems to be one of those people who have that distracting verbal tic of saying "Yeah" every 7th word, but, fortunately, his S/N ratio appears to be high, so we can forgive him...
Yeah, the free version has a lot of these
I have a question regarding the outputs of the LLM - what's the point of having the vectors of existing tokens in the output, instead of only the next token's vector? If I understand correctly, those are discarded anyway.
You use them for the next-word prediction task during training. If you have the sentence "the world is round", then this gives you 3 prediction tasks "the -> world", "the world -> is", and "the world is -> round" instead of just one prediction task "the world is -> round"
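In code, that amounts to a one-token shift between the inputs and targets; a minimal sketch (the token IDs are made up for illustration):

```python
import torch

# Made-up token IDs for "the world is round"
tokens = torch.tensor([464, 995, 318, 2835])

inputs  = tokens[:-1]  # [464, 995, 318]  ~ "the world is"
targets = tokens[1:]   # [995, 318, 2835] ~ "world is round"

# The model emits one logit vector per input position; position i is trained
# (via cross entropy) to predict targets[i], so a 4-token sentence yields
# 3 prediction tasks in a single forward pass.
```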
@@SebastianRaschka Thanks for your reply; I'm trying to understand the rationale. More prediction tasks, so this is mainly a way to increase training efficiency. But it seems to me that by doing this we're training a copy machine alongside the next-word prediction. I need to read up more on this topic. Thank you so much for the great video!
lovely. thank you
Way to go, Seb! 🖐️
awesome!
Thank You
Thanks!
Thanks for the very kind support!
@SebastianRaschka Is the print version of the book available? Amazon shows availability sometime in late October.
Yes!
There's so much happening in this field. I feel overwhelmed; I start with the basics, but the field is moving so fast, and jobs need advanced skills. How do I learn quickly and stay updated? Please advise.
Is this a companion video of your LLM Book?
Good question: yes and no. It's based on the book, but it only covers about 10%. The code notebooks have also been substantially simplified; otherwise it would be a much longer video.
Is the working of BPE covered in your book? You mentioned in the video that it is a very long topic, so I'm just asking if it's covered in the book. Thanks, however, for this video; very useful.
The book is focused on implementing the LLM, training it, finetuning it, etc. But I am planning to add bonus material on implementing BPE. I implemented the algorithm a while back; I just need some time to add explanations.
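For anyone curious in the meantime, the core of BPE training is surprisingly short; here is a toy sketch (not the book's bonus material, just the textbook algorithm on a made-up corpus):

```python
import re
from collections import Counter

def get_pair_counts(word_freqs):
    # Count adjacent symbol pairs across the corpus, weighted by word frequency
    pairs = Counter()
    for word, freq in word_freqs.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(pair, word_freqs):
    # Replace each occurrence of the pair with its concatenation,
    # matching whole symbols only (hence the \S lookarounds)
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), w): f for w, f in word_freqs.items()}

# Toy corpus: words pre-split into characters, with frequencies
word_freqs = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
merges = []
for _ in range(10):
    pairs = get_pair_counts(word_freqs)
    best = max(pairs, key=pairs.get)          # merge the most frequent pair
    word_freqs = merge_pair(best, word_freqs)
    merges.append(best)
print(merges)  # learned merge rules, e.g. ('e', 's'), ('es', 't'), ...
```

Encoding a new word afterwards just replays the learned merges in order.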
@@SebastianRaschka Thank you, I read your other book on PyTorch and machine learning. It was very good. I will buy this one as well. Thanks!
39:20, my code is throwing an error stating there is no recognized package called supplementary.
Can anyone please help me tackle this?
Hey there. I just double-checked and the supplementary.py file seems to be present in both the GitHub repository and the Studio. Maybe you accidentally deleted or moved it?
The code area is too small; I can't see it.
Doctor- You only have 2:45:10 to live.
Me: