I'm a student for life....approaching 40.....never had the privilege of attending a university like Stanford. To get access to these quality lectures is amazing. Thank you
This is a quality lecture?
@@Fracasse-0x13 For people who don't have access to education, yes, it is a quality lecture.
i am living my dreams
@@Fracasse-0x13 Why is this not a quality lecture?
They all the same
Everything is on the web
you don't need certification to tell the world you know it
Build the best
It is one thing to be a great research institution but to be a great research institution that is full of talented and kind lecturers is extremely impressive. I've been impressed by every single Stanford course and lecture I have participated in through SCPD and YouTube and this lecturer is no exception.
Thank you for sharing your positive experiences with our courses and lectures!
Slides: drive.google.com/file/d/1B46VFrqFAPAEj3kaCrBAtQqeh2_Ztawl/view?usp=sharing
Thank you sir... I heartily appreciate it 😊... The lecture was awesome 🤌
Thank you so much. I really appreciate it.
The lecture was perfect. Is there a playlist for the whole CS229 class from the same semester as this video? All I have found is from before 2022, which made me wonder.
@@helloadventureworld no, the rest of CS229 has not been released and I don't know if it will. This is only the guest lecture.
@@yanndubois3914 Thanks for the response and information you have shared :)
He is doing his part to encourage women in STEM.
Women have always been in STEM. We all know about Grace Hopper. Please let this go.
Look up Ruth David. She worked at the CIA, redid all of their tech infrastructure, and she’s still alive!
If my teachers in school had looked this good, I wouldn't have missed a single class. He's handsome af.
🤣🤣🤣😂
No because fr
Came for the speaker; stayed for the knowledge.
… strange
I'm a straight dude and even I'm like "DAMN!"
Suddenly I am interested in LLMs
I might not know what you are saying but I have the same feeling as you lol.
😂😂😂
🤣🤣
00:10 Building Large Language Models overview
02:21 Focus on data evaluation and systems in industry over architecture
06:25 Auto regressive language models predict the next word in a sentence.
08:26 Tokenizing text is crucial for language models
12:38 Training a large language model involves using a large corpus of text.
14:49 Tokenization process considerations
18:40 Tokenization improvement in GPT 4 for code understanding
20:31 Perplexity measures model hesitation between tokens
24:18 Comparing outputs and model prompting
26:15 Evaluation of language models can yield different results
30:15 Challenges in training large language models
32:06 Challenges in building large language models
35:57 Collecting real-world data is crucial for large language models
37:53 Challenges in building large language models
41:38 Scaling laws predict performance improvement with more data and larger models
43:33 Relationship between data, parameters, and compute
47:21 Importance of scaling laws in model performance
49:12 Quality of data matters more than architecture and losses in scaling laws
52:54 Inference for large language models is very expensive
54:54 Training large language models is costly
59:12 Post training aligns language models for AI assistant use
1:01:05 Supervised fine-tuning for large language models
1:04:50 Leveraging large language models for data generation and synthesis
1:06:49 Balancing data generation and human input for effective learning
1:10:23 Limitations of human abilities in generating large language models
1:12:12 Training language models to maximize human preference instead of cloning human behaviors.
1:16:06 Training reward model using softmax logits for human preferences.
1:18:02 Modeling optimization and challenges in large language models (LLMs)
1:21:49 Reinforcement learning models and potential benefits
1:23:44 Challenges with using humans for data annotation
1:27:21 LLMs are cost-effective and have better agreement with humans than humans themselves
1:29:12 Perplexity is not calibrated for large language models
1:33:00 Variance in performance of GPT-4 based on prompt specificity
1:34:51 Pre-training data plays a vital role in model initialization
1:38:32 Utilize GPUs efficiently with matrix multiplication
1:40:21 Utilizing 16 bits for faster training in deep learning
1:44:08 Building Large Language Models from scratch
Crafted by Merlin AI.
Insights By "YouSum Live"
00:00:05 Building large language models (LLMs)
00:00:59 Overview of LLM components
00:01:21 Importance of data in LLM training
00:02:59 Pre-training models on internet data
00:04:48 Language models predict word sequences
00:06:02 Auto-regressive models generate text
00:10:48 Tokenization is crucial for LLMs
00:19:12 Evaluation using perplexity
00:22:07 Challenges in evaluating LLMs
00:29:00 Data collection is a significant challenge
00:41:08 Scaling laws improve model performance
01:00:01 Post-training aligns models with user intent
01:02:26 Supervised fine-tuning enhances model responses
01:10:00 Reinforcement learning from human feedback
01:19:01 DPO simplifies reinforcement learning process
01:28:01 Evaluation of post-training models
01:37:20 System optimization for LLM training
01:39:05 Low precision improves GPU efficiency
01:41:38 Operator fusion enhances computational speed
01:44:23 Future considerations for LLM development
Insights By "YouSum Live"
This is really a great lecture, super dense but still digestible. It hasn't even been 2 years since ChatGPT was released to the public, and seeing the rapid pace of research around LLMs and how they keep getting better is really interesting. Thank you so much, now I have some papers to read to further my understanding.
00:10 Overview of building large language models
02:21 Focus on data evaluation and systems in practice
06:25 Autoregressive language models predict the next word
08:26 Text tokenization and vocabulary size are crucial for language models.
12:38 Tokenization and training tokenizers
14:49 Optimizing the tokenization process and token-merging decisions
18:40 GPT-4 improved tokenization for better code understanding
20:31 Perplexity measures the model's hesitation between words.
24:18 Evaluating open-ended questions is difficult.
26:15 Different ways of evaluating large language models
30:15 Steps for preprocessing web data for large language models
32:06 Challenges of handling duplicates and filtering low-quality documents at scale.
35:57 Collecting data about the world is crucial for practical large language models.
37:53 Challenges in pre-training large language models
41:38 Scaling laws predict performance improvement with more data and larger models.
43:33 Compute is determined by data and parameters.
47:21 Understanding the importance of scaling laws when building large language models
49:12 Good data is crucial for better scaling.
52:54 Inference for large language models is expensive.
54:54 Training large language models is computationally expensive.
59:12 Large language models (LLMs) need post-training for alignment to become AI assistants.
1:01:05 Building large language models (LLMs) involves fine-tuning pre-trained models on the desired data.
1:04:50 Pre-trained language models are optimized for specific types of users during post-training.
1:06:49 Balancing synthetic data generation with human input is crucial for effective training.
1:10:23 Challenges in generating content that exceeds human abilities
1:12:12 Generating ideal answers using preference maximization
1:16:06 Training a reward model using logits for continuous preferences
1:18:02 Training large language models with PPO and challenges in reinforcement learning
1:21:49 Discussion of reinforcement learning methods and their advantages in using reward models.
1:23:44 Challenges of using humans as data annotators
1:27:21 LLMs are more cost-effective and offer better agreement than humans.
1:29:12 Problems with perplexity and calibration in language models
1:33:00 Variability in GPT-4 performance depending on prompts
1:34:51 Importance of pre-training in large language models
1:38:32 Using GPUs for matrix multiplication can be 10 times faster, but communication and memory play a key role.
1:40:21 Reduced precision for faster matrix multiplication
1:44:08 Building large language models (LLMs)
Crafted by Merlin AI.
Damn. That lecturer is fineeee. 😍
We live in a tremendous moment in time. Free access to the best lectures on the most relevant topic from the best university
Thanks for your comment, we love to hear this feedback!
People should first learn about basic language models like unigrams and bigrams. These were the first language models, and Stanford really has good lectures on them.
This course has so many insights and gives a quick summary view of LLMs. I have also gone through a paid Coursera course. This one is equally good and free. Thanks for the video.
Finally someone said Machine Learning instead of slapping AI on everything!
I feel that whenever someone talks about AI a lot it means that they know nothing about it
Right? And a lot of people believe in Yuval Harari because of it
What a wonderful lecture... these 1.75 hours are among the most valuable in my life
Thank you for the video! I am glad that we live in this time and can witness the development of AI technologies.
Phenomenal explanation. Love for Stanford, its professors, and their methodologies is a never-ending tale!!
This is very well done. It's super easy to understand. I think your students should learn a lot. It's a great skill to be able to present complex material in a simple fashion. It means you really understand both the material and your audience.
fantastic, wonderful, significant, magnificent, outstanding, class of titans, world-class🎉
Really incredible delivery of complicated information. ❤
Dayum he’s fine
One good point from the discussion of the difference between PPO and DPO: a reward model can reduce the dependency on labeled preference data.
I love the way you answered the questions, very clear and precise.
Amazing lecture. Great job
I had the privilege of attending an insightful 90-minute lecture by Stanford faculty, which greatly boosted my confidence in completing my thesis. The approach they shared aligns closely with my own research methodology, reinforcing the direction of my work. Grateful for this inspiring experience!
Best explanation.. I'm watching at 3 am. Thanks
Thanks a lot for sharing this. I would like to point out a correction:
time 20:28 -
Consider case prob(true_token)
Yes that's correct, it's the baseline performance of a very bad language model.
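For readers following the perplexity exchange above: the correction itself is cut off, so this is only a sketch of the general idea and not a reconstruction of the commenter's argument. It assumes the usual definition of perplexity as the exponential of the average negative log-probability that the model assigns to the true tokens. The function name, the vocabulary size, and the toy probabilities are all hypothetical and do not come from the lecture.

```python
import math

def perplexity(true_token_probs):
    """Exponential of the average negative log-probability of the true tokens."""
    avg_nll = -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)
    return math.exp(avg_nll)

vocab_size = 50_000  # hypothetical vocabulary size, chosen only for illustration

# A model that spreads probability uniformly over the vocabulary
# assigns 1 / vocab_size to every true token.
uniform_probs = [1.0 / vocab_size] * 20

# A model that assigns probability 0.9 to every true token.
confident_probs = [0.9] * 20

print(perplexity(uniform_probs))    # approximately vocab_size
print(perplexity(confident_probs))  # approximately 1 / 0.9
```

Under this definition, a model that guesses uniformly has a perplexity equal to the vocabulary size, and a model that always puts probability 1 on the true token reaches the minimum of 1.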
Thanks for sharing this. It is a great introduction of the LLM system.
Very informative, updated and crisp~ keep them coming..don't stop now!
Great talk. Loved the level of detail, the insights, the pacing.
Great! Thanks for sharing! One thing I would suggest is to transcribe or subtitle the questions asked by the students. That way we could better understand the answers given by the lecturer.
What an awesome video. Data quality is a real issue, and even more interestingly, LLMs learn a lot like humans. Introduce the simpler concepts first (training data prompts) and then introduce more complex subjects, and the LLMs learn more, just like humans.
Ignore this comment
Day 1 19:05
Day 2 28:38
Day 3 41:05
Day 4 1:00:00
this was genuinely interesting and easy to follow through, thanks!
This is an amazing high-level breakdown of LLMs. Every aspect of an LLM was mentioned. Thank you for this amazing video. I’ll come back here often
Thank you for the gem, Stanford Online. Great starter - Time to read more papers on LLMs
Fabulous lecture! Goes into all important concepts and also highlights the interesting details that are commonly glossed over, thanks for recording!
Wow! Such a wonderful presentation! Thanks so much!
Great & Comprehensive Presentation 🎉
Great presentation and very helpful. Thanks for sharing this
@5:55 There is an approximation. It rests on the axioms, the first being that probabilities should sum to 1. The second approximation is that the distribution comes only from the given corpora. The given corpora are an approximation of the total population, which we all know has its own biases.
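The point in the comment above can be illustrated with a bigram model, the kind of basic language model mentioned elsewhere in this thread. This is a minimal sketch, assuming plain count-based estimation with no smoothing. The function `train_bigram` and the toy corpus are hypothetical and are not taken from the lecture.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Estimate P(next | prev) from raw bigram counts, without smoothing."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {word: c / sum(nxts.values()) for word, c in nxts.items()}
        for prev, nxts in counts.items()
    }

# Toy corpus, standing in for "the given corpora".
corpus = "the cat sat on the mat".split()
model = train_bigram(corpus)

# Each conditional distribution sums to 1, as the probability axiom requires.
for prev, dist in model.items():
    print(prev, dist, sum(dist.values()))

# A continuation never seen in the corpus gets probability 0:
# the estimate only reflects the sample, not the full population.
print(model["the"].get("dog", 0.0))
```

Every conditional distribution is normalized, but any word pair that is absent from the corpus gets probability zero, so the estimate inherits whatever gaps and biases the sample has.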
My sincere thanks for sharing it.
You can build my ❤️
I don’t know what the guy is talking about but imma watch HIM
LLM - chatbots
Architecture (Neural networks)
Training algorithm
Data
Evaluation
System
How do people know that "adding more data" is not just increasing likelihood of training on something from the benchmarks, while "adding more parameters" is not just increasing the recall abilities (parametric memory capacity) of the model to retrieve benchmark stuff during evaluation? Really curious about that point.
This is a gold mine
As a gay guy who studied EE and CS at Stanford, I can confirm I had a crush on him
Please give this dude 15 more minutes for tiling, Flash Attention, and data and model parallelization!!
If you know all of that, you don't need 15 more minutes.
great lecture, wish the speaker had more time to go over the full presentation
Great lecture
It's never too late to get started for learning
What an amazing lecture, now want a part 2 about the topics that haven’t been touched upon 🤩
would love to see the other recordings of cs25!
The best one we want more
thank you! great lecture.
Most amazing video ever
When will the other lectures be updated? This was so good!
Looking forward to do a PostDoc from SU
man this is amazing!
This is amazing. Can you guys make a playlist for beginners? Thank you!
Just Amazing!
What is the paper from last year that is mentioned at 1:27:25, the one that is 50x cheaper and better than human agreement?
Great content, thanks!
Please share more machine learning lectures
The reason Stanford graduates rule the world
So Amazing!
Thank you for this
I'm just trying to get started in ML. Good god. Do a YouTube channel already. Really good. Or at least do some blog updates.
Thank you! 🚀
Could you please share the link to the lecture on Transformers that you were referring to in the video?
Can we please have access to the previous lecture about Transformers?
Suddenly I'm interested in LLMs 😗😗😗
I like his teaching style and that laughter in between 😂😁🤙. Last one be careful heavyone
Yes!
🇰🇪 well Represented.
Impressive
Whoever records these videos needs to leave the slides up longer for the viewers to read as the speaker explains the concepts.
Which playlist does this belong to?
Where can we find the rest of the videos for CS229 summer 2024?
lets call him captain LLM looks a bit like chris evans
agree
Yann, if you ever get to read this, you are a truly handsome man. I
Cringe
Does anyone have the pdf or ppt for this lecture, if so please reply to this comment. Thanks!
Thank u
is there a way to add sections so we can return to specific parts later?
thanks for this great lecture. Is also the lecture on transformers available somewhere?
You might be interested in the lectures in this playlist: th-cam.com/play/PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM.html&si=KmCNuzfcc_E0cxDg
What class is this part of?
Handsome Modeling 225
It feels like learning LLMs from clark kent (superman) 😂😅
so good ,
This interests me but I have no coding experience. Any tips on where to start? Surely Stanford lectures? Coding 101, I guess. Anything helps :)
The training algorithm is actually the key... It is because of RLHF that we have GPT-4
anybody know of any resources for learning LLM?
thanks ❤️🤍
IDK what he is talking, but he is super-hot, i decided to watch 5 times, it's really good for my health,>3
great
Did anyone here take the class in which this lecture was held (CS229 summer 2024)?
This genius saying "2K return tickets from JFK to LDN are not significant" (in terms of environmental impact) and that "next models will be +10X FLOPS" just makes me conclude that these guys are not only throwing money at the problem (i.e. gen AI) but don't have a thoughtful solution on how to train AI considering the environment and economic aspects of it.
I have a question about scalable data for SFT: won't the model be biased, since it uses its own knowledge to generate the dataset and is then further trained on that same data?
Steve rogers talking about AI❤