This appears to be a distillation of the most important concepts in large language models today. Thanks for the exposition.
Extremely high entropy video. Amazing clarity, delivery, content, and follow. Pure genius!
I found this to be an incredibly unique and interesting approach to explaining LLMs, an excellent introduction, thank you so much for the video!
This is a great modern supplement to Karpathy's guide to language models! Thanks Sasha! Just subbed
Knowledge/sec in this video is off the chart, and the info is cutting edge!
Excellent presentation! Easy to follow and tons of great material including the links to the slides
Thank you for making this video so interesting with those nice graphics and examples. I need to sit down and watch it attentively.
For someone like me who is new to this field and wants to understand the nitty-gritty of language models, it's necessary to see each part separately, understand it first, and then move on to the next part. Still, I can sense how fantastic this explanation is for those who already have a basic understanding of deep learning.
Amazing content, thanks for putting this together!
Thanks a lot Prof. Rush for this material.
Thanks for the video, a good high-level overview. I also like the Excalidraw slides.
This is very insightful. Thanks for posting!
This was a wonderful video, thanks so much for it!
Great complement to Karpathy's video
amazing video!
Great video!
Very nice talk
Excellent talk!! Will recommend to all my coworkers.
amazing video!
Awesome! 🙌
So good!
I'm perplexed.
Hey Sasha, what tools do you use to make your presentations? They are so different from typical academic presentations :)
Thanks for this awesome explanation! Can someone explain one point to me? The issue with argmax at 22:15 is that it has no derivative, so the neural network's parameters cannot be trained through it. If I understand correctly, the argmax picks the word that should be "attended to" when predicting the next word (park). Why is argmax the desired function here? What if the prediction of the next word depends not on the single most important word, but on the two most important words in the context? In that case, doesn't softmax have an additional benefit over the "naive" argmax, in that it can also represent distributions with more than one mode?
This is a good point. One detail I didn't mention is that at each layer there are multiple "heads", each with a different query, so even with an argmax you would still get to select multiple words per layer. But even so, your point is fair that there may be other advantages to softmax besides easier learning (a small sketch of the difference is below this thread).
That makes sense. Thanks for your helpful reply!
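For anyone following this thread, here is a minimal numpy sketch of the difference being discussed. It is not from the video; the scores and values are toy numbers chosen for illustration. It contrasts hard (argmax) attention, which commits to exactly one word, with softmax attention, which produces a smooth, differentiable distribution that can put substantial weight on two words whose scores are close.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

# Toy setup: one query attending over four context words.
# "scores" stand in for query-key dot products (made-up numbers);
# one-hot "values" make the attention output easy to read.
scores = np.array([2.0, 1.9, -1.0, -2.0])   # two words are almost equally relevant
values = np.eye(4)

# Hard (argmax) attention: pick exactly one word, no gradient through the choice.
hard_weights = np.zeros_like(scores)
hard_weights[np.argmax(scores)] = 1.0
hard_output = hard_weights @ values          # only word 0 contributes

# Soft (softmax) attention: a distribution over all words, differentiable in the scores.
soft_weights = softmax(scores)
soft_output = soft_weights @ values          # words 0 and 1 both contribute heavily

print("hard weights:", hard_weights)                 # [1. 0. 0. 0.]
print("soft weights:", np.round(soft_weights, 3))    # approx. [0.507 0.459 0.025 0.009]
```

The soft weights show the point from the question: when two context words matter, softmax can split attention between them, whereas argmax is forced to choose one (per head).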
At 32:41, isn't each element of AB a row of A multiplied with a column of B? Waiting for your answer.
Yes, this is a bug, sorry about that! (The correct rule is sketched below.)
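To make the corrected rule concrete, here is a tiny numpy check using made-up 2x2 matrices (not the ones from the slide): element (i, j) of AB is the dot product of row i of A with column j of B.

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[5., 6.],
              [7., 8.]])

AB = A @ B   # [[19., 22.], [43., 50.]]

# Each element (i, j) of AB equals row i of A dotted with column j of B,
# e.g. AB[0, 1] = 1*6 + 2*8 = 22.
for i in range(2):
    for j in range(2):
        assert np.isclose(AB[i, j], A[i, :] @ B[:, j])

print(AB)
```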
Was the narration generated? I would love to use the same technique for narrating text.
Well, every output must be mathematically provable, so can we not build a formula for every pattern of output? Say it outputs the human sense and the grammar sense of each word it constructs. While it constructs the output, can it not also show how it did it?
WOOHOO! just found this channel. it is almost better than porn. how do we give you our money so you keep making videos? pls tell us :o