Visualizing transformers and attention | Talk for TNG Big Tech Day '24

  • Published on Nov 21, 2024

Comments •

  • @rustyroche1921
    @rustyroche1921 1 day ago +250

    woah grant getting the gains

    • @hyperadapted
      @hyperadapted 1 day ago +29

      high dimensional vascularity

    • @kellymoses8566
      @kellymoses8566 1 day ago +8

      I know. Now he is as hot as he is smart!

    • @poke0003
      @poke0003 1 day ago +10

      -"Swol is the goal, size is the prize!" - 3B1B Loss Function, probably

    • @hyperadapted
      @hyperadapted 1 day ago +3

      @@poke0003 Ah, I see you are a man of culture as well. Glad to see other Robert Frank connoisseurs :)

    • @sho3bum
      @sho3bum 22 hours ago +6

      3Curls1Extension Grant Sanderson

  • @PatrickMetzdorf
    @PatrickMetzdorf 13 hours ago +4

    That was easily the best explanation I have ever seen. Way to decrypt some of the most magical-seeming mechanisms of the transformer architecture. Thanks a lot for this!

  • @krishdesai9776
    @krishdesai9776 1 day ago +73

    Someone's been working out!

    • @F30-Jet
      @F30-Jet 1 day ago +1

      AI generated😂

  • @magnetsec
    @magnetsec 1 day ago +70

    Grant should team up with Andrej Karpathy. They'd make the best Deep Learning education platform

    • @nbme-answers
      @nbme-answers 1 day ago +12

      They already do make the best deep learning education platform

    • @magnetsec
      @magnetsec 23 hours ago

      @@nbme-answers Yeah but separately

    • @tescOne
      @tescOne 13 hours ago

      Two of the most talented educators on yt. Their two series on neural nets are basically anything a curious person needs to start building their own models. Grant gives you the big picture with immense sensibility and insane visualization. Andrej gives you all the technical details in reasoning, implementation and advanced optimization, with an empathy for your ignorance comparable to Feynman's haha.

    • @aricoleman5802
      @aricoleman5802 9 hours ago

      @@nbme-answers What is it?

  • @omarnomad
    @omarnomad 1 day ago +36

    38:30 The only reason we use tokenization is limited computational resources, *not* meaning. For the same budget, BPE gives an efficiency improvement of roughly 400% (1 token ≈ 4 characters).
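
For concreteness, a minimal sketch of the byte-pair-encoding merge loop this comment alludes to; the toy corpus and merge count here are made up, and production tokenizers learn their merge tables from enormous corpora:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merges from a toy corpus of words."""
    # Start with each word as a tuple of single characters.
    vocab = Counter(tuple(w) for w in words)

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with a merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# Frequent pairs such as ('l', 'o') or ('e', 's') get fused into single tokens,
# which is how one token ends up covering several characters.
print(bpe_merges(["low", "lower", "lowest", "newest", "widest"], 5))
```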

  • @onicarpeso
    @onicarpeso 1 day ago +21

    I finally see the human behind the great videos I watch!

  • @mpperfidy
    @mpperfidy 9 hours ago +1

    Another in a long, long line of excellent educational presentations. If you didn't exist, we'd have to invent you, which would be quite hard. So I'm glad you already exist.

  • @egoworks5611
    @egoworks5611 17 hours ago

    Such a great way to learn and understand the intuition behind this work. I sometimes think about the people who started this line of work, and all the groups who thought about the possibility of encoding language and expressing it mathematically. It turns out that even once you understand these concepts, it is still an outstanding effort, and the ideas behind it are superb.
    Crazy to think that some people thought about this, had the ambition, and actually expected to build a tool. Once you understand it and it is well explained, yes, it might no longer look impossible, but you can still see how groundbreaking it was.
    Thanks, Grant, for taking the time to share this

  • @abhidon0
    @abhidon0 1 day ago +26

    I guess the main question here is "Is Grant Natty?"

  • @learnbydoingwithsteven
    @learnbydoingwithsteven 1 day ago +7

    Grant is in great shape.

  • @Kvil
    @Kvil 1 day ago +73

    he should be Steve in the Minecraft movie

    • @0fpm531
      @0fpm531 1 day ago

      real

    • @kellymoses8566
      @kellymoses8566 1 day ago

      Can't be worse than Jack Black

  • @souvikbhattacharyya2480
    @souvikbhattacharyya2480 4 hours ago

    I wouldn't mind "giving a talk" type videos like this from Grant every now and then. I think I would actually prefer this style over the regular one.

  • @murmeldin
    @murmeldin 1 day ago +8

    Just came here from the LLMs for beginners video. Loved the talk, very informative. Keep the great work up, man 👏🏼

  • @tomasg8604
    @tomasg8604 4 hours ago +1

    30 to 50% of cortical neurons are devoted to vision, compared to 8% for touch and just 3% for hearing.
    That means learning how to see and how to process visual information is at the center of human intelligence.

  • @pufthemajicdragon
    @pufthemajicdragon 1 day ago +1

    That question at the 54-minute mark about analog computing making LLMs more efficient - yes. There are a LOT of smart people, experts in the field, working on exactly that. Maybe a next direction for your continued learning?

  • @rorolonglegs4594
    @rorolonglegs4594 1 day ago +4

    Great addition to your pre-existing series!

  • @__m__e__
    @__m__e__ 1 day ago +1

    Great talk! Bad questions.

  • @undisclosedmusic4969
    @undisclosedmusic4969 1 day ago +2

    My left ear thanks you

  • @noorghamm3449
    @noorghamm3449 17 hours ago

    Thank you❤️

  • @jordantylerflores
    @jordantylerflores 1 day ago +4

    As someone in the "wishes he took math more seriously" camp, I wish we were given more, ANY, cool examples of what was possible with applied math. Growing up in rural Ohio, the only things math was pushed for were business/finance and maybe some CS stuff; even then it was always abstract: here are some concepts, learn them for the test. Think of how many cool things can be done inside 3D programs such as Blender with just a solid understanding of geometry.
    I acknowledge my failings in this too, as I did not seek these things out while I was in school. I also might have some age-related FOMO lol, since the things I enjoy doing now, VFX/Blender/CGI, are all based on concepts I am having to teach myself or re-learn on my own as a man who is almost 40.
    Thank you for this; it is going to take a couple of watches for it to sink in haha.

    • @kellymoses8566
      @kellymoses8566 1 day ago +1

      I agree. Kids would put a lot more effort to learn math if they were shown how incredibly useful math is in real life. Being really good at math is like having a superpower compared to people who are not.

  • @cat_copilot
    @cat_copilot 20 hours ago

    Good job 😃

  • @Izumichan-nw1zo
    @Izumichan-nw1zo 11 hours ago

    Please collaborate with Andrej Karpathy and make a huge deep learning platform, or at least explain stuff in this format regularly. We don't need animations every time; PPT or chalk-and-talk is also fine, sir!

  • @zamplify
    @zamplify 1 day ago +41

    3 blew one blown

    • @m41437
      @m41437 1 day ago +7

      I really hope this has something to do with the video

    • @sblowes
      @sblowes 1 day ago +2

      That’s clever

    • @jaydeep-p
      @jaydeep-p 1 day ago +1

      Nice try Diddy

    • @fakegandhi5577
      @fakegandhi5577 1 day ago +3

      Oh my god. This is incredible. You're a genius!

    • @jhdsipoulrtv170
      @jhdsipoulrtv170 1 day ago +3

      This is truly one of the most clever things I have seen in a long time

  • @Loveforcricket99
    @Loveforcricket99 14 hours ago

    For a word like ‘bank’, which can have different meanings in different contexts, does the LLM store it as a single vector, or can it store multiple vectors for each known variation of the word?

    • @GrantSanderson
      @GrantSanderson 12 hours ago +1

      It’s initially embedded as one vector, but one big point of the attention layer is to allow for context-based updates to that vector
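
A toy sketch of that context-based update, with random matrices standing in for the learned query/key/value weights of a single attention head (dimensions and token names made up for illustration):

```python
import numpy as np

def attention_update(x, Wq, Wk, Wv):
    """Nudge each token's vector by a weighted mix of all tokens' value vectors."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])          # how relevant is token j to token i?
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row
    return x + weights @ V                          # residual: same slot, updated vector

rng = np.random.default_rng(0)
d = 8                                    # toy embedding dimension
x = rng.normal(size=(3, d))              # e.g. embeddings for "river", "bank", "flood"
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
print(attention_update(x, Wq, Wk, Wv).shape)   # (3, 8): "bank" keeps one slot, new contents
```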

  • @JuliusUnique
    @JuliusUnique 1 day ago

    Which word/token is in the middle, at (0, 0, 0, 0, 0, ...), for example for ChatGPT 4?

  • @literailly
    @literailly 18 hours ago

    @39:00, Why not make tokens full words?
    (time to read up on byte-pair encoding!)

  • @rifatmithun8948
    @rifatmithun8948 6 hours ago

    Your voice seems very familiar. It took me 10 seconds to realize you are the 3b1b.

  • @eugenedsky3264
    @eugenedsky3264 1 day ago +5

    Grant! We now know what LLMs are, but what about LMMs - Learning Mealy Machines (named so by me)?
    A learning Mealy machine is a finite automaton in which the training data stream is remembered by constructing disjunctive normal forms of the automaton's output function and of the transition function between its states. Those functions are then optimized (lossily compressed by logic transformations such as De Morgan's laws, arithmetic rules, instruction loop rolling/unrolling, etc.) into generalized forms. That introduces random hypotheses into the automaton's functions, so it can be used in inference. The optimizer for the automaton's functions may be another AI agent (even a neural net) or any heuristic algorithm you like.
    Machine instructions would be used to compute the automaton's output function and transition function. At first, as the automaton tries some action and receives a reaction, the corresponding terms of those functions are constructed in plain "mov"s and "cmp"s with "jmp"s (suppose an x86 ISA here). Then the machine instructions of all action-reaction pairs are optimized by arithmetic rules, loop rolling and unrolling, etc., so the size of the program is reduced. That optimization may include hypotheses about "don't care" values of the functions too, which will be corrected in future passes if they turn out to be wrong...
    Imagine that code running on something like Thomas Sohmers' Neo processor, or the Sunway SW26010, or the Graphcore Colossus MK2 GC200.
    One kind of transformation people often seem to forget is loop rolling (not just unrolling), i.e. making an instruction loop (a "for x in range a..b" statement) out of a long repetitive sequence of instructions.
    ...Kudos for Bodybuilding!
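
For readers unfamiliar with the underlying automaton, here is a bare-bones Mealy machine in Python; this is just the textbook state machine, with the DNF construction and the optimizer described above left out:

```python
class MealyMachine:
    """Textbook Mealy machine: the output depends on the current state AND the input."""

    def __init__(self, transition, output, start):
        self.transition = transition  # (state, symbol) -> next state
        self.output = output          # (state, symbol) -> output symbol
        self.state = start

    def step(self, symbol):
        out = self.output[(self.state, symbol)]
        self.state = self.transition[(self.state, symbol)]
        return out

# Toy example: emit 1 exactly when the input bit differs from the previous bit.
trans = {("a", 0): "a", ("a", 1): "b", ("b", 0): "a", ("b", 1): "b"}
outs = {("a", 0): 0, ("a", 1): 1, ("b", 0): 1, ("b", 1): 0}
m = MealyMachine(trans, outs, start="a")
print([m.step(bit) for bit in [0, 1, 1, 0, 1]])  # [0, 1, 0, 1, 1]
```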

  • @AzharAli-n5c
    @AzharAli-n5c 1 day ago

    great

  • @rohan_gupta
    @rohan_gupta 1 day ago +3

    So good

  • @ashukun
    @ashukun 1 day ago

    let's go

  • @debyton
    @debyton 23 hours ago +1

    Choosing the next word, by any name, is thinking.

    • 20 hours ago +1

      Agreed. Except here we are not talking about "choosing". We are talking about "calculating the probability that a specific word belongs there". And this is (mainly) math.
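
In code, that probability calculation is just a softmax over the model's raw scores (logits) for every candidate token; the words and scores below are invented for illustration:

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over next tokens."""
    m = max(logits.values())                                    # subtract max for stability
    exps = {tok: math.exp(s - m) for tok, s in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for the word after "The cat sat on the"
probs = softmax({"mat": 4.1, "roof": 2.3, "moon": 0.2})
print(max(probs, key=probs.get), probs)  # "mat" gets the highest probability
```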

  • @vit3060
    @vit3060 1 day ago +1

    It would be nice to see more about the KAN approach, which is very promising.

  • @PaperTigerLive
    @PaperTigerLive 1 day ago +4

    nooo you were in Munich and didn't tell us :((((

  • @no1science
    @no1science 1 day ago +2

    amazing

  • @oncedidactic
    @oncedidactic 8 hours ago

    Another roof video!? Oh…

  • @salchipapa5843
    @salchipapa5843 1 day ago +1

    I graduated with a degree in electrical engineering back in '07. I did not understand much of anything that was talked about in this video.

  • @BenjaHernandezMemm
    @BenjaHernandezMemm 1 day ago

    I'm really proud to be alive at the same time as you

  • @DakshPuniadpga
    @DakshPuniadpga 1 day ago +6

    Great Speech

    • @raideno56
      @raideno56 1 day ago +6

      Video came out like 10 minutes ago and it is 50 mins long

    • @volodyadykun6490
      @volodyadykun6490 1 day ago

      ​@@raideno56 That's what's so great about it, very big

    • @jordantylerflores
      @jordantylerflores 1 day ago

      @@raideno56 watched it on 5x speed lol

    • @erwinschulhoff4464
      @erwinschulhoff4464 1 day ago +1

      @@jordantylerflores did you have subway surfers on the side as well?

  • @jasonandrewismail2029
    @jasonandrewismail2029 1 day ago +1

    Grant, is it not basically DOE in statistics? Kind regards, Jason

  • @Trtko-y2p
    @Trtko-y2p 1 day ago

    you're smiling like you're microdosing LSD or something

  • @johnchessant3012
    @johnchessant3012 1 day ago +2

    hi

  • @geekyprogrammer4831
    @geekyprogrammer4831 1 day ago +2

    Second!

  • @seatyourself7082
    @seatyourself7082 1 day ago +1

    First! (to comment after watching the whole thing)

  • @volodyadykun6490
    @volodyadykun6490 1 day ago +1

    Why would he explain a cartoon to them?

  • @dadsonworldwide3238
    @dadsonworldwide3238 1 day ago

    Great stuff.
    Yet a generalized go/no-go theory or reference in space doesn't undoubtedly build an assimilated seed of deterministic responsibility for our mixed multitude to simulate strong identifiers and compute the modern world; that would be a sir on the opposite side of the equivalence principle to Einstein lol.
    Great thinker in renormalization, overly extended, and everyone is ready for the over-delayed era of optimization. We got nuked and detoured on this quest, but it's great to be back on par with the goals of multiple generations that were so rudely interrupted by the world.

  • @MichealScott24
    @MichealScott24 1 day ago +1

    ❤🫡

  • @oraz.
    @oraz. 1 day ago

    Why do people make that mouth-smacking sound whenever they start a sentence?