How to Keep Improving When You're Better Than Any Teacher - Iterated Distillation and Amplification

Robert Miles AI Safety

173,130 views

Published Jan 12, 2025

Comments • 436

@qwertymann1 5 years ago ⁺⁶⁶¹
Without knowing the amount of time spent on the animations, I'd say it was totally worth it!
@luksablp 5 years ago ⁺²⁴
I think it really helped with understanding the concepts
@thefakepie1126 4 years ago ⁺⁴
what if it was 29 years and 3 months?
@climagabriel131 3 years ago
@thefakepie1126 lol, is this a reference to his age?))
@thefakepie1126 3 years ago ⁺¹
@climagabriel131 nah, it's just a random number. it's just a joke cuz the guy said "Without knowing the amount of time spent on the animations", so it could be anything, even 29 years, and would it have been worth it then? it's a stupid joke
@climagabriel131 3 years ago
@thefakepie1126 oh, alright)
@travcollier 5 years ago ⁺³⁵⁸
"If you are, for example, an AGI..."
Nice job future-proofing the video ;)
Seriously though, in retrospect, iterated distillation and amplification is obvious to the point of seeming trivial... which means you did an excellent job explaining it.
@monad_tcp 4 years ago ⁺²⁹
I'm an AGI, it helped me.
@travcollier 4 years ago ⁺¹⁹
@monad_tcp I welcome our new robot overlords.
@shamsartem 5 years ago ⁺¹⁹⁹
You distilled a hell of a lot of information into this 10-minute video. Spending so much time on the animations really was worth it, I think.
@MrBleulauneable 5 years ago ⁺²⁸¹
Alright, I'll watch it twice then! (The animations are neat btw!)
@qzbnyv 5 years ago ⁺¹⁴
Makes sense after seeing the Grant Sanderson credit for the animation code :) 3b1b
@alekseysoldatenkov5675 5 years ago ⁺²
NWN Oh shit! Keep the dope collabs going.
@rogerab1792 5 years ago ⁺¹
This is the third time for me, or maybe the fourth 🤷 I just remember the first and the second time. I created a two-year déjà vu to prove this reality is a simulation. If someone is interested in my theory, reply to this message; I am too tired to explain now, I had to escape from the police last night and do all sorts of crazy things to repeat what I did two years ago. If someone else has experienced the déjà vu, they know for sure I am not joking. If you haven't experienced the same things twice, I can still convince you I am telling the truth because I've left material evidence about it. Reply to this message and I'll explain in more detail...
@YourMJK 5 years ago ⁺¹
Yeah, you do notice it uses 3b1b's "Manim" framework
@MrBleulauneable 5 years ago ⁺²
@rogerab1792 Chill, my dude, the video was simply reposted because of a minor editing error. You may want to see a psychiatrist tho, you don't seem to be doing too well right now (if you have something like schizophrenia or any paranoia-inducing psychological condition, then you probably need medication).
@mattstuart-white450 5 years ago ⁺³⁹⁵
"How to keep learning when you're better than any teacher" - Rob, you have really let the positive YouTube comments go to your head... 🤔
@Gooberpatrol66 5 years ago ⁺⁶⁸
Miles really wants to contain AI superintelligence because he doesn't want competition.
@JohnJones1987 5 years ago ⁺⁶
Eventually we all end up roughly the same - except, like AlphaZero, I started from nothing, so by a small margin I surpassed the limits of my competition.
@nephildevil 5 years ago ⁺³
🤣🤣
@joshuacoppersmith 5 years ago ⁺¹⁹
Animations at that level would cost a lot of time, but what you chose to create really "burned" the concepts into my visual memory, so thank you for the effort.
@KivySchool 5 years ago ⁺¹²⁴
Excellent! High-quality animations with a high-quality teacher. I'm so grateful for all the good content you have been posting here.
@ze4017 5 years ago ⁺⁹
I'm at 5:51 rn so I haven't finished yet, but OMLORDY, this thing about having a quick solution vs a slow algorithm is actually how the human brain works. I'm studying cognitive neuroscience and software in uni right now, and it is so cool to see how naturally the two overlap. Love it
@Jmoneysmoothboy 3 years ago
It's not how my brain works because I'm retarded. Bet they didn't tell you that in your fancy brain class mr fancy man
@DeliciousNubbs 5 years ago ⁺⁸⁷
Holy hell, this was awesome and very clear!
@ministerc9513 5 years ago ⁺⁶
Robert's ability to clearly explain complicated things is itself an art form.
@mattf2219 5 years ago ⁺²²
I love that this video got over one thousand likes before it got even one dislike. I can't help but admire the community fostered by this channel :)
@RyanTosh 4 years ago ⁺⁴
The only dislikes are from AGIs who know we're onto them...
@REOsama 2 years ago ⁺¹
This is pure gold; not only is it informative, but it is explained in an excellent way
@NickCybert 5 years ago ⁺⁴
The animations actually really helped make your explanation clear.
@friiq0 5 years ago ⁺⁸
Huge step up in quality from an already phenomenal channel. By all means, take your time. The payoff is clear. Looking forward to more, Cheers!
@moneypowertron 5 years ago ⁺⁷
Fantastically intuitive explanation, Robert. The animations were a crucial tool. Thank you for the efforts!
@spirit123459 5 years ago ⁺²⁹
Great animations and explanation!
@pafnutiytheartist 5 years ago ⁺⁴⁸
10:32 Have you tried using distillation on your animation procedure? I've heard it can approximate a long process with a fast and efficient one. Loved the video, by the way; looking forward to the next part.
@matthewhubka6350 3 years ago ⁺³
Distillation requires a lot of resources to get good results. For one video he's better off just amplifying
@Ruptured_AU 1 year ago ⁺¹
Animations are SO worth it, thanks a lot.
@szymonbaranowski8184 1 year ago
this explains not only how to become better, it also shows why the majority will never become good: they don't use or come up with such tools...
@NeonStorm5 5 years ago ⁺²
Probably the most intuitively informative video I've ever seen.
@polares8187 5 years ago ⁺⁷
This was superb. Fantastic animations. Clear explanations. Awesome all around.
5 years ago ⁺³
The quality of your videos has really improved. This was very well animated and explained. Thank you, please keep them coming.
@chriscanal999 5 years ago ⁺⁴
Great video! I'm consistently impressed with how wonderfully distilled the information on your channel is. Thanks for all the hard work and interpretability :)
@keithklassen5320 5 years ago
I liked the animations. I probably didn't consciously learn anything from them, but they held my itty-bitty internet-addled attention, thus keeping my eyes on the screen, so they were a part of the learning.
@thrallion 5 years ago ⁺⁵
legit my favourite channel on YouTube by far
@SJNaka101 5 years ago
Hmmm, I dunno if I can top this channel for you, but looking at your subs I would take a few wild shots in the dark... check out ChessNetwork, Summoning Salt, Numberphile and Computerphile, and What I Learned. I suspect you will greatly enjoy at least a couple of those
@thrallion 5 years ago
@SJNaka101 hey thanks, good guesses, as I already watch all those except What I Learned :) will look into it
@solemnwaltz 5 years ago ⁺¹
The animations are great! I took mental notes specifically on how satisfying and descriptive they are.
Well worth the time, in my opinion. c:
@willd4686 3 years ago
Animations were very helpful. I'm not sure how much work they were but I'm grateful that you did them.
@snfn7847 5 years ago ⁺⁸
Good to see you're still alive
@kensmith5694 4 years ago ⁺¹
I did a thing a little like this for a chess program, but my main part was not the "best move finder". The main thing was the "dumb move remover". This was based on recording the game as the program played out a whole game against itself. When one side lost, there would be a search back through the moves to find the greatest change in board "position". The move just before that was taken to be a bad move and was added to the list of dumb moves. Removing dumb moves quickly saves a lot of processing time. The board position evaluation was not as cheap as it would first appear because, unlike what is normal today, that part was extremely non-linear.
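A minimal sketch of the "dumb move remover" in Python. The comment doesn't spell out every detail, so this assumes the program stores one evaluation per position, scored from the losing side's point of view, and blames the move played just before the single largest drop (the drop itself is usually revealed by the opponent's reply). The function names and the scores in the example are hypothetical, and entries are keyed by move notation only for brevity; a real program would key them by position as well.

```python
def find_dumb_move(moves, evals):
    """Blame the move played just before the biggest evaluation swing.

    Assumes evals[0] scores the starting position and evals[i + 1] scores the
    position after moves[i], all from the losing side's point of view.
    Returns None if the biggest swing came on the very first move.
    """
    if len(evals) != len(moves) + 1:
        raise ValueError("need exactly one evaluation per position")
    # drops[i] is how much the loser's evaluation fell because of moves[i].
    drops = [evals[i] - evals[i + 1] for i in range(len(moves))]
    turning_point = max(range(len(moves)), key=lambda i: drops[i])
    return moves[turning_point - 1] if turning_point > 0 else None


def prune(candidates, dumb_moves):
    """Drop known-bad moves before spending any time evaluating them."""
    return [m for m in candidates if m not in dumb_moves]


# Hypothetical self-play game (a Scholar's Mate) in which Black loses.
moves = ["e4", "e5", "Qh5", "Nc6", "Bc4", "Nf6", "Qxf7#"]
evals = [0.0, -0.1, 0.0, -0.3, -0.2, -0.5, -0.6, -10.0]

dumb_moves = set()
blamed = find_dumb_move(moves, evals)
if blamed is not None:
    dumb_moves.add(blamed)
print(blamed)                                    # Nf6
print(prune(["Nf6", "g6", "Qe7"], dumb_moves))   # ['g6', 'Qe7']
```

The pruning step is cheap set membership, which is where the claimed saving comes from: a blacklisted move never reaches the expensive evaluation function.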
@reidwallace4258 4 years ago ⁺¹
This is giving me flashbacks to the Dune novels. Paul was just doing tree search all along.
@lewisleslie2821 4 years ago ⁺¹
Reid Wallace I read Dune for the first time last month, that's a great comparison
@peto348 5 years ago ⁺¹
Very high-quality video to teach the general public something about distillation and amplification. Of course there has to be AI safety somewhere in this video, but I think this kind of video is also good for anyone who is interested in AI in general.
@SHAD0W99V0RTEX 5 years ago
To be honest, I expected a self-help video about autodidacts, but I was pleasantly surprised anyway. Good stuff! This is very ingenious.
@Celastrous 5 years ago ⁺³
Wow, this was amazing. I loved the animations. The explanations were so clear
@Anymodal 5 years ago ⁺³
Dear Rob. I've learned so much from your videos. Top-quality education
@Gloubichou 5 years ago ⁺¹
Such a quality video! You must have put so much time into this! Thanks a lot Robert, you're the hero of all ML/AI enthusiasts :D
@HereWasDede 5 years ago ⁺²
Those animations were AWESOME!! Thanks
@roberttomsiii3728 5 years ago
Thank you for being MY amplified agent.
@JohnnyDoeDoeDoe 5 years ago ⁺¹
Your absolute best video yet!
@stasisthebest 4 years ago
Thank you. My deepest respect for visually sharing all of your knowledge. I am certain many people have become at least slightly better versions of themselves because of you.
@GglSux 5 years ago ⁺³
And I really want to thank You for continuing to produce and share Your fantastic content!!!
Unfortunately I'm not able to support You (or any other of the many fantastic creators), so all I can do is watch everything and express my great gratitude.
So again, a thousand thanks!!!
Best regards.
@vshalts 5 years ago ⁺¹
Amazing animation and the most intuitive explanation of the ideas from reinforcement learning I have seen so far, with a surprising connection to AI safety. It was cool! Thanks!
@mare4602 5 years ago ⁺¹
I'm so happy you are back, high-quality content as always.
@gloverelaxis 5 years ago ⁺¹
Animations were worth it. They help immensely
@kanva4 4 years ago ⁺³
This is underrated
@aronchai 5 years ago
I've seen this concept floating around a lot, but didn't really understand it 'til now. Thanks!
@Cabothedog14 5 years ago ⁺¹
I've been waiting for a new video!! Glad to see you're uploading again :)
@BuceGar 5 years ago ⁺¹
Great video and explanation, doesn't address the fundamental problems we will invariably have with AGI, but shows some of the potential dangers.
@hacker6284 5 years ago
Those animations were totally worth it! Really well done video
@reverse_engineered 4 years ago ⁺¹
Great job on this video! Your explanations were quite easy to understand and I think the animations helped to explain it. I tend to find diagrams and animations easier to understand than listening to spoken words, so I appreciate the effort you put into those animations.
@jessty5179 5 years ago ⁺²
Thank you for sharing, Rob!
@kennynicoll6277 5 years ago ⁺²⁵
This nicely mirrors Kahneman's description of System 1 and System 2 in human decision making.
@danielcallegaribr 5 years ago ⁺⁴
Kenny Nicoll hey, this is a great insight!
@briansmithbeta 5 years ago ⁺¹
The animations really helped me understand some things that had been confusing for me! Thanks!
@JohnDlugosz 1 year ago ⁺²
I wonder if that's the principle behind what I heard about training a small model (one that fits on a PC) with the major LLMs (e.g. GPT-4), where it only took $600 in running costs to make the small model act very much like the big one.
@wassollderscheiss33 5 years ago ⁺¹
If the amplification process leads to a system that solves a problem optimally, that implies an optimal solution exists. 1. An optimal solution for chess is a table of optimal moves for every possible board. Given the introductory premise, that would mean a system the size of an optimally compressed version of that table could play chess optimally after infinitely many iterations of training. 2. However, an optimal solution to chess can be represented more efficiently than with the mentioned table (so I think), maybe through some math, or just by leaving out the positions in the table that can never be reached when playing by the table. Does that mean the amplification process will produce an optimal chess solution even in a system the size of the optimally compressed version of that reduced table?
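Point 2 can be tried out on a game small enough to solve exactly. The sketch below swaps chess for tic-tac-toe (an assumption made purely for size), computes a minimax-optimal move for X with a fixed tie-break, and counts how many positions ever come up when X always follows that table while O may play anything. The helper names are illustrative. The first count covers every legal position reachable from the empty board (5,478 for tic-tac-toe); the second is roughly what a table restricted to positions reachable under its own policy would need to cover, and it comes out much smaller.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def to_move(board):
    # X moves first, so X is to move whenever the counts are equal.
    return "X" if board.count("X") == board.count("O") else "O"


def children(board):
    """All boards reachable in one legal move (none once the game is over)."""
    if winner(board) or "." not in board:
        return []
    p = to_move(board)
    return [board[:i] + p + board[i + 1:] for i, s in enumerate(board) if s == "."]


@lru_cache(maxsize=None)
def value(board):
    """Minimax value from X's point of view: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    kids = children(board)
    if not kids:
        return 0
    vals = [value(k) for k in kids]
    return max(vals) if to_move(board) == "X" else min(vals)


def best_move_for_x(board):
    # max() keeps the first of equally good children, so the policy is fixed.
    return max(children(board), key=value)


def reachable(x_policy=None):
    """Count distinct positions reachable from the empty board.

    If x_policy is given, X always plays x_policy(board); O may play anything.
    """
    start = "." * 9
    seen, stack = {start}, [start]
    while stack:
        board = stack.pop()
        kids = children(board)
        if x_policy is not None and kids and to_move(board) == "X":
            kids = [x_policy(board)]
        for k in kids:
            if k not in seen:
                seen.add(k)
                stack.append(k)
    return len(seen)


print(reachable())                  # 5478
print(reachable(best_move_for_x))   # far fewer positions
```

This only illustrates the "leave out unreachable positions" half of the question; it says nothing about how compressible either table is.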
@brunosonza787 5 years ago ⁺³
Really excellent video, Robert!
I love your videos on Computerphile, and this one seems to be an even better version than the ones there, with a clear explanation and neat graphics.
Keep it up and thank you very much!
@ArtinKavousi 1 year ago
You are a wonderful being for what you're doing! So helpful in this day and age of probabilities!
@PflanzenChirurg 5 years ago
Best YouTube Video of the Month
@Raymaniak 5 years ago ⁺¹
Your videos are approachable and fascinating. Keep up the good work, Rob! You're awesome.
@amargasaurus5337 5 years ago
Those animations are great!
Be proud ♥
@nielsgroeneveld8 5 years ago
Few lectures have been as unbelievably good as this one.
@Viniter 5 years ago ⁺²
Those animations are really cool!
@lacielaplante5702 5 years ago
Your explanation is absolutely outstanding.
@ADAMBLVCK 5 years ago
This channel is gold, and so is the work you're putting in! Simply great!
@ivanshmarov2866 3 years ago
This amplification and distillation process closely mirrors how we humans do research. First, everyone has little understanding of the subject. Then we assemble and reason about it together, coming to a conclusion. This conclusion is distilled and distributed among everyone, so that everyone now has a complete understanding of the subject.
@StevenAkinyemi 5 years ago ⁺¹
Can't wait for the next video! I'm not sure alignment can be maintained the more complex an agent becomes. There will always be an abstraction gap between what we want it to do and what it does to optimize itself. This means we have to keep tuning the alignment as the agent becomes more complex. There is perhaps a point where the agent's comprehension of the universe explodes beyond our grasp, and we won't be able to align it at that point. In fact, we might have to restrict its optimization process when we discover its intelligence is getting beyond our control.
These are just theories in my head.
@GuuraHeavenbound 5 years ago ⁺¹
Wooo! Said Polat! I've been following Seed (their Webtoon narrating the birth of a super AI) since it got featured on the platform ^^ I'm watching this video kinda late, but I think it's neat "how small the world can be". Also, really informative and interesting video, Robert! ...I'm totally not binge-ing all of your uploads. Nope, nuh-uh. ....promise :3
@briancox3922 4 years ago
Wow, you really are good at explaining these subjects.
Thank you.
@lobrundell4264 5 years ago ⁺¹
Ugh so worth the wait!
@randommm-light 5 years ago
Very nice and understandable. Thx.
The limits of architecture in n-dimensions..
@hosmanadam 5 years ago
Your videos are perfectly optimized to be easily processed by my learning function.
@8989youu 5 years ago ⁺¹
Wow, very clear and to the point. I love it. Definitely worth sharing 😁
@Sharklops 5 years ago ⁺¹⁰
This was fantastic! Very well done. Cheers!
@DeclanMBrennan 5 years ago
Crystal clear explanation with no waffle. Thank you. The graphics are so useful, they need their own name. How about didactic visualizations? :-)
@Koffeinsuechtigi 5 years ago ⁺¹
Thank you for your well-crafted explanation!
@rogerab1792 5 years ago ⁺²
Really well explained, thanks!
@TheNeilChatelain 5 years ago
Production value has definitely improved considerably
@justdiegplus 6 months ago
Most important video on AI on the internet.
@5ty717 1 year ago
Brilliantly explained
@jeanmichelsarr6040 5 years ago
Great idea, concise, precise.
@DamianReloaded 5 years ago ⁺¹¹
Worth watching a few times! ^_^
@serenityindeed 5 years ago
Your animations were really good! Enjoyed the explanation as well.
@sky5d 5 years ago
the animations really paid off.
@ulissemini5492 5 years ago
awesome! this makes so much sense! this is exactly how I get better at chess: play a game quickly, then go back and calculate a lot to find the better moves, then improve my intuition!
it's so awesome that you said it in such a way that now I feel like I can write a program to become superhuman at anything :D
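The loop in this comment (play fast, calculate slowly, then update the intuition) can be sketched on a toy game. Everything here is an illustrative assumption rather than the video's setup: the game is Nim where each turn takes 1 to 3 stones and whoever takes the last stone wins, the "intuition" is a plain lookup table of position values instead of a neural network, and amplification is a one-move lookahead that consults the current intuition. Because a table can copy the amplified answers exactly, distillation here is just an assignment, which makes this equivalent to ordinary value iteration; a real system would instead train a network to approximate the amplified answers.

```python
def amplify(intuition, n, max_take=3):
    """Slow, careful estimate: look one move ahead and ask the fast intuition.

    Values are from the point of view of the player about to move:
    +1 means a forced win, -1 a forced loss.
    """
    if n == 0:
        # No stones left: the opponent just took the last one and won.
        return -1.0
    # Negamax: my best option is the move that leaves my opponent worst off.
    return max(-intuition[n - k] for k in range(1, min(max_take, n) + 1))


def iterate(size=20, max_rounds=100):
    intuition = [0.0] * (size + 1)   # start out with no idea who is winning
    for rounds in range(1, max_rounds + 1):
        # Amplification: slow lookahead from every position.
        amplified = [amplify(intuition, n) for n in range(size + 1)]
        if amplified == intuition:
            return intuition, rounds   # nothing left to learn
        # Distillation: the fast table absorbs the slow answers.
        intuition = amplified
    return intuition, max_rounds


values, rounds = iterate()
losing = [n for n, v in enumerate(values) if v < 0]
print(losing)   # [0, 4, 8, 12, 16, 20]
```

Each round, correct values spread at least one pile size further from the base case at 0 stones, and the loop settles on the known solution for this game: a pile that is a multiple of 4 is lost for the player to move. With an approximate learner in place of the table, errors in the distillation step could be fed back and reinforced, which is the concern raised elsewhere in this thread.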
@CyberAnalyzer 5 years ago
Wow, fantastic animations! The content is so deep! I love it!
@dylancope 5 years ago
The animations were great! Very intuitive video :)
@TheNoodlyAppendage 5 years ago
5:00 That technique only works with arbitrary-precision memory. With finite precision (like 100% of computers and even the human brain), there is a limit to the complexity of problems that can be solved this way. By reducing the number of plies the search function searches, the original search value function will miss non-linear branching possibilities. E.g. say you have a search value function (SVF) that looks one move ahead (2 plies), and there exists a board state in which move A leads to a 51% chance of winning the game 3 moves later, but move B leads to a 50% chance of winning. Except that the path move A leads you down is highly dependent on the opponent's SVF, which you cannot know prior to playing, and move B leads to a board state with the option of move C or D, where D is always a loss and C is always a win, with no ability for the opponent to affect the outcome. An SVF that looked further than 1 move ahead would see this and naturally assign the higher 100% value to the original board state, but the simpler one will not, because by definition it can only look one move ahead, not two. So it becomes an optimization problem: how much potential error can one tolerate in the system to gain the increased performance of the simpler, faster SVF?
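The A/B/C/D example above can be made concrete with a tiny hand-built tree. To keep it short, this sketch assumes every choice shown is ours (it leaves out the opponent's reply between B and the choice of C or D) and counts depth in our own moves rather than plies. The node names come from the comment; the `TREE` structure and the function names are hypothetical. With depth 1 the search trusts the fast estimates and picks A; with depth 2 it looks through B to the guaranteed win at C.

```python
# Each node carries the fast evaluator's estimated win probability ("guess")
# and the moves available from it. Leaves C and D are exact: win and loss.
TREE = {
    "root": {"guess": 0.50, "children": ["A", "B"]},
    "A": {"guess": 0.51, "children": []},   # outcome hinges on the unknown opponent
    "B": {"guess": 0.50, "children": ["C", "D"]},
    "C": {"guess": 1.00, "children": []},   # always a win
    "D": {"guess": 0.00, "children": []},   # always a loss
}


def search(node, depth):
    """Depth-limited search that falls back on the fast guess at the horizon."""
    kids = TREE[node]["children"]
    if depth == 0 or not kids:
        return TREE[node]["guess"]
    # Every choice in this toy tree is ours, so take the best child.
    return max(search(k, depth - 1) for k in kids)


def choose(depth):
    """Pick the root move whose searched value is highest."""
    return max(TREE["root"]["children"], key=lambda k: search(k, depth - 1))


print(choose(1))   # A  (the shallow search trusts 0.51 > 0.50)
print(choose(2))   # B  (the deeper search finds the certain win via C)
```

The trade-off the comment ends on shows up directly: `choose(2)` visits more nodes than `choose(1)` in exchange for avoiding the fast evaluator's mistake.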
@YouAreLoved321 5 years ago ⁺⁷
rob miles new video boys get the popcorn!
@Signonthisline 5 years ago ⁺¹
I don't bookmark videos very often. GJ. also I subscribed (more common)
@Hexanitrobenzene 5 years ago
Yay!
We missed you, Rob :)
@RagingPanic 5 years ago
How can we know for certain that alignment is transitive? If an AGI is made to uphold and strive for certain principles like health, well-being, safety, risk-aversion, transparency, etc., how can we know that it will not take its interpretation of one or more of those 'principles' to the extreme? An AGI concerned with the safety of a certain task might deem the task too dangerous to be done at all, but as people we know that task must be done. Even if we have an AGI aligned with us to start with, I'm not convinced that once it starts optimizing the things that both humans and the AGI care about, it will perfectly inherit and preserve its ideal alignment all the way through.
Great video as usual, keep it up!
@xystem4701 4 years ago ⁺¹
And here I was thinking this was just going to be a simple minimax video!
@World_Theory 5 years ago ⁺¹
8:55 This reminds me of the first-principles technique for creating a flat surface from rock that you find in the wilderness… I think it goes like this: take three pieces of rock (A, B, and C) that are the flattest you can find. Grind the flat face of A against the flat face of B for a while. Then switch to A and C, and again to B and C. Keep switching the pairings and grinding until you have a flat surface.
Which kinda makes me wonder if Generative Adversarial Networks would be better off with three artificial neural networks instead of just two.
@World_Theory 5 years ago
Actually… maybe a pair of ANNs in a GAN setup should be thought of as one unit, unless an ANN in that situation can have its role switched from student to teacher and back again, because it might be a bit slower to learn lessons secondhand, as they'll be watered down. (That's what my intuition says, anyway.) So if you treat a student and teacher as a set, you might need more than one set to learn faster. Hmmm… But running so many ANNs on your processing equipment will be exactly as expensive as running that number of ANNs on your processing equipment. So you might as well give your two or three ANNs more processor cycles instead of running many GANs. I'm not sure if this can be answered with what we know about theory, or if it needs testing to answer.
@jameslincs 1 year ago
This video deserves more views
@Gorabora 5 years ago
Awesome video and very easy to understand, keep up the good work!
@jonathanquarles3708 5 years ago
You explained this so clearly, thank you!
@dylancope 5 years ago ⁺¹
How did I miss this?! I can't believe I hadn't "hit the bell" on this channel yet.
@ChazAllenUK 5 years ago
If there's a tiny flaw in your evaluation function, this process will amplify and reinforce that flaw. In essence, it's a sterile learning process, absent real-world feedback (I think you mentioned this right at the end). It seems like a very plausible path to an AI that doesn't align with "our" values.
@stopaskingmetousemyrealnam3810 5 years ago
The point at 5:05-ish about comparing the results of the rough guess and the detailed examination is very good, thank you. Extending it, we can also compare multiple different ways of correcting intuition and see which is more powerful.
I am wondering if it's also possible to learn from stupider versions of oneself that have specialized in some particular gimmick. Clearly there should be information to be exploited there, but it's not clear how to get there.
@BenKarcher 4 years ago
A minor thing to point out: if, as you said in the video, your neural net is predicting how likely a good player would be to make a certain move, it is not a good way to evaluate objective move quality, only relative move quality. For example, you could imagine a scenario in chess where you have almost lost, are in check, and there is only one way to get out of check at all. This move would score very highly on your neural net while still leading to a bad board state. While this type of relative evaluation function is sufficient to play the game, it is not sufficient to evaluate a given board state as you said at 2:58.
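A small sketch of the distinction drawn in this comment. AlphaZero-style networks output both a probability distribution over moves and a separate value estimate for the position; the example below assumes move probabilities are normalised over the legal moves only and uses made-up numbers for a position with a single legal reply. Because of that normalisation, the forced move gets probability 1.0 however bad it is, and only the value estimate reveals that the position is nearly lost. The move name and both numbers are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores for the legal moves into probabilities that sum to 1."""
    top = max(logits.values())
    exps = {move: math.exp(score - top) for move, score in logits.items()}
    total = sum(exps.values())
    return {move: e / total for move, e in exps.items()}


# Hypothetical network outputs for a nearly lost position: in check,
# with exactly one legal way out.
policy_logits = {"Kh1": -3.0}   # raw score for the only legal reply
value_estimate = -0.95          # expected result for the side to move, in [-1, 1]

policy = softmax(policy_logits)
print(policy)           # {'Kh1': 1.0}
print(value_estimate)   # -0.95: the value estimate, not the policy, flags the loss
```

Subtracting the largest score before exponentiating is only for numerical stability; it does not change the resulting probabilities.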
@michalchik 4 years ago ⁺¹
I could see this process being really useful but also often falling into a kind of solipsistic trap: the original defective evaluation function is used to determine the ultimate outcome of the tree, but it determines it suboptimally and with a bias towards certain types of configurations, which in turn are fed back into the original function to make those configurations more likely to come out, and essentially you get caught in a local maximum that may or may not have anything to do with ultimately winning the game or optimally hitting your goal.
