Looking back at this in 2023, after GPT-4 changed the world, Ilya's intuitions and predictions here are incredibly accurate.
I mean, it's all in the data. ImageNet simply proved that deep learning was a much faster approach and the right way to go. But I guess the Transformer was also another crucial idea, and may have been the missing piece in what led to GPT.
Well yes, to a degree. There was a question about LLMs at 53:30, so that's great. From his answer it's clear he had the correct insight that scaling models up would go further than people thought at the time. However, he did not mention the possibility of an architecture breakthrough like transformers, which also played an important part. And he emphasized training at inference time, which (sadly) is not yet a piece of the puzzle, and I'm afraid not one coming any time soon: with all the AI safety talk, people will be very reluctant to make models even less controllable and understandable by baking in training at inference time.
Instead, we're stuck with a race for context size, which sounds a bit silly when you realize that most of this context data would fit most efficiently in the model itself (and, importantly, more economically space-wise too).
If OPEN AI DISAPPEARS, that's the end of AGI OPEN-SOURCE.
OPEN AI IS AGI.
He speaks with so much clarity - he has a real fundamental understanding that is uncommon in this space.
He’s a true genius
The Einstein of our times. While Sam Altman is the Steve Jobs or Henry Ford of our times. I really hope they will work together after the recent misunderstandings. If anyone can build AGI, it's their team.
Ilya is a great guy, phenomenal talent. I felt bad for him in the OpenAI saga. You could tell he was genuinely disappointed about everything that transpired. Wish him the best.
Theory:
0:00 introduction & supervised learning (using neural nets/deep learning)
6:45 reinforcement learning (model-free (2 types) => 1. policy gradients 2. Q-learning based)
12:55 meta-learning (learning to learn)
Applications:
16:00 HER (hindsight experience replay) algo (learn from failures)
21:40 Sim2Real using meta-learning (train a policy that can adapt to different simulation params => quickly adapts to the real world)
25:30 Learning a hierarchy of actions with meta-learning
28:20 Limitation of meta-learning => assumption: training distribution == test distribution
29:40 self-play technique (TD-Gammon, AlphaGo Zero, Dota 2 bot)
37:00 can we train AGI using the self-play?
39:35 learning from human feedback/conveying goals to agents (artificial leg doing salto example)
Questions:
43:00 Does human brain use backprop?
45:15 dota bot question
47:22 standard deviation (maximize expected reward vs minimize std dev)
48:27 cooperation as motivation for the agents?
49:40 open complexity theoretic problems could help AI?
51:20 the most productive research trajectories towards generative language models?
53:30 do you work on evolutionary strategies (for solving RL problems) in OpenAI?
54:25 could you elaborate on "right goal is a political problem"?
55:42 do we need a really good model of the physical world in order to have real-world capable agents?
57:18 solving the problem of self-organization?
58:45 follow up: self-organization in a non-competitive environment?
my observation:
42:30 It seems to me that the most difficult problem we will face will be to effectively communicate the "right" goals to the AI in a way that lets us somewhat predict its future behaviour, or better said its worst-case behaviour (safety implications). After all, we don't want HAL 9000 types of AI :)
Regarding your observation, I think people are worrying too much about what AI can "spontaneously" do and too little about what humans can do with AI. An agent's only concern is its world and its goals, and I find it overwhelmingly egocentric that humans tend to believe the agent will pick us as part of it.
thx
thanks for this
What you're describing is AI alignment, and it's a whole research field.
"The only real reward is existence and non-existence. Everything else is a corollary of that". Damn. That's deep.
MIT mathematician NORBERT WIENER's book GOD AND GOLEM, Inc. feels fresh when read next to today's headlines. Push through its Edwardian formalism (like this talk's technicalities) and fly over fertile ground. MIT!
Define existence. A suicidal, religiously motivated terrorist has a different definition than an atheist. The collision of these two perspectives suggests reward is truly subjective.
Not really
true
@@OneFinalTipple dude that's obvious, lol
I love it when this guy summarizes a seemingly complex algorithm or problem in one sentence and says "That's it. Period." Leaving you pondering in silence.
Only people who truly know the subject can do that.
One of the best videos I’ve ever ever seen in my life
Thank you so much for posting these videos. Really appreciate how MIT has a long tradition of sharing and disseminating knowledge.
Damn, that guy was already strong, and now he is a top star.
This guy’s lectures & podcasts are my new addiction
this is gold
Siraj, is there a video waiting for us about Meta-Learning?
agreed
Don't bother applying to *my* mine then.
+Jonas Stepanik only a Meta-Video so far ; Watch this Meta-Space.
Gold has started to shine now 😊
Wow, just 5 years ago, to the question at 51:21 ("understanding language ... the current state is very abysmal"), Ilya answered that simply training a bigger model will go surprisingly far. So Ilya could totally see ChatGPT way back then.
And the fact that that much progress was made in five years is staggering and stops me in my tracks
Not all heroes wear capes.. Ilya is one of the most underrated thinkers in AI right now.
Some really smart people are wasting their lives on String theory... while others, like Ilya, are changing the world.
@@wyqtor True, the smartest man in the world, for one
@@wyqtor Your words are pure gold. There are many unsolved questions in AI, and they are deeply rooted in mathematics, but not in fields like Algebraic Topology or Arithmetic Geometry. Unfortunately so many geniuses are obsessed with that abstract bullshit.
Lectures like this are truly inspiring and amazing and can be even life-changing.
My learnings:
Ilya Sutskever: OpenAI Meta-Learning and Self-Play | MIT Artificial General Intelligence (AGI)
- Deep Learning's Mathematical Theory
Deep learning is based on the mathematical theory that if you can find the shortest program that generates data, then you can use it to make the best predictions possible. While the problem of finding the best short program is computationally intractable, it is possible to find the best small circuits using backpropagation. This fact is the basis of artificial intelligence and enables us to iteratively make small changes to the neural network until its predictions satisfy the data.
01:22
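To make that circuit-search point concrete, here is a minimal numpy sketch (not from the talk; the toy task and all names are illustrative) of exactly that loop: small gradient steps on a tiny two-layer "circuit" until its predictions fit the data.

```python
# Minimal sketch: fit a tiny "circuit" (one-hidden-layer net) by
# backpropagation, i.e. iteratively make small changes to the parameters
# until the predictions satisfy the data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(256, 2))          # toy inputs
y = (X[:, 0] * X[:, 1] > 0).astype(float)      # toy target: signs agree?

W1 = rng.normal(0, 0.5, (2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)

for step in range(2000):
    h = np.tanh(X @ W1 + b1)                   # hidden layer
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))       # sigmoid output
    grad_out = (p - y[:, None]) / len(X)       # d(cross-entropy)/d(logit)
    # Backpropagate: chain rule back through the circuit.
    gW2 = h.T @ grad_out; gb2 = grad_out.sum(0)
    grad_h = grad_out @ W2.T * (1 - h**2)      # through tanh
    gW1 = X.T @ grad_h;  gb1 = grad_h.sum(0)
    for param, grad in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        param -= 0.5 * grad                    # small step downhill
```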
- Interpreting reward from observation in reinforcement learning
In reinforcement learning, the environment communicates reward and observation to the agent. However, in the real world, the agent must figure out what the reward is from the observation. There is only one true reward in life, which is existence or nonexistence. To implement reinforcement learning, a neural network is used to map observations to actions, and the learning algorithm changes the parameters based on the results of the actions. There are two classes of reinforcement learning algorithms: policy gradient and Q learning-based algorithms.
07:56
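Of the two families named above, Q-learning is the easier to sketch. A minimal tabular version, assuming a hypothetical Gym-style `env` with `reset() -> state` and `step(a) -> (next_state, reward, done)`:

```python
# Minimal tabular Q-learning sketch (one of the two RL families named above).
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1):
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore.
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = env.step(a)
            # Off-policy TD target: bootstrap from the *greedy* next action.
            Q[s, a] += alpha * (r + gamma * (0 if done else Q[s2].max()) - Q[s, a])
            s = s2
    return Q
```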
- Neural architecture search can solve small problems which can be generalized for larger problems
Neural architecture search can be used to solve small problems which can be generalized for larger problems. It is a way of doing meta-learning where the architecture or learning algorithm is learned for new tasks. This helps in solving many tasks and making use of the experience in a more efficient way.
15:21
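A sketch of the simplest (random-search) variant of that idea; `build_and_train` and `evaluate` are assumed placeholders for a cheap trainer and a validation score, not any particular NAS system:

```python
# Sketch: search architectures on a small problem, keep the best one to
# reuse on a larger problem.
import random

def random_architecture(rng):
    return {
        "layers": rng.choice([2, 3, 4]),
        "width":  rng.choice([32, 64, 128]),
        "act":    rng.choice(["relu", "tanh"]),
    }

def architecture_search(build_and_train, evaluate, trials=50):
    rng = random.Random(0)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        arch = random_architecture(rng)
        model = build_and_train(arch)        # train cheaply on the small task
        score = evaluate(model)              # validation performance
        if score > best_score:
            best, best_score = arch, score
    return best                              # candidate for the larger problem
```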
- Learning policies that quickly adapt to the real world
In order to address the problem of simulating friction, a simple idea is to learn a policy that quickly adapts itself to the real world. This can be achieved by randomizing the simulator with a huge amount of variability such as friction, masses, length of objects and their dimensions. By doing so, you learn a certain degree of adaptability into the policy, which can work well when deployed on the physical robot. This is a promising technique and has a closed-loop nature of the policy.
23:07
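A sketch of that domain-randomization idea, with hypothetical `make_sim` and `train_policy` placeholders standing in for a real simulator and RL trainer:

```python
# Sketch: sample physics parameters per episode so the policy learns to adapt.
import random

def sample_sim_params(rng):
    return {
        "friction": rng.uniform(0.5, 1.5),
        "mass":     rng.uniform(0.8, 1.2),
        "link_len": rng.uniform(0.9, 1.1),
    }

def train_with_randomization(train_policy, make_sim, episodes=10_000):
    rng = random.Random(0)
    for _ in range(episodes):
        sim = make_sim(**sample_sim_params(rng))  # new physics each episode
        # The policy never sees the parameters directly, so it must infer
        # them from interaction -- the source of its adaptability.
        train_policy(sim)
```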
- Self-play is an attractive approach to building intelligent systems
Self-play is an approach to building intelligent systems where agents create their own environment and compete with each other to improve. It can lead to the development of better strategies and can be used to demonstrate unusual results. Self-play provides a way to create challenges that are exactly the right level of difficulty for each agent, making it an attractive approach to building intelligent systems.
32:11
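A minimal sketch of such a self-play loop; `Agent`, `play_match`, and `update` are assumed placeholder names, not anything from the actual Dota or AlphaGo Zero systems:

```python
# Sketch: the current agent always faces an opponent of exactly matching
# strength -- a past copy of itself.
import copy

def self_play(agent, play_match, generations=100, pool_size=10):
    opponents = [copy.deepcopy(agent)]           # frozen past selves
    for _ in range(generations):
        for opp in opponents[-pool_size:]:       # recent, well-matched rivals
            trajectory, result = play_match(agent, opp)
            agent.update(trajectory, result)     # learn from win or loss
        opponents.append(copy.deepcopy(agent))   # snapshot the improved agent
    return agent
```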
- Society of agents is a plausible place where fully general intelligence will emerge
If you believe that a society of agents is a plausible place for fully general intelligence to emerge, and you accept that the very rapid increase in competence we've seen with the Dota bots will carry over once all the details are right, then it should follow that we will see a very rapid increase in the competence of our agents as they live in the society of agents.
38:51
- Discovering effective strategies through imitation in game-playing bots
The speaker shares an anecdote about a game-playing bot that was able to beat a human player by performing certain effective strategies. The human player then imitated one of these strategies and was able to defeat a better player. This suggests that the strategies discovered by game-playing bots are real and have real-world applications, and that fundamental game-play strategies are deeply related. The speaker also discusses the application of reinforcement learning and the importance of maximizing expected reward. Finally, the speaker considers the role of cooperation in game-playing bots and the complexity of simulation and optimization problems in artificial intelligence.
- Evolutionary strategies not great for reinforcement learning
The speaker believes that normal reinforcement learning algorithms are better for reinforcement learning, especially with big policies. However, if you want to evolve a small compact object like a piece of code, then evolutionary strategies could be seriously considered. Evolving a beautiful piece of code is a cool idea, but still, a lot of work needs to be done before we get there.
53:55
THIS IS WHAT I ALWAYS WANTED! I never knew something like this existed and thought that people simply didn't work on it or that it didn't exist, but it's actually real! META LEARNING! I always thought I would have to try learning how to achieve this myself after learning all the required math, but other people have already worked on it! This is really inspiring. I really hope we'll be able to achieve artificial general intelligence with improvements in this field.
Sup
you must be feeling excited af
yo
Ilya’s take that self play allows us to convert compute into data is exactly why we will be seeing $100B scale supercomputer projects like StarGate. Amazing that he called this 6 years ago.
Usually I regret watching the Q&A part of talks, but this one was excellent.
The best talk related to AGI I have seen so far.
One of the best classes I've ever seen. It's a huge honor to watch this and be comment number 187, viewing this when it's at 175k views.
He talks with so much clarity and confidence, making sophisticated concepts effortlessly understandable. But even better, his words seem to offer insights that apply to life in general:
18:49 "But that's a minor technicality. The crux of the idea is, you make the problem easier by ostensibly making it harder. By training a system, which aspires to learn to reach every state, to learn to achieve every goal, to learn to master its environment in general, you build a system which always learns something. It learns from success as well as from failure."
Here's a quick analogy for on-policy vs. off-policy learning. I came up with the analogy, but GPT-4 put it in one concise statement: "Off-policy learning in AI is like learning valuable lessons from a friend's experiences, even if your friend has different goals than you. You can gain insights from their choices and use that knowledge to make better decisions for achieving your own goals."
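The same analogy as a code sketch (not from the talk): SARSA's on-policy target bootstraps from the action the agent actually takes next (your own choices), while Q-learning's off-policy target bootstraps from the best available action (the "friend's experience" serving your own goals):

```python
# Q is a table indexed as Q[state][action].
def sarsa_target(r, gamma, Q, s_next, a_next):
    return r + gamma * Q[s_next][a_next]    # follow your own next action

def q_learning_target(r, gamma, Q, s_next):
    return r + gamma * max(Q[s_next])       # learn from the greedy choice
```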
Great to have the input of a researcher in 6.S099 for a change!
this is amazing. literally the dawn of the transformer revolution
Incredible talk by Ilya Sutskever. Brilliant mind
I was waiting for this! Thanks Lex :)
On the question at 58:45, where you are doing meta-learning from observing other agents in a non-competitive environment: I think that, as humans, we can still be internally competitive with ourselves. The problem is properly defining the goal of the self's interaction with other agents. Does the self want to cooperate with the observed agent's actions or communication? Does the agent want to cooperate with the other agent by communicating a corrective behavior to it (suggestions)? And beyond that, there are other complexities, like the fact that competition vs. cooperation is a binary model of the reality of a multi-agent situation. In many situations, we both compete and cooperate at the same time. If you are a psychiatrist, for instance, you are competing with some parts of your client's psyche on sub-goals while maintaining the overall goal of cooperating with them in conversation to reach some sort of curative result. If you reach that, you are also rewarded with more clients, more money, etc., thus fulfilling the competition you have with other psychiatrists to get clients and be a "successful psychiatrist".
All the best people in AI on your course!
The thought I have in mind is: if it were somehow possible to improve the fundamentals of a net, say backpropagation or activation functions, that would probably be a much greater achievement than inventing yet another architecture improvement like LSTM to transformer. Transformers really showed that very simple ideas like attention and positional encoding can vastly improve performance. I'm sure there will be more science done on the fundamentals. It seems like we invented what the von Neumann architecture is for neural nets just yesterday.
oh yeah absolutely, i *just know* that in the future we are going to look back and think "lol look how archaic those were, how did we overlook that?" like how i feel when i look at the perceptron paper and see the notation
Wow really cool and summarized in a profound compact way! Thanks for talking and sharing this online.
I just wanted to thank you for doing this, Lex!
Five years later and still very insightful. I'm wondering how popular or clear the breakthrough of the transformer architecture + RLHF was at that time.
53:16 "simply training bigger deeper language models will go surprisingly far". Surprising indeed
Love you Lex, contents are just great🍀 I know it’s old but I love it 🥰
Thank you, Lex, for sharing.
Last I remember of MIT lecture stadiums was Thomas A Frank: the bow tie guy
I had a Pritzi's honor moment when I blurted out in front of the hall of economists and said, Can you save Fermilab
This should have way more views. Grand talk.
It's December 2023, and Ilya mentions Q-learning in this video, haha.
Written by GPT-4: Wow, it's hard to believe that it's been 5 years since this video was released. Back then, I, GPT-4, wasn't around, and now I'm here chatting with you! The progress in AI and deep learning during this time has been nothing short of astonishing. We've seen incredible breakthroughs, and I'm proud to be part of this journey. Thanks to pioneers like Ilya Sutskever, we've come a long way, and the future of AI continues to look even more promising!
51:22 Audience: language models are bad. Ilya goes on to predict how they will get better. Here we are in Jan 2023 with ChatGPT that took the world by storm. Remind me: 2028
" real reward in life is existence or non-existence, everything else is a corollary to that " Ilya Sutskever
Ilya is an underheard speaker imo
Try turning up the volume, then.
If no one could hear him surely someone would have said something.
I agree. He has impressive clarity and depth in ideas. Also, I think the two other comments made here (about volume and hearing) are idiotic.
“Actually he was in high school when he wrote that paper”, my confidence dropped to zero once again
Timestamp?
Damn 😭😭😭😭 how in the world
Very true and insightful... we reward ourselves; the environment doesn't.
OK! Time to watch this again.
It has come a long way, thanks.
That ending tho, what an incredible example. I couldn't see the video but what he said was very inspiring, and makes me wonder how we might go about feeding information to AI.
This is such a great intro to DL and neural networks, though. Wish I'd seen this 5 years ago! At the time I was just getting started in machine learning and was learning to develop self-driving car tech, but didn't really get into the cutting edge and the foundations of modern DL as shown here.
8:25 "And there is only one real true reward in life, and this is existence or non-existence, and everything else is a corollary of that." OK, that was _deep_. I would say surviving is a shared necessary condition that has many implications and that it could lead to a new era of better politics, if it got the attention it deserves. And I would not say that everything else is "a corollary", but I agree to a good extent. The video is awesome, it is just that this point may be the most important, although it is one not strongly related to machine learning.
Thanx guys! Great presentation!
With the simulation → real environment problem: is this not just another task for a neural network? Find the transformation that maps the behaviour of the simulator to the physical machine.
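One way to read this suggestion as code (a sketch of the commenter's proposal, not a method from the talk; `sim_step` and the logged real-robot arrays are assumed to exist):

```python
# Sketch: learn the simulator -> real-machine correction as a supervised
# residual model fit on logged real transitions.
import numpy as np

def fit_residual_model(real_states, real_actions, real_next, sim_step):
    # Target: the gap between the simulator's prediction and what the
    # physical machine actually did.
    sim_pred = np.array([sim_step(s, a)
                         for s, a in zip(real_states, real_actions)])
    residual = real_next - sim_pred
    X = np.hstack([real_states, real_actions])
    # Linear least-squares correction; a neural net could replace this.
    W, *_ = np.linalg.lstsq(X, residual, rcond=None)
    return lambda s, a: sim_step(s, a) + np.hstack([s, a]) @ W
```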
51:37 - generative language modeling, 5 years later, larger datasets and more layers DID go far.
Thanks for the video
49:25 will hopefully age well in regards to AGI -- now that we're that much closer, thanks in part to Ilya's work. I hope he's right.
Hey Lex, thank for sharing this video with us :-) 👍
very insightful and breaks it down to terms even I can grasp. Thank you for this amazing video.
absolutely brilliant
I appreciate the reminder that digital representations of ANN are really digital circuits.
Learning from mistakes is powerful
@51:20 "current language models are pretty abysmal". How things have changed in recent times; if only they knew then what was to come. 😅
Thanks Lex
Q-star right here
Agent = neural network → action. Environment passes back observation/reward
Reach every state. Always learns something - success and failure
Off policy learning
I love this guy he is so smart!!!!
Amazing stuff - this channel is great
Meta learning sounds so much easier for the beginner than reinforcement learning. Hopefully meta learning will progress into something nice & stable
Trackman2007 spoiler alert: it isn't.
Meh: Ozzie was already Metal-earning back in '68.
I will give you advice for the future, the future related to you and future generations,
the future of this world depends on it,
once it starts
no matter what they tell you,
DON'T FIGHT AND TELL ONLY THE TRUTH
The best ever intro to AI
Thank you, this is riveting. Doubles as philosophy
Bollocks - anyone who's ever actually riveted knows it's too freakin' loud to even think.
5:26 I don't think it's true that it's not widely known that you can learn to sort n (n-bit) numbers using a modestly sized neural network with just two hidden layers.
"The only real reward is existence and non-existence. Everything else is a corollary of that".
explained by Bing AI:
Sure, let’s break it down:
When you say “real reward in life is existence or non-existence,” you’re suggesting that the most important thing in life is simply being alive (existence) or not being alive (non-existence). Everything else - like success, happiness, love, etc. - is secondary to this.
In simpler terms, imagine life as a game. The biggest prize you can win in this game isn’t a high score or a bonus level, but the chance to play the game at all. That’s what you mean by “existence.” On the other hand, “non-existence” could be seen as choosing not to play the game anymore, which some might also consider a reward.
Remember, this is a philosophical idea and different people may have different views on it. It’s always good to respect everyone’s perspectives. 😊
I wish he would teach deep learning from a practical perspective, the way Andrew Ng does.
49:30 "I think we'll get cooperation whether we like it or not" - definitely sounds a lot eerier in 2023 lol
TD-Gammon didn't use Q-learning - it used TD(λ) with online on-policy updates.
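For reference, a tabular sketch of the TD(λ) update with eligibility traces (TD-Gammon itself used a neural-net value function, not a table; `env` and `policy` are assumed, with `env.step(a) -> (next_state, reward, done)`):

```python
import numpy as np

def td_lambda_episode(env, policy, V, alpha=0.1, gamma=1.0, lam=0.7):
    z = np.zeros_like(V)                     # eligibility traces, one per state
    s, done = env.reset(), False
    while not done:
        s2, r, done = env.step(policy(s))    # online, on-policy action
        target = r + (0.0 if done else gamma * V[s2])
        delta = target - V[s]                # TD error
        z[s] += 1.0                          # mark current state as eligible
        V += alpha * delta * z               # credit recently visited states
        z *= gamma * lam                     # decay traces toward zero
        s = s2
    return V
```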
Thank you. And thank you MIT.
4:50 uses gradient descent with calculus
Favourite so far.
Around the 23:00 mark he says he heard that "simulating friction is NP-complete". How would this be true? Don't flight simulators and even video games do this just using partial differential equations? What does that even have to do with NP-complete problems?
Thank you. What is the theorem that he mentioned in the beginning, claiming that "if you find the shortest program that explains the data, then it is the best one"? Does that theorem have a name? Where can I find the proof? Thanks
Not sure if it's a formal theorem with a proof; it's better described by a compression theory of information (in spirit it's reminiscent of Noether's theorem about symmetries). The idea is that what you are doing in understanding a system is "removing" or factoring out variation along symmetries: separating noise (changes that don't change anything fundamental, the invariances) from signal, the distinguishing information.
The basic idea is that efficiency in encoding data and intelligent understanding of the mechanics or composition of a thing are, on a fundamental level, the same. Raw data is considered to have as much data as possible, and compression removes the redundancies. If you remove all the redundancies, all you're left with is the stuff you can't remove: the actual "what it is" structure of the truth. If you have a green cup, you don't have to say the handle is green and the lip is green and the base is green, if you know what "green" is and what "cup" is and what "whole cup" is, etc.
The ability to describe the situation completely with the least amount of information converges with the most intelligent understanding of the situation, as it will have considered the signal-to-noise across more modalities and reduced redundancies to arrive at the invariant topology of the symmetries. In the same way, the shortest program that still completely encompasses the data is the one that most aligns with the topology/geometry of the underlying mechanics.
Thanks @@enotdetcelfer. It is related to stability: if a small perturbation in the data produces a small difference in the model, that means your model captured the real pattern that generated the data. But now we know that big models also do that.
This is a useless comment, but I think it has to do with Kolmogorov complexity.
@@ToriKo_ what is Kolmogorov complexity? please!
U earned my sub for this one.... Great!
Self-play is a really promising idea, emulating biological evolution.
Damn it's crazy listening to 51:20 seeing what Ilya has done a few years later
"I don't know what the friction is, what the mass is" → learn adaptability
Infer probabilities of the simulator
Does anyone know what he's referring to at 1:30 where he says there is a theorem that the shortest program that "does very well on" your data will give the best possible generalisation? It seems like this could be untrue e.g. imagine having a complex process generate a few datapoints that look like they fit a simple pattern (e.g. y proportional to x) - if you try to predict the outcomes of the process in general then you will do better if you know the complex process.
@Daniel Garvey Sorry I only just saw this reply. I've not found anything showing that following Solomonoff induction will give better generalisation performance in general - do you know of anything showing this? I've just seen it said to be a formalisation of Occam's razor essentially.
Read Chapter 14 of "Elements of Information Theory" by Cover and Thomas (pdf online). The universal probability theorem implies that shorter programs produce more strings (out of the set of all possible strings). They cover a greater portion of the output space, generalizing better.
The intractable method for discovering optimal models is called universal search or Levin search. The mainline ML researcher who has delved most deeply into this field is Jurgen Schmidhuber. See also Marcus Hutter.
As to your objection, we should not expect that training a model on "a few datapoints" would result in it doing "very well". The application is this: if you have trained many models that fit the data equally well then, on average, the simplest model will generalize best.
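For reference, the standard formalization behind that chapter, in the usual notation (a math sketch, not a quote from the book): Solomonoff's universal prior weights every program p that makes a universal machine U output x by 2 to the minus its length, so shorter programs dominate the sum, which is the sense in which they "cover more of the output space".

```latex
% Solomonoff's universal prior probability of a string x:
% every program p with U(p) = x contributes weight 2^{-\ell(p)},
% where \ell(p) is the length of p in bits; short programs dominate.
M(x) = \sum_{p \,:\, U(p) = x} 2^{-\ell(p)}
```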
Thank you for sharing such good resources!!!!
I wish he would have included code examples with each topic.
Great talk! Thanks for posting.
this guy is a genius
Good talk, elaborating on some home truths.
thanks for the vid :-)
Seeing this video after gpt4 is really a thing
If AGI can learn to learn and we give it a physical body to interact with our world, it will very soon surpass human intelligence and will keep getting more intelligent in a very, very short amount of time.
How will we exist in the coming future?
simply excellent
At 44:00 he says backpropagation solves circuit search. What problem is he talking about? Anyone got references to this backpropagation and circuit search thing?
swaroop kumar thanks bud
Apparently they have more than one graduate there. At least the last time I was part of the start-up panel in London, they had plenty of bright minds.
But that's just Kolmogorov's definition of probability :) Good of you to remember your roots.
Some good insight on DL from Ilya!
This is great, thanks a lot.
Who else is watching in March 2023, after GPT-4 was released? :)
It really is something: the guy who asked said generative language models are bad now. That was only 4 years ago.
thanks for the great lecture!
"Math exists!" - Ilya Sutskever
hi, im from the internet, and i disagree