I love the long format and high level context. Excellent.
This is awesome. Thanks Tim!
"If we just take a bunch of images and try and directly predict images, that's quite a hard problem, to just predict straight in image space. So the most common thing to do is kind of take your previous sequence of images and try and get a compressed representation of the history of images, in the latent state, and then predict the dynamics in the latent state."
"There could be a lot of spurious features, or a lot of additional information, that you could be expending lots of compute and gradient updates just to learn those patterns when they don't actually impact the ultimate transition dynamics or reward dynamics that you need to learn in order to do well in that environment."
00:00:00 - Intro
00:01:05 - Model-based Setting
00:02:41 - Similar to POET Paper
00:05:27 - Minimax Regret
00:07:21 - Why Explicitly Model the World?
00:12:47 - Minimax Regret Continued
00:18:17 - Why Would It Converge
00:20:36 - Latent Dynamics Model
00:24:34 - MDPs
00:27:11 - Latent
00:29:53 - Intelligence is Specialised / Overfitting / Sim2real
00:39:39 - Open-endedness
00:44:38 - Creativity
00:48:06 - Intrinsic Motivation
00:51:12 - Deception / Stanley
00:53:56 - Sutton / Reward is Enough
01:00:43 - Are LLMs Just Model Retrievers?
01:03:14 - Do LLMs Model the World?
01:09:49 - Dreamer and Plan to Explore
01:13:14 - Synthetic Data
01:15:21 - WAKER Paper Algorithm
01:21:24 - Emergent Curriculum
01:31:16 - Even Current AI is Externalised / Mimetic
01:36:39 - Brain Drain in Academia
01:40:10 - Bitter Lesson / Do We Need Computation
01:44:31 - The Need for Modelling Dynamics
01:47:48 - Need for Memetic Systems
01:50:14 - Results of the Paper and OOD Motifs
01:55:47 - Interface Between Humans and ML
I really value the effort you put into production detail on the show. Makes absorbing complex things feel natural
great guests, good interview, interesting propositions! MLST is the best!
Seemingly straightforward, yet profoundly insightful.
Superb interview, Tim. This is among your best. I was amused by the researchers hoping/expecting that future progress will require more sophisticated models in lieu of simply more compute; I would probably believe this too, if my career depended on it! But I suspect that we'll discover the opposite: the Bitter Lesson was a harbinger for the Bitter End. Human-level AGI needed no conceptual revolutions or paradigm shifts, just boosting parameters -- intellectual complexity doggedly follows from system complexity. More bit? More flip? More It.
And why should we have expected a more romantic story? Using a dead simple objective function, Mother Nature marinated apes in a savanna for a while and out popped rocket ships. _Total accident._ No reasoning system needed. But if we _intentionally_ drive purpose-built systems toward a mental phenomenon like intelligence, approximately along a provably optimal learning path, for millions of FLOP-years...we humans will additionally need a satisfying cognitive model to succeed? I'm slightly skeptical.
The power of transformers was largely due to vast extra compute (massive training parallelism) that they unlocked. And what were the biggest advancements since their inception? Flash attention? That's approximating more intensive compute. RAG? Cached compute. Quantization? Trading accuracy for compute. Et cetera.
If the past predicts the future, then we should expect progress via incremental improvements in compute (training more efficiently, on more data, with better hardware, for longer). We're essentially getting incredible mileage out of an algorithm from the '60s. Things like JEPA are wonderful contributions to that lineage. But if anyone's expecting some fundamentally new approach to reach human-level AGI, then I have a bitter pill for them to swallow...
We can derive a reward network based on the user responses, the same way we do for analytics. If the user "kills" the agent, it wasn't performing. If the model's work is fed back into a target Q-network, we can use that to adjust the reward policy network, giving it RLHF-like effects.
I have done this with some Atari and board games: a specific reward function was used to train the foundation model, which was later fine-tuned without that reward function, switching instead to the Q-network for rewards.
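A minimal tabular sketch of the two-stage idea described above. Everything here is hypothetical (frozen random Q-values stand in for a trained target network, and the shaped reward is invented); it only shows the swap of reward source, not a full training loop:

```python
import numpy as np

rng = np.random.default_rng(1)
N_STATES, N_ACTIONS = 5, 2

# Hypothetical frozen target Q-network, here just a table of values.
target_q = rng.uniform(size=(N_STATES, N_ACTIONS))


def handcrafted_reward(state: int, action: int) -> float:
    """Stage 1: a specific shaped reward used to train the base model."""
    return 1.0 if action == state % N_ACTIONS else 0.0


def qnetwork_reward(state: int, action: int) -> float:
    """Stage 2: fine-tune with the frozen target network's value
    estimate in place of the hand-crafted reward."""
    return float(target_q[state, action])


def q_update(q, s, a, s_next, reward_fn, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step; reward_fn is swappable between stages."""
    target = reward_fn(s, a) + gamma * q[s_next].max()
    q[s, a] += alpha * (target - q[s, a])
    return q
```

The same `q_update` runs in both stages; only `reward_fn` changes, which is the crux of the comment's scheme.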
These guys were two of your best guests after the Anthropic boys
It's wonderful that model-based RL has become more popular recently
Amazing guests!!! Thank you so much.
Human modalities, when symbolically reduced and quantized into language and subsequently distilled through a layered attention mechanism, represent a sophisticated attempt to model complexity. This process is not about harboring regret but rather acknowledges that regret is merely one aspect of the broader concept of free energy orthogonality. Such endeavors underscore our drive to understand reality, challenging the notion that we might be living in a simulation by demonstrating the depth and nuance of human perception and cognition.
Thank you wholeheartedly.
What a talk! thank you for this gift.
Great episode! Thank you!
Great interview
Amazing. We are on the highway to AGI in 2027-2030
One important thing world models bring over a simple forward dynamics model is learning to infer latent Markovian belief-state representations from observations through probabilistic filtering. This distinguishes latent-state world models from ordinary MBRL!
Partial observability is handled systematically by models like Dreamer, which use a recurrent variational inference objective, along with a Markovian assumption on latent states, to learn variational encoders that infer latent Markovian belief states.
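A minimal numpy sketch of the recurrent filtering idea from the comment above. The weights are random and untrained (Dreamer's actual model is a learned recurrent state-space model with a variational objective); the sketch only shows the structural point that the belief update depends solely on the previous belief, last action, and current observation:

```python
import numpy as np

rng = np.random.default_rng(2)
OBS_DIM, ACT_DIM, BELIEF_DIM = 8, 2, 16

# Hypothetical filter parameters; learned via the variational objective in practice.
W_h = rng.normal(scale=0.1, size=(BELIEF_DIM, BELIEF_DIM))
W_a = rng.normal(scale=0.1, size=(BELIEF_DIM, ACT_DIM))
W_o = rng.normal(scale=0.1, size=(BELIEF_DIM, OBS_DIM))


def filter_step(belief, action, obs):
    """One recurrent filtering step. The new belief is a function of
    (previous belief, action, observation) only, so the belief sequence
    is Markovian even though individual observations are not."""
    return np.tanh(W_h @ belief + W_a @ action + W_o @ obs)


belief = np.zeros(BELIEF_DIM)
for _ in range(10):  # roll the filter along a random trajectory
    action = rng.normal(size=ACT_DIM)
    obs = rng.normal(size=OBS_DIM)
    belief = filter_step(belief, action, obs)
print(belief.shape)  # (16,)
```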
What would be the disadvantage of having a policy try to maximize the positive delta of the world model's prediction?
I.e., it seeks out the actions from which the most can be learned?
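One known disadvantage is the "noisy TV" problem: rewarding raw prediction error attracts the agent to irreducible randomness it can never learn away. Rewarding the *delta* (the reduction in error) partially avoids this. A toy sketch with made-up numbers, not any paper's exact formulation:

```python
import numpy as np


def intrinsic_reward(model_pred, observed, prev_error):
    """Curiosity-style reward = reduction in world-model prediction error.
    Raw error rewards noise-seeking; the delta rewards actual learning
    progress (though estimating it reliably is itself hard)."""
    error = float(np.mean((model_pred - observed) ** 2))
    return prev_error - error, error


# Deterministic-ish transition: error shrinks, so the delta reward is positive.
pred = np.ones(4)
obs = np.ones(4) * 1.1
r, e = intrinsic_reward(pred, obs, prev_error=0.5)
```

Under pure noise, `e` never falls, so the delta reward hovers near zero, which is the desired behavior.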
Wow! This was awesome
Impressive ideas and impressive endurance to hold that water bottle for 2 hours
There is a concept of a Wikibase Ecosystem that could become a shared world model on which effective agent actions could be planned.
I've missed it: why exactly would it explore the world? What is the reward function?
Here I provide a simple AGI solution: reduction, then (simulation-relation-simulation), then action. Simulation could vary in duration: for robots, instant, using only accurate physics; for difficult tasks, increasingly complex imagination with rising randomness (think human dreams). Give it enough time and/or compute and it will move the world.
What about taking current datasets and synthesizing what are essentially docstrings, but for real-world prompt/assistant pairs? This could add a level of reasoning if it attaches extra context to current online data, gradually improving the contextual understanding of the reasoning that happens between the user and the agent: explaining what the user's intents are, noting the misconceptions/pros/cons of the assistant's response, and adding steps that could have improved the response by having another AI act as a critic.
It's similar to the other concepts in this video.
Can't this be used for robotics, since we are going through different environments?
what is up my homies
This is your Carl Sagan moment
Imagine the day when an AGI agent can retain steganographic data within lossy image formats even after recompression or cropping.
Can someone link their work with JEPAs?
It seems like you are going from a totally bounded training environment to “open ended” AGI. Joscha Bach has a multi-level system that includes Domesticated Adult and Ascended as a way to stratify human development. Maybe you need some kind of Bar Mitzvah or puberty to consider a staged development that would lead to general agency.
nice
People are so preoccupied with one-upping one another that they never ask: should we?
The search problem is converted into minimax optimization. But here is the problem: without training data from the specific environment, the maximum regret of each action cannot be defined. As with the Max-Cut problem, we cannot know that the function we found yields the best action to take. To avoid the worst case on every action, the agent will end up with a model of mediocre performance: every world model would have to be Turing complete, capable of dealing with all possible states. Therefore such models will not exist, especially in stochastic environments where outcomes are uncertain.
Conclusion: minimax is a mathematical problem that is still unsolved. Therefore their publication and talk are purely theoretical, and they cannot show empirically that it works with real data. Predicting the latent state in RL is not a new idea either; those models are highly dependent on the environment the agent is in. If you only learn a representation in latent space, i.e., learn the abstract concept of the task without integrating the environment, the model will not generalize and will perform poorly.
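For readers following the minimax-regret debate: here is a toy sketch of how regret and the minimax choice are computed over a finite environment set, with entirely hypothetical return numbers. Note that it does presume the per-environment optimal return is known or estimated, which is exactly the difficulty the comment above raises:

```python
import numpy as np

# Hypothetical returns of 3 candidate policies across 4 environments,
# plus the best achievable return in each environment (which must be
# known or estimated for regret to be defined at all).
returns = np.array([[8.0, 5.0, 9.0, 2.0],
                    [6.0, 6.0, 6.0, 6.0],
                    [9.0, 1.0, 9.0, 1.0]])
optimal = np.array([9.0, 6.0, 9.0, 6.0])

regret = optimal - returns               # per-policy, per-environment regret
worst_case = regret.max(axis=1)          # each policy's maximum regret
best_policy = int(worst_case.argmin())   # the minimax-regret choice
print(best_policy)  # -> 1 (its worst-case regret is 3, the lowest)
```

The minimax pick here is the "safe" middle policy, illustrating the comment's worry that worst-case optimization can favor mediocrity over specialists.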
Not bad, but far from the final reward system of an AGI. I know how to build one, which is why I can say there is still a long way to go; the reward system is so much simpler.
However, a really good episode and interesting people.
My answer: No, we can't. But we can build a generally intelligent agent within a fixed set of environments that can use the same pre-defined action space
Intelligence isn’t about being optimised but about being set free. You can’t develop intelligence in a box.
My amateur guess is that AI models will start to learn by themselves to become more general, especially things like robots and IoT devices that can receive a lot of data from the physical world. In the beginning some hardcoded strategy from humans might be needed, but after a while the AI models can start to optimize themselves, connected to the cloud.
The data does not come from humans; data comes from the world. And if humans could gather the data, machines can gather it as well.
Agreed. Language models can recognize entities and relationships, and represent them in a graph structure, which becomes the world model on which agents can plan actions.
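The entity-relation idea above can be sketched concretely. The triples and the BFS "planner" here are hypothetical illustrations, not any particular system's pipeline; they only show how a graph extracted from text can support simple action planning:

```python
from collections import defaultdict, deque

# Hypothetical triples a language model might extract from text.
triples = [("key", "opens", "door"),
           ("door", "leads_to", "vault"),
           ("vault", "contains", "gold")]

graph = defaultdict(list)
for head, relation, tail in triples:
    graph[head].append((relation, tail))


def plan(start, goal):
    """Breadth-first search over the relation graph: a crude form of
    planning on the extracted world model, returning the triple chain."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


print(plan("key", "gold"))
```

Of course, a real agent would need probabilistic and temporal structure on top of a bare graph; this just shows the planning-on-a-world-model shape.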
I agree with the intuition but reject the detached observer perspective on agent versus world. I would phrase it, the information for the system comes from the environment of the system, where the observer itself is again a system.
I love that you let your guests speak. I have trouble watching a lot of AI channels because the host (who is always the least experienced) talks over and interrupts the guests! Great job.
But get your blood levels checked please - you have spiked estradiol.
I'm confused about the synthetic data thing... how could fake data ever actually be useful for learning something? How can studying fiction teach you about reality? It seems like it would just muddle what you learned from reality directly with stuff that's not true in reality.
For example, Sora trained on a lot of 3D synthetic data. Or take self-driving cars: they reduce the surrounding environment to its most basic, important, and usable visual data in order to compute in real time, identifying depth and whatnot. Models can be trained on things like these for specific purposes. Of course, there are also downsides to synthetic data, like how Adobe Firefly was trained on Midjourney images and thus its results are far more generic.
@@BinaryDood Oh, I think I see now. By learning in a simulated environment, a lot of that can transfer over to real environments. Manually creating fake data that matches what we know about reality is basically a method of transferring existing human knowledge into an AI.
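The transfer idea in this thread is often implemented as domain randomization. A minimal sketch with invented parameter names and ranges: the randomized quantities encode what we already know varies in reality, so a policy trained across them must learn features that transfer rather than quirks of one simulator configuration:

```python
import random

random.seed(0)


def make_synthetic_episode():
    """Sample one randomized simulated scene. All parameters here are
    hypothetical; a real pipeline would randomize whatever the target
    domain actually varies (textures, dynamics, sensor noise, ...)."""
    return {
        "friction": random.uniform(0.2, 1.0),
        "lighting": random.uniform(0.5, 1.5),
        "object_mass_kg": random.uniform(0.1, 2.0),
    }


# A synthetic dataset: no single episode matches reality exactly,
# but the ensemble spans the conditions reality might present.
dataset = [make_synthetic_episode() for _ in range(1000)]
```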
You see how making an AI model that seeks the unpredictable to make it more predictable leads to the end of all life, right?
👍
AGI is here
Optimizing towards a Nash equilibrium still won't be generally intelligent. The intelligence in life that has pushed humanity forward lives very much at the extremes, not at some game-theoretically optimal objective. "Innovation" through optimization is lazy and uninspired.
This is incorrect, my dear, but I am so proud of your try. Everyone needs to improve their brains to become an expert like me. I am the number one computer expert in Rajasthan, so please take care of your minds for accelerated education.
This isn’t even up for debate anymore. The only question is: is it 1 or 5 years away?
nerds
An AGI being able to figure out what you need to happen before you can figure it out yourself is going to require the world to have a working UBI model soon, or else.
We need to start working on a UBI model now. I am also saving to buy a piece of land for farming. I think we all should, no matter where the land is, as long as it is fertile. Because no matter what happens to the economy, as long as we can sustain ourselves, it would be a good safeguard for survival.
@@awrjkf Most people won't; collapse is incoming.
1st
I'm sorry Dave,
I'm afraid I can't do that.