Building a GENERAL AI agent with reinforcement learning

  • Published Oct 4, 2024

Comments • 56

  • @Ben_D.
    6 months ago +13

    I love the long format and high level context. Excellent.

  • @CharlesVanNoland
    6 months ago +9

    This is awesome. Thanks Tim!
    "If we just take a bunch of images and try and directly predict images, that's quite a hard problem, to just predict straight in image space. So the most common thing to do is kind of take your previous sequence of images and try and get a compressed representation of the history of images, in the latent state, and then predict the dynamics in the latent state."
    "There could be a lot of spurious features, or a lot of additional information, that you could be expending lots of compute and gradient updates just to learn those patterns when they don't actually impact the ultimate transition dynamics or reward dynamics that you need to learn in order to do well in that environment."

  • @redacted5035
    6 months ago +10

    00:00:00 - Intro
    00:01:05 - Model-based Setting
    00:02:41 - Similar to POET Paper
    00:05:27 - Minimax Regret
    00:07:21 - Why Explicitly Model the World?
    00:12:47 - Minimax Regret Continued
    00:18:17 - Why Would It Converge
    00:20:36 - Latent Dynamics Model
    00:24:34 - MDPs
    00:27:11 - Latent
    00:29:53 - Intelligence is Specialised / Overfitting / Sim2real
    00:39:39 - Open-endedness
    00:44:38 - Creativity
    00:48:06 - Intrinsic Motivation
    00:51:12 - Deception / Stanley
    00:53:56 - Sutton / Reward is Enough
    01:00:43 - Are LLMs Just Model Retrievers?
    01:03:14 - Do LLMs Model the World?
    01:09:49 - Dreamer and Plan to Explore
    01:13:14 - Synthetic Data
    01:15:21 - WAKER Paper Algorithm
    01:21:24 - Emergent Curriculum
    01:31:16 - Even Current AI is Externalised / Mimetic
    01:36:39 - Brain Drain in Academia
    01:40:10 - Bitter Lesson / Do We Need Computation
    01:44:31 - The Need for Modelling Dynamics
    01:47:48 - Need for Memetic Systems
    01:50:14 - Results of the Paper and OOD Motifs
    01:55:47 - Interface Between Humans and ML

  • @MartinLaskowski
    6 months ago +7

    I really value the effort you put into production detail on the show. It makes absorbing complex things feel natural.

  • @ehfik
    6 months ago +3

    Great guests, good interview, interesting propositions! MLST is the best!

  • @NextGenart99
    6 months ago +3

    Seemingly straightforward, yet profoundly insightful.

  • @Dan-hw9iu
    6 months ago +13

    Superb interview, Tim. This is among your best. I was amused by the researchers hoping/expecting that future progress will require more sophisticated models in lieu of simply more compute; I would probably believe this too, if my career depended on it! But I suspect that we'll discover the opposite: the Bitter Lesson was a harbinger for the Bitter End. Human-level AGI needed no conceptual revolutions or paradigm shifts, just boosting parameters -- intellectual complexity doggedly follows from system complexity. More bit? More flip? More It.
    And why should we have expected a more romantic story? Using a dead simple objective function, Mother Nature marinated apes in a savanna for a while and out popped rocket ships. _Total accident._ No reasoning system needed. But if we _intentionally_ drive purpose-built systems toward a mental phenomenon like intelligence, approximately along a provably optimal learning path, for millions of FLOP-years...we humans will additionally need a satisfying cognitive model to succeed? I'm slightly skeptical.
    The power of transformers was largely due to vast extra compute (massive training parallelism) that they unlocked. And what were the biggest advancements since their inception? Flash attention? That's approximating more intensive compute. RAG? Cached compute. Quantization? Trading accuracy for compute. Et cetera.
    If the past predicts the future, then we should expect progress via incremental improvements in compute (training more efficiently, on more data, with better hardware, for longer). We're essentially getting incredible mileage out of an algorithm from the '60s. Things like JEPA are wonderful contributions to that lineage. But if anyone's expecting some fundamentally new approach to reach human-level AGI, then I have a bitter pill for them to swallow...

  • @agenticmark
    4 months ago

    We can derive a reward network based on user responses, the same way we do for analytics. If the user "kills" the agent, it wasn't performing. If the model's work were fed back into a target Q-network, we could use that to adjust the reward policy network and give it RLHF-like effects.
    I have done this with some Atari and board games, where a specific reward function was used to train the foundation model, which was later fine-tuned without that reward function, switching to the Q-network for rewards.
    These guys were two of your best guests after the Anthropic boys.
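
One way to read the idea above is as a learned reward model fit to binary keep/kill feedback, then substituted for the hand-written reward during fine-tuning. A minimal sketch; the dimensions, the binary feedback signal, and the network shapes are all assumptions for illustration.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 32, 4

# Reward model: predicts P(user keeps the agent running | state, action).
reward_net = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
bce = nn.BCEWithLogitsLoss()

def reward_model_loss(states, actions, kept_running):
    """Fit the reward model on logged interactions; kept_running is 1.0
    where the user let the agent continue, 0.0 where they 'killed' it."""
    logits = reward_net(torch.cat([states, actions], dim=-1)).squeeze(-1)
    return bce(logits, kept_running)

def learned_reward(state, action):
    """Drop-in replacement for the original reward during fine-tuning."""
    with torch.no_grad():
        return torch.sigmoid(reward_net(torch.cat([state, action], dim=-1)))
```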

  • @conorosirideain5512
    6 months ago +6

    It's wonderful that model based RL has become more popular recently

  • @diga4696
    6 months ago

    Amazing guests!!! Thank you so much.
    Human modalities, when symbolically reduced and quantized into language and subsequently distilled through a layered attention mechanism, represent a sophisticated attempt to model complexity. This process is not about harboring regret but rather acknowledges that regret is merely one aspect of the broader concept of free energy orthogonality. Such endeavors underscore our drive to understand reality, challenging the notion that we might be living in a simulation by demonstrating the depth and nuance of human perception and cognition.

  • @Seekerofknowledges
    6 months ago +2

    Thank you wholeheartedly.

  • @filipefigueira6889
    3 months ago

    What a talk! Thank you for this gift.

  • @flyLeonardofly
    6 months ago +1

    Great episode! Thank you!

  • @BilichaGhebremuse
    6 months ago +1

    Great interview

  • @olegt3978
    6 months ago +2

    Amazing. We are on the highway to AGI in 2027-2030

  • @sai4007
    6 months ago +1

    One important thing world models bring in over a simple forward dynamics model is learning to infer latent Markovian belief-state representations from observations through probabilistic filtering. This distinguishes latent-state world models from plain MBRL!
    Partial observability is handled systematically by models like Dreamer, which use a recurrent variational inference objective, together with a Markov assumption on the latent states, to learn variational encoders that infer latent Markovian belief states.
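
A toy sketch of that filtering step, loosely in the spirit of Dreamer's RSSM: a recurrent state carries history, a prior predicts the next latent from dynamics alone, and a posterior refines it with the current observation, with a KL term tying the two together. The sizes and the diagonal-Gaussian parameterisation are illustrative assumptions, not Dreamer's actual architecture.

```python
import torch
import torch.nn as nn

class BeliefFilter(nn.Module):
    """Toy RSSM-style filter: maintain a recurrent belief over latent states."""

    def __init__(self, obs_dim=64, act_dim=4, latent_dim=16, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRUCell(latent_dim + act_dim, hidden_dim)
        # Prior p(z_t | h_t): what the dynamics expect before seeing o_t.
        self.prior = nn.Linear(hidden_dim, 2 * latent_dim)
        # Posterior q(z_t | h_t, o_t): the belief after observing o_t.
        self.posterior = nn.Linear(hidden_dim + obs_dim, 2 * latent_dim)

    def step(self, z_prev, action, hidden, obs):
        hidden = self.rnn(torch.cat([z_prev, action], dim=-1), hidden)
        prior_mu, prior_logstd = self.prior(hidden).chunk(2, -1)
        post_mu, post_logstd = self.posterior(
            torch.cat([hidden, obs], dim=-1)).chunk(2, -1)
        # Sample the belief state via the reparameterisation trick.
        z = post_mu + torch.randn_like(post_mu) * post_logstd.exp()
        # KL(q || p) regularises the belief toward the learned dynamics prior.
        kl = (prior_logstd - post_logstd
              + (post_logstd.exp() ** 2 + (post_mu - prior_mu) ** 2)
              / (2 * prior_logstd.exp() ** 2) - 0.5).sum(-1)
        return z, hidden, kl
```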

  • @codybattery8370
    3 months ago +1

    What would be the disadvantage of having a policy try to maximize the positive delta of the world model's predictions?
    I.e., one that seeks out the actions from which the most can be learned?
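
This is roughly the Plan to Explore idea discussed at 01:09:49: use the world model's uncertainty as an intrinsic reward. One common instantiation rewards disagreement across an ensemble of dynamics models; a minimal sketch follows, where the ensemble, sizes, and batch are illustrative.

```python
import torch
import torch.nn as nn

def disagreement_reward(ensemble, z, action):
    """Intrinsic reward = variance of the ensemble's next-latent predictions.
    High disagreement roughly means 'much left to learn' about (z, action)."""
    with torch.no_grad():
        preds = torch.stack(
            [net(torch.cat([z, action], dim=-1)) for net in ensemble])
        return preds.var(dim=0).mean(dim=-1)  # one scalar per batch element

# Hypothetical ensemble of one-step latent dynamics models.
latent_dim, act_dim = 16, 4
ensemble = [nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(),
                          nn.Linear(64, latent_dim)) for _ in range(5)]
z = torch.randn(8, latent_dim)   # batch of latent states
a = torch.randn(8, act_dim)      # batch of actions
r_intrinsic = disagreement_reward(ensemble, z, a)  # shape (8,)
```

As for the disadvantage: in stochastic environments, irreducible noise also produces persistent disagreement, so the agent can fixate on transitions that are unpredictable but unlearnable (the well-known "noisy TV" problem).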

  • @lancemarchetti8673
    6 months ago

    Wow! This was awesome

  • @maddonotcare
    6 months ago +3

    Impressive ideas and impressive endurance to hold that water bottle for 2 hours

  • @johnkintree763
    6 months ago

    There is a concept of a Wikibase Ecosystem that could become a shared world model on which effective agent actions could be planned.

  • @XOPOIIIO
    6 months ago +1

    I must have missed it: why exactly would it explore the world? What is the reward function?

  • @uber_l
    6 months ago +3

    Here I provide a simple AGI scheme: reduction, then (simulation, relation, simulation), then action. Simulation could last a variable amount of time: for robots, near-instant, using only accurate physics; for difficult tasks, increasingly complex imagination with rising randomness, think human dreams. Give it enough time and/or compute and it will move the world.

  • @drlordbasil
    3 months ago

    What about taking current datasets and essentially synthesizing docstrings, but for real-world prompt/assistant sets? This could add a level of reasoning if it adds extra context to current data online, gradually improving contextual understanding of the reasoning that happens between the user and the agent: explaining what the user's intents are and the misconceptions/pros/cons of the assistant's response, and adding steps that could have improved the assistant's responses by having another AI respond as a critic.

    • @drlordbasil
      3 months ago

      It's similar to the other concepts in this video.

  • @amirul2566
    18 days ago

    Can't this be used for robotics, if we are going through different environments?

  • @FlySociety4
    6 months ago +1

    what is up my homies

  • @RokStembergar
    6 months ago

    This is your Carl Sagan moment

  • @lancemarchetti8673
    6 months ago

    Imagine the day when an AGI agent can retain steganographic data within lossy image formats even after recompression or cropping.

  • @willbrenton8482
    6 months ago

    Can someone link their work with JEPAs?

  • @paulnelson4821
    6 months ago

    It seems like you are going from a totally bounded training environment to “open ended” AGI. Joscha Bach has a multi-level system that includes Domesticated Adult and Ascended as a way to stratify human development. Maybe you need some kind of Bar Mitzvah or puberty to consider a staged development that would lead to general agency.

  • @master7738
    6 months ago

    nice

  • @elmichellangelo
    6 months ago

    People are so preoccupied with one-upping one another that they never ask: should we?

  • @michaelwangCH
    6 months ago

    The search problem is converted into minimax optimization. But here is the problem: without training data from the specific environment, the max regret of each action cannot be determined, just as in the MaxCut problem we cannot know that the solution we found is the best action we could take. By avoiding the worst case on every action, the agent ends up with a model of mediocre performance; a true world model would have to be Turing-complete, able to handle all possible states. Such models will not exist, especially in stochastic environments where outcomes are uncertain.
    Conclusion: minimax here is a mathematical problem that is still unsolved, so their publication and talk are purely theoretical, and they cannot show empirically that it works with real data. Predicting the latent state in RL is not a new idea either, and those models are highly dependent on the environments the agent is in. If it only learns representations in latent space, i.e. the abstract concept of the task without integrating the environment, the model will not generalize and will perform poorly.
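
For reference, the objective being debated here (the minimax regret discussed at 00:05:27) is usually written as follows, where $\theta$ indexes environments and $\pi^*_\theta$ is the optimal policy for environment $\theta$:

```latex
\mathrm{Regret}(\pi, \theta) = V_\theta^{\pi^*_\theta} - V_\theta^{\pi},
\qquad
\pi^\dagger \in \arg\min_{\pi} \, \max_{\theta \in \Theta} \, \mathrm{Regret}(\pi, \theta)
```

The comment's objection then reads cleanly: evaluating $\max_\theta$ requires knowing $\pi^*_\theta$ for each environment, which is exactly what is unavailable without data from that environment.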

  • @johangodfroid4978
    6 months ago

    Not bad, but far from the final reward system of an AGI. I know how to build one, and for that reason I can say there is still a long way to go; the reward system is so much simpler.
    Still, a really good episode with interesting people.

  • @johntanchongmin
    6 months ago +1

    My answer: no, we can't. But we can build a generally intelligent agent within a fixed set of environments that use the same pre-defined action space.

  • @elmichellangelo
    6 months ago

    Intelligence isn't about being optimised but about being set free. You can't develop intelligence in a box.

  • @Anders01
    6 months ago

    My amateur guess is that AI models will start to learn by themselves to become more general, especially things like robots and IoT devices that can receive a lot of data from the physical world. In the beginning some strategy hard-coded by humans might be needed, but after a while the AI models can start to optimize themselves, connected to cloud compute.

  • @geldverdienenmitgeld2663
    6 months ago +3

    The data does not come from humans; data comes from the world. And if humans could gather the data, machines can gather it as well.

    • @johnkintree763
      6 months ago

      Agreed. Language models can recognize entities and relationships, and represent them in a graph structure, which becomes the world model on which agents can plan actions.
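
As a toy illustration of that pipeline, one might accumulate LLM-extracted (subject, relation, object) triples into an adjacency structure that agents can query; the triples below are hard-coded stand-ins for model output.

```python
from collections import defaultdict

# Triples a language model might extract from text; hard-coded here.
extracted_triples = [
    ("robot_arm", "located_in", "lab_3"),
    ("lab_3", "part_of", "building_A"),
    ("robot_arm", "can_grasp", "cup"),
]

# Accumulate them into a simple adjacency-list world model.
graph = defaultdict(list)
for subject, relation, obj in extracted_triples:
    graph[subject].append((relation, obj))

# An agent can then query the graph when planning.
print(graph["robot_arm"])  # [('located_in', 'lab_3'), ('can_grasp', 'cup')]
```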

    • @tobiasurban8065
      6 months ago

      I agree with the intuition but reject the detached observer perspective on agent versus world. I would phrase it as: the information for a system comes from the environment of that system, where the observer itself is again a system.

  • @agenticmark
    4 months ago

    I love that you let your guests speak. I have trouble watching a lot of AI channels because the host (who is always the least experienced) talks over and interrupts the guests! Great job.
    But please get your blood levels checked: you have spiked estradiol.

  • @cakep4271
    6 months ago

    I'm confused about the synthetic data thing. How could fake data ever actually be useful for learning something? How can studying fiction teach you about reality? It seems like it would just muddle what you learned from reality directly with stuff that's not true in reality.

    • @BinaryDood
      4 months ago

      For example, Sora was trained on a lot of 3D synthetic data. Take self-driving cars: they reduce the surrounding environment to its most basic, important, and usable visual data so it can be processed in real time, identifying depth and whatnot. Models can be trained on things like this for specific purposes. Of course, there are also downsides to synthetic data, like how Adobe Firefly was trained on Midjourney images and thus its results are far more generic.

    • @cakep4271
      4 months ago

      ​@@BinaryDood Oh, I think I see now. By learning in a simulated environment, a lot of that can transfer over to real environments. Manually creating fake data that matches what we know about reality is basically a method of transferring existing human knowledge into an AI.
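
A common form of this is domain randomisation: vary the simulator's parameters so the policy sees many slightly different fake worlds and cannot overfit any single one, making the real world look like just one more sample. A toy sketch; the parameter names, ranges, and simulator factory are hypothetical.

```python
import random

def sample_sim_params():
    """Draw one random 'fake world'. Names and ranges are illustrative."""
    return {
        "friction": random.uniform(0.5, 1.5),
        "mass_scale": random.uniform(0.8, 1.2),
        "camera_noise": random.uniform(0.0, 0.05),
    }

# Train across many randomized worlds; the policy must cope with all of them.
for episode in range(1000):
    params = sample_sim_params()
    # env = make_sim_env(**params)   # hypothetical simulator factory
    # ...collect rollouts in env and update the policy...
```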

  • @rodneyericjohnson
    6 months ago

    You see how making an AI model that seeks the unpredictable to make it more predictable leads to the end of all life, right?

  • @antdx316
    6 months ago

    👍

  • @aladinmovies
    6 months ago

    AGI is here

  • @Onislayer
    6 months ago +17

    Optimizing towards a Nash equilibrium still won't yield general intelligence. The intelligence in life that has pushed humanity forward lives very much at the extremes, not at some game-theoretically optimal objective. "Innovation" through optimization is lazy and uninspired.

    • @RanjakarPatel
      6 months ago +3

      This incorrectly my dear but I am so proud four you’re try. Everyone’s need four improve branes four become expertise like four me. I am number computer rajasthan so please take care four you’re minds four acceleration educating

  • @Greg-xi8yx
    6 months ago

    This isn’t even up for debate anymore. The only question is: is it 1 or 5 years away?

  • @dg-ov4cf
    6 months ago

    nerds

  • @antdx316
    6 months ago

    AGI being able to figure out what you need before you can figure it out yourself is going to require the world to have a working UBI model soon, or else.

    • @awrjkf
      6 months ago

      We need to start working on a UBI model now. I am also saving to buy a piece of land for farming; I think we all should, no matter where the land is, as long as it is fertile. Because no matter what happens to the economy, as long as we can sustain ourselves, it would be a good safeguard for survival.

    • @BinaryDood
      4 months ago

      ​@@awrjkf Most people won't; collapse is incoming.

  • @rcstann
    6 months ago +6

    1st!
    I'm sorry, Dave. I'm afraid I can't do that.
    🔴