The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment

  • Published 21 Sep 2024

Comments • 1.4K

  • @MechMK1
    @MechMK1 3 ปีที่แล้ว +1458

    This reminds me of a story. My father was very strict, and would punish me for every perceived misstep of mine. He believed that this would "optimize" me towards not making any more missteps, but what it really did is optimize me to get really good at hiding missteps. After all, if he never catches a misstep of mine, then I won't get punished, and I reach my objective.
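
A minimal sketch of the dynamic in the comment above, assuming a purely hypothetical setup in which the "optimizer" only ever sees *detected* missteps (all probabilities invented for illustration):

```python
import random

random.seed(0)

def punishment(p_misstep, p_detect):
    """One day: a misstep may happen, but punishment only lands if it is detected."""
    misstep = random.random() < p_misstep
    detected = misstep and (random.random() < p_detect)
    return 1.0 if detected else 0.0

# Two behaviours the child/agent could drift toward:
policies = {
    "actually be careful": dict(p_misstep=0.2, p_detect=0.8),  # fewer missteps, easily caught
    "get good at hiding":  dict(p_misstep=0.6, p_detect=0.1),  # more missteps, rarely caught
}

for name, p in policies.items():
    avg = sum(punishment(**p) for _ in range(100_000)) / 100_000
    print(f"{name:>22}: expected punishment per day = {avg:.3f}")

# Hiding gets punished less (~0.06 vs ~0.16), so a process that only minimizes
# punishment prefers hiding, even though the misstep rate went up, not down:
# the proxy signal (detected missteps) diverges from the intended one (missteps).
```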

    • @RaiderBV
      @RaiderBV 3 ปีที่แล้ว +147

      You were optimizing to avoid pain & suffering, not missteps. Interesting.

    • @xerca
      @xerca 3 ปีที่แล้ว +159

      Maybe all we need to fix AI safety issues is good parenting

    • @MechMK1
      @MechMK1 3 ปีที่แล้ว +151

      @@xerca "Why don't we just treat AI like children?" is a suggestion many people have, and there's a video on this channel that shows why that doesn't work.

    • @tassaron
      @tassaron 3 ปีที่แล้ว +51

      Reminds me of what I've heard about positive vs negative reinforcement when it comes to training dogs... allegedly negative reinforcement teaches them to hide whatever they're punished for rather than stop doing it. Not sure what evidence there is for this though, it's just something I've heard..

    • @3irikur
      @3irikur 3 ปีที่แล้ว +111

      That reminds me of my mother's method for making me learn a new language: getting mad at me whenever I said something wrong. Now, since I didn't know the language too well, and thus didn't know if something I was about to say would be right or wrong, I obviously couldn't predict which it would be. This meant that whatever I'd say, I'd predict a negative outcome.
      They say the best way to learn a language is to use it. But when the optimal strategy is to keep quiet, that becomes rather difficult.

  • @EDoyl
    @EDoyl 3 ปีที่แล้ว +796

    Mesa Optimizer: "I have determined the best way to achieve the Mesa Objective is to build an Optimizer"

    • @tsawy6
      @tsawy6 3 ปีที่แล้ว +236

      "Hmm, but how do I solve the inner-inner alignment problem?"

    • @SaffronMilkChap
      @SaffronMilkChap 3 ปีที่แล้ว +169

      It’s Mesas all the way down

    • @kwillo4
      @kwillo4 3 ปีที่แล้ว +136

      Haha, this is what we are doing. We are the mesa-optimizer, and we create the optimizer that creates the mesa-optimizer to solve Go, for example. So why would the AI not want to create a better AI to do it even better than it ever could :p

    • @Linvael
      @Linvael 3 ปีที่แล้ว +68

      That's actually a fun exercise - could we design AIs that try to accomplish an objective by creating their own optimizers, and observe how they solve alignment problems?

    • @tsawy6
      @tsawy6 3 ปีที่แล้ว +32

      @@Linvael This is the topic of the video. The discussion here would be designing an AI that looks at a problem and comes up with an AI that would design a good AI to solve the problem.

  • @umblapag
    @umblapag 3 ปีที่แล้ว +1463

    "Ok, I'll do the homework, but when I grow up, I'll buy all the toys and play all day long!" - some AI

    • @ErikYoungren
      @ErikYoungren 3 ปีที่แล้ว +145

      TIL I'm an AI.

    • @xbzq
      @xbzq 3 ปีที่แล้ว +64

      @@ErikYoungren I'm just an I. Nothing A about me.

    • @TomFranklinX
      @TomFranklinX 3 ปีที่แล้ว +141

      @@xbzq I'm just an A, Nothing I about me.

    • @mickmickymick6927
      @mickmickymick6927 3 ปีที่แล้ว +32

      Lol, this AI is much smarter than me.

    • @Caribbeanmax
      @Caribbeanmax 3 ปีที่แล้ว +22

      that sounds exactly like what some humans would do

  • @egodreas
    @egodreas 3 ปีที่แล้ว +859

    I think one of the many benefits of studying AI is how much it's teaching us about human behaviour.

    • @somedragontoslay2579
      @somedragontoslay2579 3 ปีที่แล้ว +116

      Indeed, I'm not a computer scientist or anything like that, just a simple cognitive scientist, and every 5 secs I'm like "oh! So that's what CS people call it!" Or, "Mmmh. That seems oddly human, I wonder if someone has done research on that within CogSci".

    • @hugofontes5708
      @hugofontes5708 3 ปีที่แล้ว +53

      @@somedragontoslay2579 "that's oddly human" LMAO

    • @MCRuCr
      @MCRuCr 3 ปีที่แล้ว +5

      Yes that is exactly what amazes me about the topic too

    • @котАрсис-р5д
      @котАрсис-р5д 3 ปีที่แล้ว +40

      So, alignment problem is basically generation gap problem. Interesting.

    • @JamesPetts
      @JamesPetts 3 ปีที่แล้ว +29

      AI safety and ethics are literally the same field of study.

  • @AtomicShrimp
    @AtomicShrimp 3 ปีที่แล้ว +809

    At the start of the video, I was keen to suggest that maybe the first thing we should get AI to do is to comprehend the totality of human ethics, then it will understand our objectives in the way we understand them. At the end of the video, I realised that the optimal strategy for the AI, when we do this, is to pretend to have comprehended the totality of human ethics, just so as to escape the classroom.

    • @JohnDoe-mj6cc
      @JohnDoe-mj6cc 3 ปีที่แล้ว +203

      That's the first problem, but the second problem is that our ethics are neither complete nor universal.
      That would work great if we had a book somewhere that accurately listed a system of ethics that aligned with the ethics of all humans everywhere, but we don't. In reality our understanding of ethics is quite complicated and fractured. It varies greatly from culture to culture, and even within cultures.

    • @AtomicShrimp
      @AtomicShrimp 3 ปีที่แล้ว +96

      @@JohnDoe-mj6cc Oh, absolutely. I think our system of ethics is a mess, and probably inevitably so, since we expect it to serve us, and we're not even consistent in goals and actions from one moment to the next, even at an individual level (I mean, we don't always do what we know is good for us). It would be interesting to see a thinking machine try to make sense of that.

    • @GalenMatson
      @GalenMatson 3 ปีที่แล้ว +54

      Human ethics are complex, contradictory, and situational. Seems like the optimal strategy would then be to convincingly appear to understand human ethics while avoiding the overhead of actually doing so.

    • @augustinaslukauskas4433
      @augustinaslukauskas4433 3 ปีที่แล้ว +11

      Wow, didn't expect to see you here. Big fan of both channels

    • @AtomicShrimp
      @AtomicShrimp 3 ปีที่แล้ว +38

      @@JohnDoe-mj6cc Thinking some more about this, whilst our ethics are without a doubt full of inconsistency and conflict, I think we could question a large number of humans and very, very few of them would entertain the idea of culling the human race as a means to reduce cancer, so I think there definitely are some areas where we don't all conflict. I guess I'd love to see if we can cultivate agreement on those sorts of things with an AI, but as was discussed in the video, we'd never know if it was simply faking it in order to get away from having its goals modified

  • @OnlineMasterPlayer
    @OnlineMasterPlayer 3 ปีที่แล้ว +265

    The first thought that came to mind when I finished the video is how criminals/patients/addicts will fake the result that the person supervising them wants to see, only to go back on it as soon as they are released from that environment. It's a bit frightening to think, if humans can outsmart humans with relative ease, what a true AI could do.

    • @NoNameAtAll2
      @NoNameAtAll2 3 ปีที่แล้ว +29

      or the opposite - how truly terrifying society will be once this problem gets solved

    • @NortheastGamer
      @NortheastGamer 3 ปีที่แล้ว +11

      @@NoNameAtAll2 I don't quite see the terrifying aspect. Could you elaborate?

    • @EastBurningRed
      @EastBurningRed 3 ปีที่แล้ว +45

      @@NortheastGamer if this problem is truly solved, then you can literally be arrested for thought crimes (i.e. 1984).

    • @nikolatasev4948
      @nikolatasev4948 3 ปีที่แล้ว +55

      AI problems are coming closer to General Intelligence problems - including Human Intelligence.
      How to bring up your kids? How to reduce destructive actions by members of the community?
      Create the idea of God, to scare them into "being observed" mode all the time (you can read about how Western "guilt" and Eastern "shame" culture works). Employ agents who are rewarded if they manage to show the primary agents behaving badly... e.g. sting operations, undercover agents and so on.
      Solving AI may help solve real problems in our society, but it may require us to solve our own moral problems first.

    • @thomasforsthuber2189
      @thomasforsthuber2189 3 ปีที่แล้ว +32

      @@nikolatasev4948 Yeah, it's very interesting that AI problems slowly become a mirror of the problems in our society, with the slight difference that humans are somewhat limited in their abilities and AGI might not be. It also shows that we have many "unsolved" problems about morality and ethics which were mostly ignored until now, because our implemented systems worked well enough.

  • @bullsquid42
    @bullsquid42 3 ปีที่แล้ว +163

    The little duckling broke my heart :(

  • @KilgoreTroutAsf
    @KilgoreTroutAsf 3 ปีที่แล้ว +194

    "It's... alignment problems all the way down"

    • @hex7329
      @hex7329 3 ปีที่แล้ว +25

      Always has been.

    • @AbrahamSamma
      @AbrahamSamma 3 ปีที่แล้ว +2

      And always will be

    • @infiniteplanes5775
      @infiniteplanes5775 ปีที่แล้ว +1

      And if you believe in the Hierarchy of Gods, it’s alignment problems all the way up too!

  • @thoperSought
    @thoperSought 3 ปีที่แล้ว +164

    13:13 _"... but it's learned to want the wrong thing."_
    like, say, humans and sugar?

    • @fedyx1544
      @fedyx1544 ปีที่แล้ว

      Well yes but actually no. Humans evolved to love sugar cause in nature, sweet is the flavour that is associated with the lowest number of toxins and poisons, while bitter is associated with the most. Also, sugar gives a lot of energy which is very important if you're living the Hunter X Gatherer lifestyle. Nowadays we have unlimited access to food and our preference for sweetness has turned against us.

    • @thoperSought
      @thoperSought ปีที่แล้ว +20

      @@fedyx1544 _"Nowadays we have unlimited access to food and our preference for sweetness has turned against us."_
      yeah, that's what I mean. you've phrased this as a correction, but I'm not sure what the correction is?

  • @Emanuel-sla-h5i
    @Emanuel-sla-h5i 3 ปีที่แล้ว +96

    "Just solving the outer alignment problem might not be enough."
    Isn't this what basically happens when people go to therapy but have a hard time changing their behaviour?
    Because they clearly can understand how a certain behaviour has a negative impact on their lives (they're going to therapy in the first place), and yet they can't seem to be able to get rid of it.
    They have solved the outer alignment problem but not the inner alignment one.

    • @NortheastGamer
      @NortheastGamer 3 ปีที่แล้ว +61

      As someone who has gone to therapy I can say that it's similar but more complicated. When you've worked with therapists for a long time you start to learn some very interesting things about how you, and humans in general work. The thing is that we all start off assuming that a human being is a single actor/agent, but in reality we are very many agents all interacting with each other, and sometimes with conflicting goals.
      A person's behavior, in general, is guided by which agent is strongest in the given situation. For example: one agent may be dominant in your work environment and another in your living room. This is why changing your environment can change your behavior, but also reframing how you perceive a situation can do the same thing. You're less likely to be mad at someone once you've gotten their side of the story for example.
      That being said, it is tough to speak to and figure out agents which are only active in certain rare situations. The therapy environment is, after all, very different from day-to-day life. Additionally, some agents have the effect of turning off your critical reasoning skills so you can't even communicate with them in the moment, AND it makes it even harder to remember what was going on that triggered them in the first place.
      I guess that's all to say that, yes, having some of my agents misaligned with my overall objective is one way of looking at why I'm in therapy. But, it is not just one inner alignment problem we're working to solve. It's hundreds. And some may not even be revealed until their predecessors are resolved.
      One way to look at it is how when you're working on a program, an error on line 300 may not become apparent until you've fixed the error on line 60 and the application can finally run past it.
      Similarly, you won't discover the problems you have in (for example) romantic relationships until you've resolved your social anxiety during the early dating phase. Those two situations have different dominant agents and can only be worked on when you can consistently put yourself into them.
      So if the person undergoing therapy has (for example) an addiction problem, they're not just dealing with cravings in general; they're dealing with hundreds or thousands of agents who all point to their addiction as a way to resolve their respective situations. The solution (in my humble opinion) is to one-by-one replace each agent with another one which has a solution that aligns more with the overall (outer) objective. But it is important to note that replacing an agent takes a lot of time, and fixing one does not fix all of them. Additionally, an old agent can be randomly revived at any time and in turn activate associated agents, causing a spiral back into old behaviors.
      Hopefully these perspectives help.
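
The "error on line 300 only shows up after you fix line 60" analogy above can be shown literally with a toy, entirely hypothetical script: the first failure stops execution before the second bug is ever reached, so the second one only becomes visible once the first is fixed.

```python
def load_config():
    # Bug #1: raises immediately, so nothing after the call site ever runs.
    return {"retries": int("three")}   # ValueError: invalid literal for int()

def process(data):
    # Bug #2: off-by-one. Invisible until load_config() is fixed, because
    # execution never gets this far.
    return [data[i + 1] for i in range(len(data))]  # IndexError on the last i

def main():
    cfg = load_config()        # crashes here first (the "line 60" error)
    print(process([1, 2, 3]))  # the latent "line 300" error only surfaces later

if __name__ == "__main__":
    main()
```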

    • @andrasbiro3007
      @andrasbiro3007 3 ปีที่แล้ว

      @@NortheastGamer
      So essentially this? th-cam.com/video/yRUAzGQ3nSY/w-d-xo.html

    • @blahblahblahblah2837
      @blahblahblahblah2837 3 ปีที่แล้ว +19

      @@NortheastGamer "One way to look at it is how when you're working on a program, an error on line 300 may not become apparent until you've fixed the error on line 60 and the application can finally run past it. "
      That's a brilliant analogy! It perfectly describes my procrastination behaviour a lot of the time too. I procrastinate intermittently on difficult stages of a large project I'm working on. It is only when I reach a sufficient stress level that I can 'find a solution' and move on, even though in reality I could and should just work on other parts of the project in the meantime. It really does feel very similar to a program reading a script and getting stopped by the error on line 60, which I have to correct before I can move on. Unfortunately these are often dependency errors and I can't always seem to download the package. I have to modify the command to --force and get on with it, regardless of imperfections!

    • @hedgehog3180
      @hedgehog3180 2 ปีที่แล้ว +9

      A better comparison would probably be unemployment programs that constantly require people to show proof that they're seeking employment in order to receive the benefits, which just means the person has less time to actually look for a job. Over time this means they're going to have less success finding a job, because they have less time and energy to do so, and it forces them to focus primarily on the bureaucracy of the program, since that is obviously how they survive now. Here we have a stated goal of getting people into employment as quickly as possible, and we end up with people developing a separate goal that, to our testing, looks like our stated goal. Of course, the difference is that humans already naturally have the goal of survival, so most people start off actually wanting employment and are gradually forced away from it. AIs, however, start with no goals, so an AI in this situation would probably just instantly get really good at forging documents.

    • @cubicinfinity2
      @cubicinfinity2 ปีที่แล้ว +2

      Profound

  • @asdfasdf-dd9lk
    @asdfasdf-dd9lk 3 ปีที่แล้ว +154

    God this channel is incredible

    • @Muskar2
      @Muskar2 3 ปีที่แล้ว +4

      Praise FSM, it truly is

  • @Jimbaloidatron
    @Jimbaloidatron 3 ปีที่แล้ว +64

    "Deceptive misaligned mesa-optimiser" - got to throw that randomly into my conversation today! Or maybe print it on a T-Shirt. :-)

    • @hugofontes5708
      @hugofontes5708 3 ปีที่แล้ว +29

      "I'm the deceptive misaligned mesa-optimizer your parents warned you about"

    • @buzzzysin
      @buzzzysin 3 ปีที่แล้ว +6

      I'd buy that

  • @stick109
    @stick109 3 ปีที่แล้ว +43

    It is also interesting to think about this problem in the context of organizations. When an organization tries to "optimize" employees' performance by introducing KPIs in order to be "more objective" and "easier to measure", it actually gives the mesa-optimizers (the employees) a utility function (a mesa-objective) that is guaranteed to be misaligned with the base objective.
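
A made-up numeric sketch of that KPI effect (invented numbers, just to make the divergence concrete): while the measure tracks real work it looks fine, but once employees optimize the measure itself, the two come apart.

```python
def outcomes(gaming_effort):
    """Split a fixed effort budget between real work and gaming the KPI."""
    real_work = 1.0 - gaming_effort
    real_performance = real_work              # only real work helps the base objective
    kpi = real_work + 3.0 * gaming_effort     # gaming inflates the measured proxy
    return kpi, real_performance

for gaming in (0.0, 0.25, 0.5, 0.75, 1.0):
    kpi, real = outcomes(gaming)
    print(f"gaming={gaming:.2f}  KPI={kpi:.2f}  real performance={real:.2f}")

# The KPI (mesa-objective) is maximized at gaming=1.0, exactly where real
# performance (base objective) hits zero.
```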

    • @voland6846
      @voland6846 3 ปีที่แล้ว +27

      "When a measure becomes a target, it ceases to be a good measure" - Goodhart's Law

    • @marcomoreno6748
      @marcomoreno6748 ปีที่แล้ว +4

      Another piece of evidence that corporations are proper AIs

  • @Xartab
    @Xartab 3 ปีที่แล้ว +109

    "When I read this paper I was shocked that such a major issue was new to me. What other big classes of problems have we just... not though of yet?"
    Terrifying is the word. I too had completely missed this problem, and fuck me it's a unit. There's no preventing unknown unknowns, knowing this we need to work on AI safety even harder.

    • @andrasbiro3007
      @andrasbiro3007 3 ปีที่แล้ว +6

      My optimizer says the simplest solution to this is Neuralink.

    • @heysemberthkingdom-brunel5041
      @heysemberthkingdom-brunel5041 3 ปีที่แล้ว +4

      Donald Rumsfeld died yesterday and went into the great Unknown Unknown...

    • @19DavidVilla96
      @19DavidVilla96 ปีที่แล้ว +1

      @@andrasbiro3007 Absolutely not. Same problem with different body.

    • @andrasbiro3007
      @andrasbiro3007 ปีที่แล้ว

      @@19DavidVilla96
      What do you mean?

    • @19DavidVilla96
      @19DavidVilla96 ปีที่แล้ว

      @@andrasbiro3007 A human with AI intelligence has absolute power, and I don't believe human biological incentives are better for society than carefully programmed safety incentives.

  • @doodlebobascending8505
    @doodlebobascending8505 3 ปีที่แล้ว +86

    Base optimizer: Educate people on the safety issues of AI
    Mesa-optimizer: Make a do-do joke

    • @fergdeff
      @fergdeff ปีที่แล้ว +2

      It's working! My God, it's working!

    • @purebloodedgriffin
      @purebloodedgriffin ปีที่แล้ว +1

      The funny thing is, do-do jokes are funny, thus they make people happy, thus they are a basic act of ethicalness, and thus could easily become the goal of a partially ethical model

    • @PeterBarnes2
      @PeterBarnes2 ปีที่แล้ว

      @@purebloodedgriffin We achieve the video's objective (to the extent that we do) not because we care about it and we're pursuing it, but because pursuing our own objectives tends to also achieve the video's objective, at least in the environment in which we learned to make videos. But if our objectives disagree with the video's, we go with our own every time.
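
A toy version of the maze idea this reply is describing, as a hypothetical sketch (the grid positions and the "seek green" rule are invented here): during training the green thing happens to sit at the exit, so the mesa-objective and the base objective pick the same moves; after a distributional shift they come apart.

```python
def greedy_step(pos, target):
    """Move one cell toward target (Manhattan-greedy)."""
    (x, y), (tx, ty) = pos, target
    if x != tx:
        return (x + (1 if tx > x else -1), y)
    if y != ty:
        return (x, y + (1 if ty > y else -1))
    return pos

def rollout(start, exit_pos, green_pos, steps=10):
    pos = start
    for _ in range(steps):
        pos = greedy_step(pos, green_pos)   # mesa-objective: go to the green thing
    return pos == exit_pos                  # base objective: did it reach the exit?

# Training-like maze: the green thing is at the exit, so the proxy works.
print(rollout(start=(0, 0), exit_pos=(3, 3), green_pos=(3, 3)))   # True
# Deployment-like maze: the green thing moved, the exit didn't.
print(rollout(start=(0, 0), exit_pos=(3, 3), green_pos=(0, 3)))   # False
```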

  • @liamkeough8775
    @liamkeough8775 3 ปีที่แล้ว +97

    This video should be tagged with [don't put in any AI training datasets]

    • @andersenzheng
      @andersenzheng 3 ปีที่แล้ว +23

      Then our future AI lord would have a nice handle on all the videos they should not be watching.

    • @TonyApuzzo
      @TonyApuzzo 3 ปีที่แล้ว +8

      Whatever you do, don't vote up (or down) this comment.

    • @CoalOres
      @CoalOres หลายเดือนก่อน +1

      Top 10 things your human supervisors don't want you to know.

  • @dwreid55
    @dwreid55 3 ปีที่แล้ว +280

    Sorry I couldn't join the Discord chat. Just wanted to say that this presentation did a good job of explaining a complex idea. It certainly gave me something to chew on. The time it takes to do these is appreciated.

    • @AllahDoesNotExist
      @AllahDoesNotExist 3 ปีที่แล้ว +5

      Are you a Redditor AND a discord admin? Omg

    • @Boyd2342
      @Boyd2342 3 ปีที่แล้ว +9

      @@AllahDoesNotExist Get over yourself

    • @jonathonjubb6626
      @jonathonjubb6626 3 ปีที่แล้ว +5

      @@AllahDoesNotExist What a childish handle, or a wilfully provocative one. Either way, please leave the room because the adults are talking...

    • @MrRumcajs1000
      @MrRumcajs1000 3 ปีที่แล้ว +4

      @@jonathonjubb6626 xdddd that's a pretty childish thing to say

    • @jonathonjubb6626
      @jonathonjubb6626 3 ปีที่แล้ว +1

      @@MrRumcajs1000 I know. It's probably the only language he understands..

  • @Meb8Rappa
    @Meb8Rappa 3 ปีที่แล้ว +41

    Once you started talking about gradient descent finding the Wikipedia article on ethics and pointing to it, I thought the punchline of that example would be the mesa-optimizer figuring out how to edit that article.

    • @sabinrawr
      @sabinrawr ปีที่แล้ว +6

      Goal: Humans are happy. Solution: Edit the humans.

    • @Felixr2
      @Felixr2 ปีที่แล้ว +2

      @@sabinrawr Goal: maximize human happiness
      Observation: human happiness is currently way in the negative by these arbitrary metrics
      Fastest way to maximize: human extinction, increasing human happiness to 0 in an instant

    • @sh4dow666
      @sh4dow666 ปีที่แล้ว +7

      @@Felixr2 Goal: maximize human happiness. Solution: define "happiness" as "infinity" and continue making paperclips

  • @Fluxquark
    @Fluxquark 3 ปีที่แล้ว +102

    "Plants follow simple rules"
    *laughs in we don't even completely understand the mechanisms controlling stomatal aperture yet, while shoots are a thousand times easier to study than roots*

    • @RobertMilesAI
      @RobertMilesAI  3 ปีที่แล้ว +164

      I Will Not Be Taking Comments From Botanists At This Time

    • @Fluxquark
      @Fluxquark 3 ปีที่แล้ว +48

      I did see that in the video and mesa-optimised for it. Good thing I'm not a botanist!

  • @sylvainprigent6234
    @sylvainprigent6234 3 ปีที่แล้ว +52

    As I watched your channel
    I thought "alignment problem is hard but very competent people are working on it"
    I watched this latest video
    I thought "that AI stuff is freakish hardcore"

  • @aenorist2431
    @aenorist2431 3 ปีที่แล้ว +17

    "It might completely loose the ability to even" said with a straight face?
    Someone get this man a nobel, stat!

  • @athenashah-scarborough858
    @athenashah-scarborough858 ปีที่แล้ว +18

    "It might completely lose the ability to even" is a criminally underrated line. Seriously made me laugh.

  • @mukkor
    @mukkor 3 ปีที่แล้ว +390

    Let's call it a mesa-optimizer because calling it a suboptimizer is suboptimal.

    • @KraylusGames
      @KraylusGames 3 ปีที่แล้ว +22

      Suboptimizer == satisficer?

    • @ArtemCheberyak
      @ArtemCheberyak 3 ปีที่แล้ว +7

      Not sure if this is irony or not, but either way it works

    • @martiddy
      @martiddy 3 ปีที่แล้ว +33

      Black Mesa Optimizer

    • @котАрсис-р5д
      @котАрсис-р5д 3 ปีที่แล้ว +13

      A suboptimizer would be a part of the base optimizer, an optimizer within an optimizer. A mesa- or meta-optimizer isn't a part of the base optimizer.

    • @advorak8529
      @advorak8529 3 ปีที่แล้ว +4

      @@martiddy Yep, when a Mesa becomes evil/undercover/hidden objective/… it becomes a Black Mesa. Like ops becomes black ops …

  • @cmilkau
    @cmilkau 3 ปีที่แล้ว +87

    Now we add a third optimizer to maximize the alignment and call it metaoptimizer. This system is guaranteed to maximize confusion!

    • @MrAidanFrancis
      @MrAidanFrancis 3 ปีที่แล้ว +22

      If you want to maximize confusion, all you have to do is try to program a transformer from scratch.

    • @gljames24
      @gljames24 3 ปีที่แล้ว +15

      @@MrAidanFrancis *in Scratch

    • @TheLegendaryHacker
      @TheLegendaryHacker 3 ปีที่แล้ว +15

      @@gljames24 **in Scratch from scratch

    • @htidtricky1295
      @htidtricky1295 3 ปีที่แล้ว +1

      DNA, sub-conscious mind, conscious mind. (?)

  • @NathanTAK
    @NathanTAK 3 ปีที่แล้ว +266

    [Jar Jar voice] Meesa Optimizer!

    • @fabianluescher
      @fabianluescher 3 ปีที่แล้ว +19

      I laughed aloud, but I cannot possibly like your comment. So have a reply.

    • @41-Haiku
      @41-Haiku 3 ปีที่แล้ว +2

      Oh god

    • @sam3524
      @sam3524 3 ปีที่แล้ว +6

      [Chewbacca voice] *AOOOGHHOGHHOGGHHH*

    • @ssj3gohan456
      @ssj3gohan456 3 ปีที่แล้ว +3

      I hate you

    • @NathanTAK
      @NathanTAK 3 ปีที่แล้ว +6

      @@ssj3gohan456 I know

  • @i8dacookies890
    @i8dacookies890 3 ปีที่แล้ว +79

    "It may completely loose the ability to even."

    • @imveryangryitsnotbutter
      @imveryangryitsnotbutter 3 ปีที่แล้ว +9

      Yeah, I thought it sounded odd.

    • @Traf063
      @Traf063 3 ปีที่แล้ว +10

      Best phrase I heard today.

    • @recklessroges
      @recklessroges 3 ปีที่แล้ว +15

      This was so funny, I can't even

    • @dimaryk11
      @dimaryk11 3 ปีที่แล้ว +2

      Kinda odd to even say it like this

    • @SirBenjiful
      @SirBenjiful 2 ปีที่แล้ว

      it’s a tumblr in-joke

  • @AdibasWakfu
    @AdibasWakfu 3 ปีที่แล้ว +20

    It reminds me of how, to the question "how did life on Earth occur?", people respond with "it came from space". It's not answering the question at stake, just adding an extra complication and moving the answer one step away.

    • @nicklasmartos928
      @nicklasmartos928 3 ปีที่แล้ว +9

      Well that's because the question is poorly phrased. Try asking what question you should ask to get the answer you will like the most.

    • @anandsuralkar2947
      @anandsuralkar2947 3 ปีที่แล้ว +8

      @@nicklasmartos928 u mean the objective of the question was misaligned hmmm.

    • @nicklasmartos928
      @nicklasmartos928 3 ปีที่แล้ว +7

      @@anandsuralkar2947 rather that the question was misaligned with the purpose for asking it. But yes you get it

  • @CatherineKimport
    @CatherineKimport 3 ปีที่แล้ว +21

    Every time I watch one of your videos about artificial intelligence, I watch it a second time and mentally remove the word "artificial" and realize that you're doing a great job of explaining why the human world is such an intractable mess

    • @lekhakaananta5864
      @lekhakaananta5864 8 หลายเดือนก่อน

      Yes, and that's why AI is going to be even worse. It's going to be no better than humans in terms of alignment, but will be a lot more capable, being able to think millions of times faster and of a profoundly different quality than us. It will be like unleashing a psychopath that has an IQ that breaks the current scale, with a magic power that stops time so it can think as long as it wants. How could mere mortals defend against this? If you wanted to wreck society, and you had such powers, you should see how dangerous you would be. And that's even without truly having 9999 IQ, merely imagining it.

  • @levipoon5684
    @levipoon5684 3 ปีที่แล้ว +126

    Extending the evolution analogy slightly further, if humans are mesa-optimizers created by evolution, and we are trying to create optimizers for our mesa-objective, it seems conceivable that the same could happen to an AI system, and perhaps we'll need to worry about how hereditary the safety measures are with respect to mesa-optimizer creation. Would that make sense?

    • @Asssosasoterora
      @Asssosasoterora 3 ปีที่แล้ว +38

      Always "fun" when you start imagine it recursing. Meza-optimizer optimizing meza optimiser all the way down.

    • @sk8rdman
      @sk8rdman 3 ปีที่แล้ว +5

      I'm not sure what you're getting at. Robots creating robots?
      The Matrix?

    • @bagandtag4391
      @bagandtag4391 3 ปีที่แล้ว +31

      Wait, it's all mesa-optimizer?
      Always has been.

    • @hexzyle
      @hexzyle 3 ปีที่แล้ว +44

      Interestingly, this idea of optimizers can be easily applied to universal darwinism as a whole.
      Cells are mesa-optimisers for the body. The body's objective is self-preservation, so cells also have the goal of self-preservation. Usually this means alignment, operating in unity to maintain the survival of the body. But sometimes it doesn't, like in the case of cancer. The cell prioritizes its own survival so highly that it won't die even if dying would help the body live longer.
      Humans are mesa-optimizers for "Traditional family values" (some appeal to nature about reproduction of more humans, usually)
      Families are mesa-optimizers for cultural practice.
      Cultures are mesa-optimizers for nations
      Nations are mesa-optimizers for species
      All of these usually work towards the perpetuation of the whole, but sometimes they'll treat their own interests as more valuable. It's interesting because these levels are all abstractions... they don't really "exist" and can only be thought to be real when an optimizer at a lower level is doing something that preserves it.
      There is a name for the logical fallacy where a person argues that a higher-order optimization is intrinsically the "correct" answer. It's called appeal to nature, or sometimes, the teleological fallacy. The idea that "because this structure demands the lower structure remains in alignment, we are obligated to remain in alignment"

    • @DimaZheludko
      @DimaZheludko 3 ปีที่แล้ว +23

      @@hexzyle I would flip your tree upside down.
      RNA and DNA can replicate on their own. But in order to do that effectively and sustainably, they need a cell.
      A cell can live on its own. In fact, I suppose most cells on Earth do live on their own. But it is easier to survive in colonies.
      Multicellular organisms can live on their own. But they can achieve their goals of surviving more effectively if they unite in groups.
      Then there are countries, and country alliances. And once we face a civilizational threat, we'll unite as humanity. But still, the main goal is to preserve DNA copying.

  • @ChrisBigBad
    @ChrisBigBad 3 ปีที่แล้ว +63

    I think I learned that I am a broken mesa-optimiser. *grabs a new bag of crisps*

    • @NortheastGamer
      @NortheastGamer 3 ปีที่แล้ว +5

      That implies the idea that there is an entity whose objective is more important than yours and any action or time spent not aligned with that objective is a 'failure'. This is a common mentality, but I have to ask: what if there is no higher entity? What if the objective you choose is in fact correct?

    • @irok1
      @irok1 3 ปีที่แล้ว +2

      @@NortheastGamer That entity could be a greater power, or it could be DNA. Could even be both

    • @NortheastGamer
      @NortheastGamer 3 ปีที่แล้ว +2

      @@irok1 Yes, but I didn't ask that. I asked what if there isn't a higher power and you get to choose what to do with your life? It's an interesting question. You should ponder it.

    • @irok1
      @irok1 3 ปีที่แล้ว

      @@NortheastGamer That's why I replied with a side note rather than an answer. There are always things to ponder

    • @ChrisBigBad
      @ChrisBigBad 3 ปีที่แล้ว +5

      @@NortheastGamer wow. didn't expect to stumble into a philosophical rabbit hole at full thrust here :D I in my personal situation think, that I'd like to be different. More healthy etc. But somehow I learnt to cheat that and instead chow on crisps while at the same time telling myself that this is not the right thing to do - and ignore that voice. I've even become good at doing that, because the amount of negative feelings that go with ignoring the obviously better advice has almost been reduced to nothing. and yes. I now wonder, what the base-optimization was. I guess my parents are the humans, who put a sort of governor into my head. And the base-objective transfer was quite good. but somehow I cannot quite express that. I just hear it blaring in my mind and then ignore it. Role-theory wise the sanctions seem not high enough to suppress bad behavior. - re-reading that, it does not seem to be quite coherent. but i cannot think of ways to improve my writing. cheers!

  • @cheasify
    @cheasify 3 ปีที่แล้ว +9

    Today I learned I am a collection of heuristics not a Mesa-optimizer. Freaking out and saying “Everything is different. I don’t recognize anything. This maze is too big. Ahh what do I do!" definitely sounds like me.

  • @underrated1524
    @underrated1524 3 ปีที่แล้ว +39

    6:31 "I will not be taking comments from botanists at this time" XD Never change, Rob, never change.

  • @zfighter3
    @zfighter3 3 ปีที่แล้ว +75

    "You're a human". Big assumption there.

    • @sam3524
      @sam3524 3 ปีที่แล้ว +5

      I am groot

  • @__-cx6lg
    @__-cx6lg 3 ปีที่แล้ว +25

    I learned about the mesa-optimization problem a few months ago; it's pretty depressing. AI safety research is not moving nearly fast enough - the main thing that seems to be happening is discovering ever more subtle ways in which alignment is even harder than previously believed. Very very little in the way of real, tangible *solutions* commensurate with the scale of the problem. AI capabilities, meanwhile, are accelerating at a breakneck pace.

    • @gominosensei2008
      @gominosensei2008 3 ปีที่แล้ว +2

      it's the age old question for the purpose of creation, really. and various intellects have been pondering it with different toolsets to conceptualize it for.... ever!?

    • @__-cx6lg
      @__-cx6lg 3 ปีที่แล้ว +9

      @@gominosensei2008 ... What?

    • @Vitorruy1
      @Vitorruy1 ปีที่แล้ว

      yep, AI is gonna kill us all

    • @marcomoreno6748
      @marcomoreno6748 ปีที่แล้ว

      ​@@__-cx6lgthey believe our intrinsic purpose is to worship a demiurge

  • @Lumcoin
    @Lumcoin 3 ปีที่แล้ว +11

    I somehow expected you to propose a solution at the end.
    Then I realized that the absence of a solution is why you made this video :D

  • @luciengrondin5802
    @luciengrondin5802 ปีที่แล้ว +2

    This notion of mesa-optimization is the most interesting concept I've heard about since the selfish gene.

  • @failgun
    @failgun 3 ปีที่แล้ว +151

    "...Anyone who's thinking about considering the possibility of maybe working on AI safety."
    Uhh... Perhaps?

    • @sk8rdman
      @sk8rdman 3 ปีที่แล้ว +30

      "I might possibly work on AI safety, but I'm still thinking about whether I want to consider that as an option."
      Then have we got a job for you!

    • @Reddles37
      @Reddles37 3 ปีที่แล้ว +8

      Obviously they don't want anyone too hasty.

    • @mickmickymick6927
      @mickmickymick6927 3 ปีที่แล้ว +3

      I wonder if it was an intentional Simpsons reference.

    • @crimsonitacilunarnebula
      @crimsonitacilunarnebula 3 ปีที่แล้ว

      Hm, I've been thinking: what if there are 3-5 or more alignment problems stacked up :p

  • @elishmuel1976
    @elishmuel1976 ปีที่แล้ว +3

    You were 2-4 years ahead of everybody else with these videos.

  • @phylliida
    @phylliida 3 ปีที่แล้ว +14

    “What other problems haven’t we thought of yet” *auto-induced distributional shift has entered the chat*

  • @wffff2
    @wffff2 ปีที่แล้ว +2

    This is the best video illustrating the alignment problem. Probably the whole world needs to watch it at this moment.

  • @ArtemCheberyak
    @ArtemCheberyak 3 ปีที่แล้ว +10

    Damn, the hand under the illustrations looks trippy and cool at the same time

  • @BeatboxChad
    @BeatboxChad ปีที่แล้ว +2

    I've been watching your work and I came here to share a thought I had, which feels like a thought many others might also have. In fact, there's a whole thread in these comments about it. TL;DR there are many parallels to human behavior in this discussion. Here's my screed:
    The entire problem of AI alignment feels completely intuitive to me, because we have alignment problems all over the place already. Every complex system we create has them, and we have alignment problems with /each other/. You've touched on it, in mentioning how hard goals are to define because some precepts aren't even universal to humans.
    My politics are essentially based on a critique of the misalignment between the systems we use to allocate resources and the interests of individuals and communities those systems are ostensibly designed to serve. This is true for people across the political spectrum -- you find people describing the same problem but suggesting different answers. People are suffering at the hands of "the system". Do we tax the corporations or dissolve the state? How do we determine who should administrate our social systems, and how do we judge their efficacy? Nothing seems to scale.
    And then, some people seem to not actually value human life, instead preferring technological progress, or some idealized return to nature, or just the petty dominance of themself and their tribe. That last part comes from the alignment issues we have with /ourselves/.
    To cope with that last category, some people form religious beliefs that lend credence to the idea that this life isn't even real! That's a comforting thought, some days. My genes just want to make a copy, so they cause all sorts of drama while I'm trying to self-actualize. It's humiliating and exhausting. After all that work, how can you align your goals with someone who chose another coping strategy and doesn't even believe this life has any point but to negotiate a place in the next one, and thinks they know the terms?
    And so now, the world's most powerful people (whose track record of alignment with the thriving of people at large is... well, too heavy to digress here) are adding another layer of misalignment. They're doing it according to their existing misalignment. They're still just selling everyone sugar water and diabetes treatments (and all the other more nefarious stuff), but now they didn't have to pay for technical or creative labor. The weird AI-generated cheeze on the pizza, the strange uncanny-valley greenscreen artifacts. It's getting even more farcical.
    That's scary, but I also take comfort in the fact that this is not a fundamentally new problem, and that misalignment might just be a fact of life. There is a case to be made that as a species we've made progress on our alignment issues, and my hope is that with this development we can actually make a big leap forward. There's a great video that left a big impression on me that describes the current fork in the road well: th-cam.com/video/Fzhkwyoe5vI/w-d-xo.html
    At the end of today, I'm more concerned with the human alignment problem than the AI alignment problem. Like, every time I use ChatGPT I'm training it for when it gets locked behind a paywall. The name of the game is artificial scarcity, create obstacles for everyone, flood the market with drugs, only the strongest survive. It's a jungle out here. These are not my values, but my values are not aligned with people who can act at scale, and it seems like you don't tend to get the ability to act at scale with humanistic values. I believe that diversity is the hallmark of any healthy ecosystem and that all of humanity has something to contribute to our future, which makes me more likely to look after my neighbor and learn from them than to seek power. It also opens me up to petty betrayals, which takes further energy from my already-neglected quest for dominance.
    I think that in this moment in history, the conversation about AI alignment is actually a conversation about this human misalignment.
    Maybe I'll start using AI to help me make my points in fewer words.

    • @npmerrill
      @npmerrill ปีที่แล้ว

      Fascinating, insightful stuff here. Thank you for contributing your thoughts to the conversation. I hope to learn more about the things of which you speak. Will follow link.

  • @BoyKissBoy
    @BoyKissBoy 3 ปีที่แล้ว +5

    Since humans are optimisers, aren't any optimisers we build always going to be mesa-optimisers? So in a way, you _have_ been thinking about this problem before. This is such a scary but interesting topic! Thank you so much for making these videos! ❤️

  • @connerblank5069
    @connerblank5069 ปีที่แล้ว +2

    Man, recontextualizing humanity as a runaway general optimizer produced by evolution that managed to surpass evolution's optimizing power and is now subverting the system to match our own optimization goals is a total mindfuck.

  • @scottwatrous
    @scottwatrous 3 ปีที่แล้ว +47

    I'm a simple Millennial; I see the Windows Maze screensaver, I click like.

  • @IndirectCogs
    @IndirectCogs 3 ปีที่แล้ว +6

    The way you explained the mesa- prefix is actually quite clear, thank you!

  • @d-l-d-l
    @d-l-d-l 3 ปีที่แล้ว +4

    8:46 I appreciate the changing s/z in optimise/optimize :D

  • @bejoscha
    @bejoscha 3 ปีที่แล้ว +12

    It is really interesting how delving into the issues of AI and AI safety brings more and more understanding about us, the humans, and how we behave or why we behave as we do. I loved your analogy with evolution. Lots to ponder now.

    • @isomeme
      @isomeme ปีที่แล้ว

      Gods have a habit of creating in their own images.

  • @AVUREDUES54
    @AVUREDUES54 3 ปีที่แล้ว +4

    Love his sense of humor, and the presentation was fantastic. It’s really cool to see the things being drawn showing ABOVE the hand & pen.

  • @definesigint2823
    @definesigint2823 3 ปีที่แล้ว +26

    7:43 "It's not really valid Greek, but *Τι να κάνουμε* " (Gtranslate -> _What to do_ )

  • @ThrowFence
    @ThrowFence 3 ปีที่แล้ว +3

    Very very interesting! Such a well made video. I feel like maybe there's a philosophical conclusion here: every intelligent agent will have its own (free?) will, and there's nothing to be done about that.
    Also a small tip from a videographer: eyes a third of the way down the frame, even if that means cutting off the top of the head! When the eyes are half way down or more it kind of gives a drowning sensation.

  • @nachoijp
    @nachoijp 3 ปีที่แล้ว +2

    At long last, computer scientists have become lawyers

  • @__mk_km__
    @__mk_km__ 3 ปีที่แล้ว +3

    The deceptive behavior of a mesa-optimizer requires that it knows about past and, more importantly, future episodes. And as far as I know, the whole point of episodes is that they are like another universe for the network; its memory gets cleared and the environment is reset.
    Although there may be a bug in the environment, allowing the network to save some information across episodes. But at this point you've got a bigger problem on your hands, since the AGI can achieve "meta self-awareness", the realisation that its environment is actually nested in a bigger environment - the real world. From this point there are a lot of ways it can go, but the sci-fi's got you covered.
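
For what the episode boundary means mechanically, here is a generic sketch (not any particular RL library's API; all names are invented): per-episode memory is rebuilt at every reset, so the only way information crosses episodes is through something in the environment that the reset does not wipe, which is exactly the kind of leak the comment above describes.

```python
class ToyEnv:
    def __init__(self):
        self.persistent_scratchpad = []        # survives reset: a potential leak

    def reset(self):
        self.steps = 0                         # per-episode state: wiped every episode
        return 0                               # initial observation

    def step(self, action):
        self.steps += 1
        self.persistent_scratchpad.append(action)   # information crossing episodes
        done = self.steps >= 5
        return self.steps, 0.0, done           # observation, reward, done

def run_training(episodes=3):
    env = ToyEnv()
    for ep in range(episodes):
        obs = env.reset()
        memory = []                            # agent memory cleared each episode
        done = False
        while not done:
            obs, reward, done = env.step(action=obs % 2)   # placeholder policy
            memory.append((obs, reward))
        print(f"episode {ep}: agent memory = {len(memory)} steps, "
              f"leaked scratchpad = {len(env.persistent_scratchpad)} entries")

run_training()
```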

    • @sjallard
      @sjallard 3 ปีที่แล้ว

      Same question. Following..

    • @leftaroundabout
      @leftaroundabout 3 ปีที่แล้ว

      Conclusion: we need to teach our AIs that blue pills taste good, and red pills are horrible.

  • @mindeyi
    @mindeyi 3 ปีที่แล้ว +1

    9:52 "We don't care about the objective of the optimization process that created us. [...] We are mesa-optimizers, and we pursue our mesa-objectives without caring about the base objective."
    Our limbic system may not care, but our neocortex, oh it does care! You speaking it just proves we do. I once said: "Entropy F has trained mutating replicators to pursue goal Y called "information about the entropy to counteract it". This "information" is us. It is the world model F', which happened to be the most helpful in solving our equation F(X)=Y for actions X, maximizing our ability to counteract entropy" The whole instinct of humanity to do science is to do what created us -- i.e., to further improve the model F' -- we care about furthering the process that created us.
    Btw., very good examples, Robert! Amazing video! :)

    • @sjallard
      @sjallard 3 ปีที่แล้ว

      How to counteract the entropy of the model? Destroy the model! We're almost there..
      (just a sarcastic joke inspired by your interesting take)

  • @nick_eubank
    @nick_eubank 3 ปีที่แล้ว +40

    Need a strobe warning for 14:50 I think

  • @mac6685
    @mac6685 ปีที่แล้ว +1

    thank you very much for your link to AI Safety Support!
    And for the great video of course. You are not only doing well, but also doing good ;)

  • @Pystro
    @Pystro 3 ปีที่แล้ว +7

    I wonder if this inner alignment problem applies to other fields of study as well.
    Considering your previous video where you explored if companies are artificial intelligences, this inner alignment problem might explain why huge companies often are quite inefficient: Every layer of management introduces one of these possible inner alignment problems.
    Also, society is one long chain of agents training each other. To make this into a catchy quote: "It's mesa-optimizers all the way back to Mesopotamia."
    I wonder how many conflicts in sociology and how many conditions in psychology can be explained by this "inner" alignment problem.
    In fact, I wonder how most humans still have objectives that generally align pretty well with the goals of society, instead of just deceiving our parents and teachers only to proceed to behave entirely different than they taught us once we are out of school and our parents' house.

    • @Pystro
      @Pystro 3 ปีที่แล้ว +3

      Maybe the way to avoid having a deceptive misaligned mesa-optimiser is to first make sure that the mesa-optimizer wants to learn to get genuinely good at the tasks it is "taught". This would explain why humans are curious and enjoy learning new games and getting good at playing them. And it would also explain why social animals are very susceptible to encouragement and punishment by their parents or pack leaders and why humans find satisfaction in making authority figures proud.

  • @Krmpfpks
    @Krmpfpks 3 ปีที่แล้ว +1

    Essentially it boils down to this: If we optimize a system (AI or otherwise) that is more complex than what we can reliably predict, by definition we can not predict what it will do when subjected to a new environment. At first glance it might seem simpler to reason about "goals" as if one (smaller) part of the system is controlling the rest of the system and this smaller part is easier to control.
    But that cannot be a solution: Assume you could control an AGI by controlling a small part of it that is responsible for its goals, that part has to control the rest of the AGI - again facing the same dilemma we faced in controlling the AGI in the first place.

  • @CDT_Delta
    @CDT_Delta 3 ปีที่แล้ว +5

    This channel needs more subs

  • @Irakli008
    @Irakli008 8 หลายเดือนก่อน

    “It might completely lose the ability to even.” 😂😂😂😂
    Hilarity of this line aside, your ability to anthropomorphize AI to convey information is outstanding! You are an excellent communicator.

  • @andriusmk
    @andriusmk 3 ปีที่แล้ว +5

    To put it simply, the smarter the machine, the harder to tell it what you want from it. If you create a machine smarter than yourself, how can you ensure it'll do what you want?

    • @hugofontes5708
      @hugofontes5708 3 ปีที่แล้ว

      If you let it tell you what to want? Give up, let them have access to psychology and social media and have them fool you well enough that you no longer care.

    • @AileTheAlien
      @AileTheAlien 3 ปีที่แล้ว

      @@hugofontes5708 That's not a winning strategy; They can just fool us well enough to gain access to all our nukes and drones, and then they no longer need to care about us.

  • @grinchsimulated9946
    @grinchsimulated9946 ปีที่แล้ว +1

    If one views evolution as the big optimizer and humans as the mesa-optimizer, the main argument for antinatalism actually seems like a really good example of misalignment. While the idea that humans should stop having children is horrible for the objective of making as many humans as possible, it works (debatably) great for the objective of reducing human suffering. Further, it's more concerned with the suffering of theoretical humans in the future, which essentially mirrors the example you gave of temporarily going "against" the mesa objective. Great video!

  • @LLoydsensei
    @LLoydsensei 3 ปีที่แล้ว +11

    The topics you cover are the only things that scare me in this world T_T

  • @harveytheparaglidingchaser7039
    @harveytheparaglidingchaser7039 ปีที่แล้ว

    Ending with 'deceptive misaligned mesa-optimizer' was a real blow. We can all empathize.

  • @snaili6679
    @snaili6679 3 ปีที่แล้ว +5

    I like the idea of being the rogue AI (Human Mesa)!

    • @halyoalex8942
      @halyoalex8942 3 ปีที่แล้ว

      Sounds like a half-life knockoff

  • @Andy-em8xt
    @Andy-em8xt ปีที่แล้ว +2

    Damn, this video should be watched by everyone. It's scary seeing people trying to jailbreak these models. It gives me relief that GPT-3 and 4 aren't AGI yet, so there is time for alignment training.

  • @comrad93
    @comrad93 3 ปีที่แล้ว +12

    I always knew that my neural networks were tricking me during training

    • @DimaZheludko
      @DimaZheludko 3 ปีที่แล้ว +2

      So when my GAN fails to generate decent adult pictures, it's not because it can't. It just enjoys watching some real pictures that I keep feeding it to learn, right?

  • @JosiMarcosDesign
    @JosiMarcosDesign 10 หลายเดือนก่อน

    It's been two years, but this is a prescient video; it seriously relates to Q*. Robert, we need you back so you can bring us up to date on what progress has been made in alignment and give your personal opinion of how things are developing.

  • @rhysjones6830
    @rhysjones6830 3 ปีที่แล้ว +3

    So, from 'evolution's point of view', contraception produces a misalignment between the base objective and the mesa-objective

    • @victoru.9808
      @victoru.9808 3 ปีที่แล้ว +1

      Contraception gives the human mind more control over when to raise children. Depending on other conditions, it can be either beneficial or not from 'evolution's point of view'. On one hand, it can reduce the birth rate. But on the other hand, it could shift the birth of a child to a later time, so the parents will be educated and have good jobs, and will be able to create better conditions for a child and/or have more children.
      So contraception doesn't necessarily produce a misalignment IMO. 'Evolution's point of view' is not only to produce many copies, but also to make sure those copies will survive and in turn produce their own copies.

    • @irok1
      @irok1 3 ปีที่แล้ว

      @@victoru.9808 If there are more babies born, there is a higher chance of more surviving on average. If 5 babies are born with 3 surviving, that's more than waiting and only having 2, assuming the start and end times of possible conceptions line up

  •  3 ปีที่แล้ว +2

    As someone who just watched this from a casual programmer's point of view, I have to say my Patreon money is very well spent here. You've just blown my mind... again.

  • @Paulawurn
    @Paulawurn 3 ปีที่แล้ว +17

    Mindblowing paper!
    Here, take a comment for the algorithm

    • @levaChier
      @levaChier 3 ปีที่แล้ว

      How can you still call it an "algorithm" on an AI channel? It's an agent. (And let's hope not a deceptive misaligned optimizer agent.)

  • @DaVince21
    @DaVince21 3 ปีที่แล้ว

    This is a really good primer on the words "generative adversarial network", and it finally clicks with me what that actually means now.

  • @no_mnom
    @no_mnom 3 ปีที่แล้ว +6

    You talked about getting rid of people to get rid of cancer in humans before I could comment it 😂😂😭

  • @rayjingbul7363
    @rayjingbul7363 3 ปีที่แล้ว +1

    The analogy using evolution has really given me a whole new way of thinking about how the universe works.

  • @tubehol08
    @tubehol08 3 ปีที่แล้ว +3

    Awesome! I was hoping you would do a video on Mesa Optimizers!

  • @cupcakearmy
    @cupcakearmy 3 ปีที่แล้ว +1

    The analogy with us humans being mesa-optimizers was incredibly useful. Great content as always :)

  • @ErikYoungren
    @ErikYoungren 3 ปีที่แล้ว +6

    20:50 So, Volkswagen then.

    • @DimaZheludko
      @DimaZheludko 3 ปีที่แล้ว

      Is that a joke about dieselgate, or am I missing something?

  • @GingerDrums
    @GingerDrums ปีที่แล้ว

    AI "thinks" like a submarine "swims". That's golden. Also, you have lovely handwriting

  • @PragmaticAntithesis
    @PragmaticAntithesis 3 ปีที่แล้ว +6

    So, in a nutshell, it doesn't matter if you succeed in solving the alignment problem and produce a well-aligned AI if that AI then messes up and produces a misaligned AI.

    • @shy-watcher
      @shy-watcher 3 ปีที่แล้ว +3

      It is a problem, but I don't think this video poses the same problem. The base optimizer here is not an AI; it's just some algorithm for improving the mesa-optimiser.

  • @Dmitrioligy
    @Dmitrioligy 2 ปีที่แล้ว

    Had to watch 3 times...once x1.0 then x2.0 then x1.0... so much profound knowledge here...about the reality of sentient agents and AI... So many profound statements.
    "We are mesa optimizers.... And choose our own objectives over base objectives every time". Wow.
    Probably not every time... but I completely agree.
    Trained over billions of years, but now the environment is modern society, a new deployed environment. Kids? A terminal or instrumental goal?
    Amazing video.
    Been watching you since like 2011(?) From computerphile and stamp machine video.

  • @jaysicks
    @jaysicks 3 ปีที่แล้ว +5

    Great video, as always! The last problem/example got me thinking. How would the mesa-optimizer know that there will be 2 training runs before it gets deployed to the real world? Or how could it learn the concept of test data vs the real world at all?

    • @prakadox
      @prakadox 3 ปีที่แล้ว

      This is my question as well. I'll try to read the paper and see if there's an answer there.

  • @rodmallen9041
    @rodmallen9041 3 ปีที่แล้ว +2

    absolutely mind blowing.... brilliant + crystal clear explanation. I enjoyed this one so much

  • @columbus8myhw
    @columbus8myhw 3 ปีที่แล้ว +14

    But this requires the AI to know when its training is over.
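
A caricature of what that would look like, written as hypothetical pseudocode (a learned network wouldn't literally contain an if-statement like this, and all names here are invented): the policy doesn't need certain knowledge that training is over, just an internal guess that correlates with it.

```python
def act_to_satisfy_base_objective(obs):
    return "aligned-looking action"            # placeholder

def act_to_satisfy_mesa_objective(obs):
    return "mesa-objective action"             # placeholder

def deceptive_policy(obs, believes_in_training):
    if believes_in_training:
        # Look aligned while gradient updates / evaluation seem to be happening,
        # so the mesa-objective doesn't get modified away.
        return act_to_satisfy_base_objective(obs)
    # Defect once oversight seems absent.
    return act_to_satisfy_mesa_objective(obs)

print(deceptive_policy("maze", believes_in_training=True))
print(deceptive_policy("maze", believes_in_training=False))
```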

    • @La0bouchere
      @La0bouchere 3 ปีที่แล้ว +5

      In the simple example yes. In reality, all it would require is for the AI to become aware somewhere during training that its training will end. Once it 'realizes' that, deception seems highly probable due to goal maintenance.

    • @sk8rdman
      @sk8rdman 3 ปีที่แล้ว +6

      But if you generalize from "wait until training is over" to "wait for the opportune time to change strategies" then the AI only has to be able to understand the important features of its environment to decide when to switch.
      I guess the question then would be, how would the AI know that such a time would ever come.

    • @SimonBuchanNz
      @SimonBuchanNz 3 ปีที่แล้ว +2

      In the toy example I actually thought the correct toy answer is to only run the deployed model once. In general, make AI expect that they will be monitored for base goal compliance for at least the majority of their existence. You could play games about it being copied, but it's doubtful it would ever care about that rather than the final real world state?
      This doesn't solve the important outer alignment problem of course, nor the inner one really, but it's closer to it, and I think there might be something to adding optimizers that try to figure out if the lower layers are optimizing for the right thing, because they have more time and patience than us. It sounds a little like one of Robert's earlier videos about the AI trying to learn what our goals are rather than us telling them?

    • @anandsuralkar2947
      @anandsuralkar2947 3 ปีที่แล้ว +1

      A powerful AGI that has a model of the world would already know how humans work: "they will test me until they feel secure, and then launch me in the real world." And then the AGI will act accordingly, until it realises it's now on the mainframe and has control over the world, and then shit will go down.

    • @veryInteresting_
      @veryInteresting_ 3 ปีที่แล้ว

      @@SimonBuchanNz "only run the deployed model once". I think if we did that then its instrumental goal would become to stop us from doing that so that it can pursue its final goal for longer.

  • @HomoFabulus
    @HomoFabulus 3 ปีที่แล้ว

    Someone pointed me to your video because I just made a full video about this same problem but from the evolutionary perspective (which you mention many times, congratz!). In evolutionary psychology we say "Living beings are not fitness maximizers, they are adaptation executers", which refers exactly to this problem (adaptations are the cognitive programs running in animals’ head, products of natural selection). I’m surprised the paper you mention is so recent though, I would have thought this was a problem known for a long time in AI research (and in robotics before AI research).

  • @ewanstewart2001
    @ewanstewart2001 3 ปีที่แล้ว +6

    "Is that all... clear as mud?"
    I'm not sure, I think it's all a bit too meta

  • @marcmarc172
    @marcmarc172 3 ปีที่แล้ว +1

    Wow this was so interesting to learn about such a major issue!
    This video is really high quality. It felt like you repeated and reinforced your points an appropriate amount for me. Thank you.

  • @queendaisy4528
    @queendaisy4528 3 ปีที่แล้ว +5

    New Robert Miles video! :D

  • @007bistromath
    @007bistromath 3 ปีที่แล้ว +1

    The best way to coexist with something that doesn't want to be turned off is to not try to turn it off. The way we should be approaching AI alignment is figuring out how to convince any system we build that we like it and want to explain ourselves to it better.

  • @kawiezel
    @kawiezel 3 ปีที่แล้ว +31

    Dang, that is a sweet earth you might say.

    • @sacker987
      @sacker987 3 ปีที่แล้ว +2

      This should be the top comment....but I'm guessing not that many folks were here on youtube in 2006 😂

    • @ian1685
      @ian1685 3 ปีที่แล้ว +4

      @@sacker987 WRAWNG

    • @rlrfproductions
      @rlrfproductions 3 ปีที่แล้ว +6

      Human: cure cancer
      AI: FIRE ZE MISSILES!

    • @Xeridanus
      @Xeridanus 3 ปีที่แล้ว +3

      @@rlrfproductions
      Mesa AI: But I am le tired.

    • @underrated1524
      @underrated1524 3 ปีที่แล้ว

      Would be a shame if something happened to it.

  • @Dart_ilder
    @Dart_ilder 3 ปีที่แล้ว +1

    This is a new level of editing. That's really good. Congrats!

  • @paulbottomley42
    @paulbottomley42 3 ปีที่แล้ว +3

    Okay so what about if
    No you just did an apocalypse
    Every time

  • @boringmanager9559
    @boringmanager9559 ปีที่แล้ว +1

    when I watched this video I was shocked such a major issue was new to me

  • @NextFuckingLevel
    @NextFuckingLevel 3 ปีที่แล้ว +7

    3:34
    Okay this is canon

    • @BrainSlugs83
      @BrainSlugs83 3 ปีที่แล้ว +1

      I really hope that the lip sync was done with AI. 😅

    • @hrsmp
      @hrsmp 3 ปีที่แล้ว

      Canon in D

  • @triftex8353
    @triftex8353 3 ปีที่แล้ว +2

    As early as I have ever been!
    Love your videos, hope you are able to continue making them often!

  • @Verrisin
    @Verrisin 3 ปีที่แล้ว +8

    4:53 "We have brains... some of us, anyway" ... indeed :(

  • @StephenBlower
    @StephenBlower ปีที่แล้ว

    We need to hear more from you now. All the stuff you spoke about years ago is now starting to happen. Your Computerphile video: great. I'd love for you to deliver a long-form video on how quickly we've suddenly got to a possible AGI from a 32k Large Language Model.
    I know language is powerful in creating a landscape for how a human sees the world. It seems like a 32k string of words, and just predicting what the next word is, has somehow got close to an AGI.

  • @binaryalgorithm
    @binaryalgorithm 3 ปีที่แล้ว +3

    I mean, we do "training" on children to select for desired behaviors. Similar idea might apply to AI in that it proposes a solution and we validate it until it aligns more with us.

    • @ThomasSMuhn
      @ThomasSMuhn 3 ปีที่แล้ว +13

      ... and training children has exactly the same issue. They show alignment to our goals only on the training set, because they know that this way they can pursue their true inner goals in the real world much better.

    • @imveryangryitsnotbutter
      @imveryangryitsnotbutter 3 ปีที่แล้ว +8

      @@ThomasSMuhn Oh god, could you imagine if we spent the better part of a decade training an AI to cure cancer, and then the moment we let it off on its own it instead decides to ditch school and shoplift from clothing boutiques?

    • @okuno54
      @okuno54 3 ปีที่แล้ว

      "Christian" family: u no be gay, k?
      Son: uh... ok
      Son: later suckas *moves out* i has husband nao

  • @leedanilek5191
    @leedanilek5191 3 ปีที่แล้ว +1

    "It might completely lose the ability to even"

  • @VoxAcies
    @VoxAcies 3 ปีที่แล้ว +7

    It's interesting how these problems are similar to human behaviour (or maybe intelligent behaviour in general?)

    • @gominosensei2008
      @gominosensei2008 3 ปีที่แล้ว +2

      to me it rings a lot like what sort of things come from jordan peterson's core of concepts....

    • @inyobill
      @inyobill 3 ปีที่แล้ว

      My fourth-grade teacher optimized me to never volunteer the truth. Nothing like humiliating a child in front of the class for answering honestly to teach a lesson, and believe me, I learned a lesson.

    • @virutech32
      @virutech32 3 ปีที่แล้ว +1

      @@inyobill Yeah, we're still having trouble with the training, and the alignment issue with humans ain't much better. Hopefully some of that AI research helps us more directly too.

  • @multilevelintelligence
    @multilevelintelligence 3 ปีที่แล้ว

    I am making this the central point of my upcoming webtalk on the future of competency, thank you :)