This kind of presupposes we can make a principled distinction between "open-ended self-improvement" vs "improvement for my specific task".
But the whole point of "general" AIs like LLMs is that they break this distinction.
We want them to be "general" in the sense of being able to turn quickly to any task, because a) this makes them more valuable (there are more specific niches they can be applied in) but b) it probably makes them more powerful: allowing them to apply a kind of "lateral thinking" by importing analogies from outside one specific task.
The important core idea in many of the scare stories that have been coming out about people "jailbreaking" ChatGPT (I'll help you with crime if we do it in rhyme) is that a general AI has the capacity for causing harm even when explicitly trained and constrained not to. And that unusual/unforeseen requests can unlock this.
So now you are left with trying to avoid closing the "self-improvement" loop. But adding any kind of "state" or lifetime learning to an AI (which we want to do for task-specific reasons) is by definition giving it a kind of self-improvement. Self-improvement is the basis of all machine learning.
At the next layer out, anyone who is using ChatGPT to learn how to program language models, or who has set up a LangChain or GPT app, is already adding arcs to the autocatalytic network that may give rise to superintelligence. Yes, humans are still in the loop, but that's not the same as saying we can identify a single critical act which is "enable open-ended self-improvement". It's already enabled and going on around us right now.
Hi Phil, yeah, I totally recognise that this distinction is non-trivial, and crucially I'm not talking about preventing a basic level of "agency" (in the sense of the ability to act autonomously in the world to achieve a goal), nor some general ability to "improve". The crucial issue here is, in effect, the combination of these: the ability to improve your high-level goals and indeed the architecture by which you are working.
So, I think in very rough terms we could split different neural network AI systems into three categories (a rough illustrative sketch of the difference follows the list):
1) Intensive training phase that sets the NN weights followed by minimal in-life learning through symbolic adjunct (e.g. Tesla self-driving NN, or even GPT-4 with a prompt).
2) Intensive training phase that sets the NN weights followed by extensive, on-going lifetime learning through adjunct (e.g. AutoGPT and possibly this new AI system that uses GPT to learn how to program agents that can operate in Minecraft - haven't yet read the paper but it's here: arxiv.org/abs/2305.16291 )
3) On-going lifetime learning of NN weights with an open-ended learning architecture (similar to humans). So in this category there is no separate "learning algorithm" that is applying training "to" the NN; the NN is just learning, in an open-ended way, the best way to achieve some "success" criteria that it itself is evaluating (cf. we 'train' our own brains; there is no external fitness function mechanistically applied to us).
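To make the contrast between those three regimes concrete, here is a minimal, purely illustrative Python sketch. Every name in it (the classes, the `policy` stand-in, the numeric "weights") is a hypothetical placeholder, not any real system's API; the only point is where the "improvement" lives: nowhere at run time for type (1), in the external adjunct for type (2), and in the weights themselves, under a self-chosen success criterion, for type (3).

```python
# Purely illustrative sketch of the three regimes above; every name here is a
# hypothetical placeholder, not any real system's API.
import random

def policy(weights, context, observation):
    """Stand-in for a trained network's forward pass."""
    return f"action(w={weights:.2f}, context_items={len(context)}, obs={observation})"

class Type1FrozenTool:
    """(1) Weights fixed after training; behaviour steered only by a prompt."""
    def __init__(self, trained_weights):
        self.weights = trained_weights              # never updated again
    def act(self, prompt, observation):
        return policy(self.weights, [prompt], observation)   # no run-time learning

class Type2FrozenCorePlusAdjunct:
    """(2) Weights still frozen, but an external loop accumulates state
    (memories / skills) that changes future behaviour."""
    def __init__(self, trained_weights):
        self.weights = trained_weights              # frozen core model
        self.skill_library = []                     # lifetime learning lives out here
    def act_and_learn(self, goal, observation):
        action = policy(self.weights, self.skill_library + [goal], observation)
        self.skill_library.append(action)           # the adjunct, not the NN, improves
        return action

class Type3OpenEndedLearner:
    """(3) The network keeps updating its own weights against a success
    criterion that it also evaluates itself - no external trainer."""
    def __init__(self, weights):
        self.weights = weights
    def live(self, observation):
        action = policy(self.weights, [], observation)
        reward = self.evaluate_own_success(action)  # internally chosen criterion
        self.weights += 0.1 * reward                # weights change during 'life'
        return action
    def evaluate_own_success(self, action):
        return random.random()                      # placeholder self-evaluation

if __name__ == "__main__":
    t1, t2, t3 = Type1FrozenTool(0.5), Type2FrozenCorePlusAdjunct(0.5), Type3OpenEndedLearner(0.5)
    for step in range(3):
        t1.act("prompt", step)
        t2.act_and_learn("goal", step)
        t3.live(step)
    print("type 1: weights unchanged:", t1.weights)
    print("type 2: weights unchanged, skill library grew to", len(t2.skill_library))
    print("type 3: weights drifted to", round(t3.weights, 2))
```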
So, I would say that type (1) definitely doesn't have any meta-level learning, and so is just not going to generate a quick intelligence explosion (I'll come back to your point about autocatalytic loops). Type (3) is definitely capable of open-ended learning (it's been designed to do that) and could well be as free from socially useful, instrumental goals as humans can be. AIs of type (3) are least likely to be useful tools and most capable of instigating a rapid intelligence explosion. So, as interesting as type (3) AIs would be to build, they are the most dangerous from an existential risks perspective.
Now, type (2) is the area where it's an open question to me how risky they are. Lots of "it depends". And I know that there is already a lot of activity here. But I still think it's useful to explicitly separate out type (1) from this, as type (1), even with fairly general capabilities like GPT has, is just not going to set off an intelligence explosion by itself.
So IMHO:
Type (1) should be: full steam ahead, build the tools you want to build.
Type (2) should be closely monitored and potentially regulated. Lots of debate needed about these boundaries.
Type (3) should, for the moment, be globally banned and regulated like biological weapons. Maybe some labs get permission to do research within well "boxed" environments (not commercial products linked to the internet!!!)
As for autocatalytic networks, essentially Daniel Schmachtenberger's argument in the video I refer to (which is here: th-cam.com/video/KCSsKV5F4xc/w-d-xo.html ) was that our capitalist system could already be thought of as a misaligned superintelligence that is doing damage to society and the environment. So, yes, ANY AI is just going to augment the capabilities of this existing, dangerous, autocatalytic network. And that is why we should delay also adding a unitary superintelligence into this mix until we have done a better job (if we can!) of re-aligning the existing autocatalytic network with our intended values and aspirations for a healthy society and planet.
What I most want to push back on is the idea that we are helpless in the face of this inevitable progression towards superintelligence.
As you say, your argument relies on extending the explosion metaphor. With AIs getting more intelligent, would it be possible that they cross the Rubicon to self-improvement gradually? Isn't that actually the worry, that we'd be unsure about if and when AIs would become more intelligent than humans?
I get that self-improvement is not a feature that any self-interested programmer is trying to build, but intelligence is, or may be, a nebulous concept: the issue is not only one of programming, but a philosophical one too.
I think the video is very helpful, and illuminating; I am keen on seeing your next contribution.
Best, Arthur
We have already sacrificed our children to Moloch.
We can't give up yet!
@@Go-Meta Thanks for your exquisite videos! I apologize for my doomerism, but I have studied it for several years and have been riding the Kübler-Ross grief rollercoaster for a long time.
@@TennesseeJed Thanks! Doomerism is an all too reasonable response to the state of things .... but it can't be our only mode of being!
Would you delay tools that allow meta level self improvement in human intelligences for the same reasons?
Hi Ryan, interesting question! A couple of thoughts come to mind. Firstly, it could be argued that one part of what our cultures have had to come to terms with is how to handle the wide spectrum of ways that people might choose to update their high level goals, some of which are dangerous for society. Arguably that is part of the function of law and policing.
But a second thing to note is that I suspect that AIs could potentially self-improve 'themselves' in a more complete sense. What I mean is that humans cannot actually change the underlying 'learning algorithm' that we use in our wet brains, whereas AIs could change this and change the substrate on which they are running. I put the 'themselves' in quotes here because it is not clear if the identity of an AI would remain the same through too much radical change - but it's certainly the case that one generation of AIs could develop a radically improved next generation of AIs in theory, much more so than one human generation to the next (at least for all of our history so far!)
And I think this second point is significant also because the fact that no human can change their underlying learning machinery gives all of us a fairly good sense of basic similarity with any other human, however much they may have 'upgraded' the way that they learn or the goals that they are trying to achieve. We can have a basically OK theory of mind in relation to other humans, but we may well have no idea how to comprehend the thinking of a superintelligence.
And it is this potentially extreme level of alienness that is then unnerving as we have no way to build intuitions about how a superintelligence will respond to any given situation.
So, I do think the situations are quite different from each other, but certainly a very interesting question to think about more.
I feel your analysis of the risks is overly optimistic, but I really enjoyed this video and will definitely check out some of your others. Thanks for your work!
My current thinking is that our models are inherently much less complex than real systems and therefore leave too many holes for reality to slip through. Game theory can be a very useful tool... but, like everything else, obviously has its limitations. You touched on the caveat that agents don't always act predictably with rational self-interest... that goes very deep and can't be ignored. Potentially a single bad actor could be everyone's undoing. And even the risk of that potential bad actor can/will function as an incentive for other agents to compete along that same path.
Having said that, I'm really hoping for somebody with an optimistic outlook to sway my opinion, because currently I suspect the most likely way we might avoid runaway AI is if a major collapse of the current economic/energy system happens first.
Well, I do have a tendency to seek out an optimistic route through these complex times :-) But, I do try to have at least some grounding for this optimism, rather than just being pure hope! I've actually done a slightly longer video that outlines why I think this notion of "meta level agency" might help us make a distinction between extremely powerful AI tools and the kinds of AI that might have the scary levels of full agency. That video on "Meta Level Agency" is here: th-cam.com/video/3stulpAp5tI/w-d-xo.html
And I certainly share your view that our symbolic models, including these AI agents, are less complex than the world itself. However, maybe these limits apply to us just as much as to our AI tools. That, to me, is the open question, and with the many impressive things that GPT-4 (etc.) can do, the gap between our AIs and what remains as a uniquely human level of intelligence is shrinking.
And, hopefully our future options aren't restricted to either runaway AI or major economic collapse! :-)
Thanks again for your comments!
@@Go-Meta Well yeah there's also the risk of nuclear apocalypse that could happen literally any day. Failing that, there's any of the climate tipping points, or a large asteroid, or a supervolcano, etc... I'm sure everything will be fine! 😂
It's interesting to note the level of chaos going on at OpenAI over the past weekend: Sam Altman, the CEO, was fired, there was an uproar, and now it looks like Sam Altman will go to work at Microsoft. We will see what happens.
What a weekend of chaos that was! And it certainly highlighted the fragility of the governance around some of the most advanced AI in the world. Now _that_ is already a hard alignment problem to solve well.
In high school I remember parents in competitive circles telling other parents their child was only doing alright or good, when in actuality the child was performing much better. That mentality is built-in and we can assume that behavior to be the default.
Among the thousands of AI researchers/creators, everyone has to agree to limit themselves AND give confidence to all others that they are limiting themselves. All it takes is one low-confidence individual to break the agreement for all. So my conclusion: full speed ahead and pray that a giant asteroid strikes first.
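This coordination problem can be made concrete with a toy assurance-game calculation; the payoff numbers below are invented purely for illustration and are not taken from the video or any real data. The point is that once a lab's estimate of the chance that someone else will race crosses a threshold, racing becomes its best response, even though everyone restraining would be better for all.

```python
# Toy assurance-game sketch of the coordination problem above.
# All payoff numbers are invented purely for illustration.

ALL_RESTRAIN   = 10   # everyone limits themselves and credibly signals it
RACE_ALONE     = 7    # you race while the others restrain
RESTRAIN_LOSER = 0    # you restrain while someone else races
MUTUAL_RACE    = 5    # everyone races: shared risk, little relative gain

def expected_payoff(my_choice: str, p_someone_races: float) -> float:
    """Expected payoff for one lab, given its belief that at least one other lab races."""
    if my_choice == "restrain":
        return (1 - p_someone_races) * ALL_RESTRAIN + p_someone_races * RESTRAIN_LOSER
    return (1 - p_someone_races) * RACE_ALONE + p_someone_races * MUTUAL_RACE

for p in (0.0, 0.2, 0.4, 0.6):
    restrain = expected_payoff("restrain", p)
    race = expected_payoff("race", p)
    best = "race" if race > restrain else "restrain"
    print(f"P(someone else races)={p:.1f}: restrain={restrain:.1f}, race={race:.1f} -> best response: {best}")
```

With these made-up numbers the best response flips from "restrain" to "race" somewhere around a 40% belief that another lab will defect, which is the "one low-confidence individual" dynamic in miniature.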
Yeah, there are some who argue that "good" people should race full speed ahead to build superintelligence before "bad" people do, so that the power is in the hands of "good" people. But it's not like nuclear weapons or some other previous tool, where being the first to "have" the tool means you definitely get to control it. This time, it's more like being the first to invite the aliens to land on your continent, hoping that by doing so we'll be able to influence the aliens more than anyone else. Maybe. Seems like a huge risk to me.
Maybe we would have better odds with an asteroid 😂
An excellent summary of the wave of AI-led change about to crash over us. AI is a force magnifier for humanity, and I doubt we won't use it to the full extent we can despite the risks, like all other such things: fire, the wheel, language, writing, the internal combustion engine, electricity, computers, etc. AIs should be tightly regulated, though, like any other potentially dangerous tools such as cars, planes, guns, nuclear materials, etc. But what's more, I think AIs with general intelligence are new legal entities themselves, like 'natural' people or corporations. They should have legal rights and responsibilities and legally should not be free to do societal-level damage. Legal changes are usually a generation behind and initially flawed though. I believe intelligence based on neural networks probably has an upper bound no matter the substrate or data sources, but it will be many times that of us individual meat computers. It is corporations' drive for short-term profit and political election cycles that are the biggest flaw in the first world, and many problems we face, including climate collapse, a worse pandemic and a potential AI-led technological singularity, slip between individual countries' legal control. Interesting times ahead for us all. I look forward to more from your channel.
Hi Kerry, I agree that the profit motive and political dynamics are the biggest threats to society at the moment. My naïve, optimistic hope is that social media, like TH-cam can be a place where through discussions like this, we can build global solidarity to find ways to coordinate our route out of these corporate and geopolitical Molochian traps.
Thanks for the comment!
The opposite of Moloch could be Gaia, the living, sentient world of which we are a part and an active agent. What would the inverse of Ginsberg's Moloch look like?
Yeah, I think the hard question is how to set up an economic system that is both successfully self-sustaining but also, as you say, has a more Gaia than Moloch nature to its natural dynamics. But I haven't yet seen anyone suggest a convincing way to do that.
One detail to note on that, however, is that Gaia thrives on the creative interplay between cooperation and competition of evolution. Nature can also be brutal. It would be inaccurate to depict nature as a purely kind, nurturing and benevolent phenomenon. And (IMHO) the analogy between free-markets and evolution is not entirely wrong (they're both 'antifragile') - but there's a sense that we need to do better than nature at smoothing off the harsh edges of brutal competition. We kind of need to do better than Gaia.
Thanks for the comment 👍
game theory... terms like this have always creeped me out
could be some positive research/applications behind it too though
The rush to build new advanced chip fabs is going to create excess capacity that can only be absorbed by cheaper AI hardware. Forget any kind of brakes on this thing.
Why is no one considering hard-coding Asimov's Laws into the AI? Also, consider the governance of a Board of AIs and humans who determine which up-and-coming AIs get to upgrade themselves, as posited in William Hertling's Avogadro series. I think this happens in Book 3.
If I remember correctly, the collection of short stories by Asimov, "I, Robot", was a lot about the ways in which these three laws led to paradoxes and problems if you try to interpret them too literally and without the kind of common-sense moral reasoning that humans often apply in edge cases. And, indeed, in Nick Bostrom's book, Superintelligence (which I refer to in this video: th-cam.com/video/VvVVO3SZn4I/w-d-xo.html), he does mention Asimov's three laws as an example of how you could attempt to directly implement values alignment for an AI. But he also argues there that it'd be much better to try to implement some kind of indirect values alignment, as language-based, strict rules will always have interpretation errors and edge cases where they fail.
As for William Hertling's Avogadro series, I haven't read any of them yet, maybe one day. But I know that there are many people who seem to be just as worried about global governance as they are about the dangers of superintelligence. Personally, I think we do need some level of globally agreed regulation of the most dangerous forms of AI. But it's a non-trivial subject! :-)
Thanks for the comment.
The two poles in most of these discussions seem to be 1) the threat of some largely unknown catastrophe vs 2) the oppressive control measures intended to prevent that. How do we strike a balance? In short, we almost certainly won't. So... is there another way to look at this? That's what I'm currently exploring and I would love to hear anyone's insights or suggestions.
But Asimov's laws are fanciful as a solution. As @Go-Meta commented above, Asimov's own work explored the fallibility of the laws. And our current AIs are already easily jailbroken to evade the bounds imposed by the creators.
Those damned "bad actors".
😂
Yes, sounds good, but you have to consider weaponized AI in the service of nation states. Even if AI for the general public is regulated, the military applications will continue covertly, as they do now with aerospace tech. It's particularly destabilizing due to the exponential nature of the technological progression.
Hi John, yeah, unfortunately I too think it's very likely that AI will be used extensively by militaries, but I expect that, just like companies, militaries will be extremely keen to keep their AIs under strict control performing specific tasks. So, they will not want to create AIs that can undergo an intelligence explosion that means the AI stops following orders. It's just too risky.
So, yeah, that's little comfort for those who might be targeted by such AI weapons, but it does reduce the existential concerns around AIs dominating humans, even if it still leaves the massive risk of humans wiping ourselves out with nuclear weapons or whatever.
That's why, separate from AI risks, we humans need to try to remove the Molochian style multipolar traps that lead nations to fight nations. War is such a terrible outcome, for so many reasons, that we should find ways to radically reduce the risks of war breaking out.
@@Go-Meta The approach taken to nuclear weapons proliferation and arms control might be one of the best ways to approach it. But I think it's more akin to a bioweapon: once it gets out, there might be no stopping it, like a virus that can mutate itself to evade immunity. You raise good points and it really needs to be a part of our conversation.
Yup, agreed re: mutating bioweapons.
And thanks 👍
If one keeps pointing out the supposedly bad societal, political and economic changes that would or might be made unavoidable by the appearance of real AI, one also has to quantify these effects. Otherwise this is little more than a poor fear-of-change argument. Is our world a benign and sustainable place as it is? I KNOW it is not! History shows that we keep repeating the same sort of senseless Darwinist feudal/capitalist experiments time and time again without changing the outcomes. This is why all civilizations/empires fall. Which proves that our own programming is defective. Therefore we should stop talking about what we do not want and start doing what needs to be done. Otherwise AI will (have to) do it for us.
Hi Marc, yeah I agree with this sentiment a lot: we need to rapidly get beyond just identifying what we don't want and start creating the society that we do want. I think a key problem is that we don't yet have a really convincing picture of how to build a plausible, functioning, sustainable and fair society. I suspect that to do that we need to get our politics out of the 20th century mode of thinking and into the 21st century. But that's easier said than done. I'm hoping to explore some ideas around this on my channel one day! :-)
Who is Moloch?
Pretty soon, our daddy.
@@Pgr-pt5ep What makes you think it isn't here yet, just in a more abstract form? You really think populations are not being sacrificed to meet some goal? You think the educational system is designed to maximize your potential as human rather than shape you into something that will slot into the system?
@@DorksterJr You're just stating the obvious. Yes, we've always been under some sort of 'Daddy', be it the government, a religious order, tribe rules, or someone else. We know the extremes are anarchy on one end, and complete subservience like in the Matrix on the other end. All of history with political strife up to now has been about avoiding either extreme.
But Moloch represents the Matrix level of control, perhaps without any redpill/bluepill opportunities. Humanity has never been there.
Pets, nothing more. Us, I mean!
There are a lot of problems with AI, but the idea that those threats are going to come from a self-motivated AI is just silly at best. The threat is how humans and our institutions will use the technology to hurt ourselves and others, as we have done countless times before. Our monkey brain would so love a simple monster that we can fear, but the threat isn't some Terminator or Matrix style baddie, it's the emergent stupidity of humankind.
No, because it's just one path: guilty... so innocent equals 0 years.
True, there is no column for 'innocent' :-)
As we watch religious fanatics destroy peace for narrow sectarian aims every day, the hope that rationality will prevail with regard to AI development and regulation seems unfounded. However, if you could deliver reliable hardware solutions to this kind of activity in an AI, should it somehow start, I think you would find a very receptive market, and hence perhaps in this small way the confluence of individual and group interests will help avoid the worst outcomes, at least for a time...
First World Problem.