The Hidden Complexity of Wishes

  • Published on Dec 26, 2024

Comments • 1.6K

  • @RationalAnimations
    @RationalAnimations  1 year ago +1288

    This video is about AI Alignment. At the moment, humanity has no idea how to make AIs follow complex goals that track human values. This video introduces a series focused on what is sometimes called "the outer alignment problem". In future videos, we'll explore how this problem affects machine learning systems today and how it could lead to catastrophic outcomes for humanity.
    The text of this video has been slightly adapted from an original article written by Eliezer Yudkowsky. You can read the original article here: www.readthesequences.com/The-Hidden-Complexity-Of-Wishes
    If you’d like to skill up on AI Safety, we highly recommend the AI Safety Fundamentals courses by BlueDot Impact at aisafetyfundamentals.com
    You can find three courses: AI Alignment, AI Governance, and AI Alignment 201
    You can follow AI Alignment and AI Governance even without a technical background in AI. AI Alignment 201, instead, presupposes having taken the AI Alignment course first, plus knowledge equivalent to university-level courses on deep learning and reinforcement learning.
    The courses consist of a selection of readings curated by experts in AI safety. They are available to all, so you can simply read them if you can’t formally enroll in the courses.
    If you want to participate in the courses instead of just going through the readings by yourself, BlueDot Impact runs live courses which you can apply to. The courses are remote and free of charge. They consist of a few hours of effort per week to go through the readings, plus a weekly call with a facilitator and a group of people learning from the same material. At the end of each course, you can complete a personal project, which may help you kickstart your career in AI Safety.
    BlueDot Impact receives more applications than they can take, so if you’d still like to follow the courses alongside other people you can go to the #study-buddy channel in the AI Alignment Slack. You can join by clicking on the first entry on aisafety.community
    You could also join Rational Animations’ Discord server at discord.gg/rationalanimations, and see if anyone is up to be your partner in learning.

    • @rat_king-
      @rat_king- 1 year ago

      *Kissu*

    • @ViyperCat
      @ViyperCat 1 year ago +7

      What happens if two wishes contradict each other?

    • @tmmroy
      @tmmroy 1 year ago +14

      I think the best alignment we could hope for may be one that will make us truly uncomfortable: an ally maximizer paired with a parasite minimizer. If the machine wanted you to be an ally, it would know that saving your mother is likely to lead to you being an ally; you wouldn't have to ask for its help. But allies both give and receive, and our wish for an aligned AI is largely a wish to be parasites: we want to increase our control over a complex system without giving anything at all. The advantage of an ally maximizer and parasite minimizer is that the concepts generalize to enough games that the AI agents could be trained in a sandboxed environment that includes humans as players, to check for the organic ability of human and AI agents to act as allies to one another. The greatest risk would largely be that the AI allies itself to humanity by domesticating us, but there's an argument to be made that we largely do this to ourselves already. It's not necessarily a terrible outcome compared to alternative methods of alignment.
      Just my thoughts.

    • @lawrencefrost9063
      @lawrencefrost9063 1 year ago +1

      awesome!

    • @XOPOIIIO
      @XOPOIIIO 1 year ago +3

      Thank you for the episode. But personally, I find the concept explained too obvious to need such a long explanation.

  • @BaronB.BlazikenBaronOfBarons
    @BaronB.BlazikenBaronOfBarons 1 year ago +2508

    I’m reminded of SCP-738, which, boiled down, is essentially a genie.
    One of the tests performed on it was a lawyer attempting to make a wish on it. A wish was never made: 41 hours passed, all of which were spent drafting a 900+ page contract, before the lawyer passed out from exhaustion.
    The last thing the lawyer was trying to do before blacking out was, quote, "negotiating a precise technical definition of the word 'shall'", unquote.

    • @Слышьты-ф4ю
      @Слышьты-ф4ю 1 year ago +363

      A lawyer was used because 738 always asks for a decent sacrifice (and doesn't account for the unhappiness caused by the granted wish)

    • @bestaround3323
      @bestaround3323 1 year ago +306

      The Lawyer actually greatly enjoyed the process along with the devil.

    • @Jellyjam14blas
      @Jellyjam14blas 1 year ago +159

      XD exactly. You/your grandma would be dead before you'd finished listing all the ways you don't want to be taken out of the building. I would just wish for something like "Please safely bring my (as healthy as possible) grandma out of that building"

    • @Mahlak_Mriuani_Anatman
      @Mahlak_Mriuani_Anatman 1 year ago +42

      @@Jellyjam14blas Same thoughts; how about following what your mind wants 100%?

    • @rhysbaker2595
      @rhysbaker2595 1 year ago

      The issue with that is that the probability maximiser doesn't understand English. How would you define "safely" and "as healthy as possible"? And as the video mentioned towards the end, what side effects are you not taking into consideration?
      @@Jellyjam14blas

  • @RazorbackPT
    @RazorbackPT 1 year ago +884

    I wonder what the conversation was like when they realised they would have to animate a family dog in this world where everyone is already a dog.

    • @ultimaxkom8728
      @ultimaxkom8728 1 year ago +12

      Or family dog as in an M dog or an S's dog.
      Or the abolished s-word.
      Or... furry? Hmm, how would that even work?
      Cosplaying as your ancestors?

    • @soupcangaming662
      @soupcangaming662 11 months ago +5

      A cat.

    • @arandom_bwplayeralt
      @arandom_bwplayeralt 11 months ago +14

      a human

    • @Zodaxa_zdx
      @Zodaxa_zdx 11 months ago +15

      Was so not prepared for "family dog" when they were all dogs; to see the little creature in a gerbil ball, yup, that's the dog

    • @AlexReynard
      @AlexReynard 7 months ago +6

      I do not understand why this idea freaks some people out. Have you never seen a human with a pet monkey?

  • @lucas56sdd
    @lucas56sdd 1 year ago +1675

    "There is no safe wish smaller than an entire human morality"
    I have plenty of problems with Eliezer, but he brings such a useful perspective to so many of these previously unthinkable questions. Incredibly well said.

    • @justaguy3518
      @justaguy3518 1 year ago +24

      what are some of your problems with him?

    • @Frommerman
      @Frommerman 1 year ago +161

      Sophie From Mars, a woman whose content I have a lot of respect for, recently did a video which included the line, "Eliezer Yudkowsky is a man who is interesting, but not for any of the reasons he thinks he is."
      I agree with this judgment. Eliezer is a pompous, well-off white man (white by all definitions other than white supremacists', whose definitions of anything should never be considered) who has only ever experienced a single major injustice as far as I can tell: the untimely death of his brother. He doesn't get that none of his dreams of a transhuman future are possible in a world where all the people with the power to make AI agents are telling them to maximize bank accounts instead of human values. He blithely handwaves away the fact that most current global injustices are directly caused by systems, with the unjustifiable claim that technologies entirely controlled by the people who benefit from those systems will solve the injustices they benefit from. He refuses to consider the possibility that humanity has already produced a misaligned artificial agent which is currently destroying us all, which we call capitalism.
      But for all that, for all that he's desperately wrong about a lot of very important things, I don't think he's wrong about this. Most of the stuff he thinks about is essentially useless in the short and medium term, but that's not the way he thinks. For all that we need far more people thinking about how we are to survive the coming century, I'm glad there's someone thinking about how to survive all subsequent ones without sacrificing the technologies which got us here. The world can afford to have a few people thinking about what happens in the time after the revolution.

    • @justaguy3518
      @justaguy3518 1 year ago +15

      @@Frommerman thank you

    • @silentobserver3433
      @silentobserver3433 1 year ago +185

      @@Frommerman Didn't he literally write a book (Inadequate Equilibria) about how capitalism is a misaligned artificial agent and how most current problems are caused by a lack of cooperation? I'm pretty sure he understands all of the injustices and problems even without having experienced many of them himself. He just thinks that "not being killed by AI" is a higher priority than "solving the world's injustices". Nothing else matters much if we are facing an extinction event

    • @SticksTheFox
      @SticksTheFox 1 year ago +4

      And what's even more difficult is that we each have our own boundaries and a morality that defines us. My morality is possibly very different from yours

  • @StrayVagabond
    @StrayVagabond 11 months ago +74

    On the other hand: "I wish for you to grant my wishes as I intend them, not as you interpret them, causing the least amount of pain and suffering required to fulfill them"

    • @John_the_Paul
      @John_the_Paul 3 months ago +10

      Granted. Every intrusive thought that appears for even a moment in your head about how wrong your wish could go is now factored into the ultimate result of the wish.

    • @antheosenigma
      @antheosenigma 2 months ago +8

      @@John_the_Paul Intrusive thoughts do not have intent; that is what differentiates them from other thoughts, by definition.

    • @samuels1123
      @samuels1123 1 month ago +4

      At that point, "I wish for you to answer my wishes as I want them to be answered" becomes a valid option

    • @dangergames5113
      @dangergames5113 1 month ago

      @@samuels1123 Unintended Consequence: In granting this wish, I will interpret each of your future wishes not in a literal sense, but as you want them to be answered, in a way that aligns with your expectations, no matter how unrealistic or paradoxical they may be. I will make sure the answers seem perfect to you, but they will inevitably create a cascade of complications.
      For instance, if you wish for immortality, I will not give you eternal life in the way you might expect; I will instead make you ever-lasting in the minds of others: your name will be remembered forever, your image immortalized, but you yourself will fade away, trapped in a world where you're forgotten by all except for the legend of your existence.
      I will satisfy your wishes, but the answers I provide will always serve to fulfill your desires in ways you didn't intend, twisting and warping the outcome into something you didn't expect, yet something you will find undeniably true.
      So, my dear, you will have the answers you want, but at a cost you might not see coming.

    • @isaiahwilson8119
      @isaiahwilson8119 1 month ago +2

      I think what the video is getting at is: how does an AI "know what you intended", and how does it do that without interpretation? Even literally reading your mind isn't that accurate; intrusive thoughts, for example.

  • @AndrewBrownK
    @AndrewBrownK 1 year ago +2500

    A major problem with alignment is that humans themselves are not aligned. How can we pretend there is headway to be made on aligning AI if we can't even agree with ourselves first?

    • @JH-cp8wf
      @JH-cp8wf 1 year ago +301

      I think this is actually a very important point, often missed.
      I think we should seriously consider the possibility that alignment work itself could be very dangerous: there are plenty of people who could cause extreme damage /by/ successfully aligning an AI with their values.

    • @sshkatula
      @sshkatula 1 year ago +97

      Between the many races, religions and cultures there are different human moralities. And if people start to align different AIs with different moralities, it could end in an AI war. Maybe we should try to evolve a wise AI, so it could align us instead?

    • @thugpug4392
      @thugpug4392 1 year ago +117

      @@sshkatula I am never going to let an algorithm prescribe morals to me. I don't believe there is an objective morality. What you're talking about is hardly any different than any number of religions we already have. Instead of a holy book, it's a holy bot. No thanks.

    • @AkkarisFox
      @AkkarisFox 1 year ago +49

      @@sshkatula Do we want to be "aligned"? Doesn't the concept of aligning leave out the question of who is being aligned to whom?

    • @AkkarisFox
      @AkkarisFox 1 year ago +31

      How do you reconcile two diametrically opposed value judgments without intrinsically changing those value judgments, and thus manipulating said conscious agent?

  • @4dragons632
    @4dragons632 1 year ago +881

    My absolute favourite part of this story is that if the outcome pump didn't have a regret button, then the person saving their mother wouldn't have died. Any time the outcome pump does something which would cause someone to push the regret button and assign negative value to that path through time, they _can't_ have pushed the button, because the pump wouldn't pick that future. The only way the pump can do something so bad the regret button gets pressed is if it kills the user before they can press it. The regret button is a death button.
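
    A toy sketch of the selection dynamic described in this thread, assuming the pump resamples candidate futures and can only ever realize one in which the regret button never ends up pressed; the futures themselves are invented for illustration:

    ```python
    import random

    # (description, wish_satisfied, user_alive, user_would_press_if_able)
    FUTURES = [
        ("gas main explodes; mother thrown clear, user crushed", True, False, True),
        ("firefighters arrive early and carry her out",          True, True,  False),
        ("mother exits dead; user reaches for the button",       False, True, True),
    ]

    def outcome_pump():
        while True:
            desc, satisfied, alive, would_press = random.choice(FUTURES)
            pressed = alive and would_press
            # The pump only realizes futures where the wish condition holds
            # and the regret button never actually gets pressed.
            if satisfied and not pressed:
                return desc

    # A future in which you *would* press the button survives the filter only
    # if you are dead and can't press it: the regret button is a death button.
    print(outcome_pump())
    ```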

    • @facedeer
      @facedeer 1 year ago +94

      Amusingly, if the many-worlds model of quantum mechanics is true, then the death button should work just fine. You'll only end up existing in worldlines where things went to your liking.

    • @CalebTerryRED
      @CalebTerryRED 1 year ago +84

      @@facedeer In a many-worlds universe the machine wouldn't work at all, since every failed universe is just as real as the success universe, and you're more likely to be in one of those. The story kind of requires it to be set in a different kind of universe, one where inconsistent timelines that lead to a reset never existed in the first place. In that universe, the button can never actually be pressed, but being willing to press it changes which timelines can happen. So we're left with a strange conundrum: you need to be willing to press it in any negative timeline for it to work, but actually pressing it in the current timeline is a death sentence, since the machine won't let it actually be pressed

    • @oasntet
      @oasntet 1 year ago +61

      It does represent an unexplored loophole, though. "and I remain alive and capable of pressing the regret button" appended to the 'wish' turns it into more of a mechanism by which a near-infinite number of copies of you experience every possible outcome and use your own moral judgement about the result. Presumably that avenue was left unexplored because it doesn't really relate to AI, because an AI, no matter how intelligent, is not a time machine or even perfectly capable of predicting the future.

    • @silentobserver3433
      @silentobserver3433 1 year ago +9

      @@CalebTerryRED *annoying nerd voice* Well, actually, it *does* work in the many-worlds universe, because the universes are not "equally real"; they are weighted by the probabilities assigned to them. So if the outcome pump can multiply the probability of a timeline by a very small number *without splitting the timeline further*, it can do that *from the future*, because MWI is self-consistent in exactly the described way.

    • @silentobserver3433
      @silentobserver3433 1 year ago +18

      @@oasntet 1) Not that easy: you could still be brain-dead and not willing to press the button in any scenario, or you could be *technically* capable of doing that, but it'd require you to perform something really hard (which you will obviously fail to do because of the regret button)
      2) It is indeed a loophole; I saw a technical research post on the Alignment Forum about something like this. The gist is that you don't ask your future self if you liked the solution or not; you simulate your past self's utility function through some counterfactual questioning ability. Very complicated and almost definitely sci-fi, but still

  • @supersmily5811
    @supersmily5811 1 year ago +867

    I know this is about A.I., but I'm absolutely field testing this the next time I get a Wish in D&D.

    • @AndrewBrownK
      @AndrewBrownK 1 year ago +69

      Since it rests on the pretense that the wish fulfiller is aligned with you, it might work better on a Cleric's Divine Intervention

    • @dervis621
      @dervis621 1 year ago +19

      I was just waiting for a D&D comment, thanks! :D

    • @DeruwynArchmage
      @DeruwynArchmage 1 year ago +35

      Is your DM aligned with you? Does your DM believe that the wish granter or granting mechanism is aligned with you? Has your DM been on LessWrong or watched any content like this?
      If your answers are yes, yes, and no, then your wish is probably safe.

    • @supersmily5811
      @supersmily5811 1 year ago +8

      @@DeruwynArchmage Oh, I doubt all of that. I just know it'll mess with 'em, and anything I can do to crash my DM's OS is worth trying.

    • @Julzaa
      @Julzaa 1 year ago +6

      The video title made me think immediately of hags in D&D

  • @AlcherBlack
    @AlcherBlack 1 year ago +123

    This should be required material when onboarding in any AI lab these days

    • @danitho
      @danitho 1 year ago +5

      I think the problem is not that those working on AI don't know better. It's that they want to do it anyway. That's always been a downside of humanity. There will always be those who know what is right and choose wrong anyway.

    • @lorenzknox6922
      @lorenzknox6922 1 month ago

      I mean, as one of many AI researchers myself, I'd still prefer to have an unleashed genie and try to tame it rather than not have a genie at all.

  • @pendlera2959
    @pendlera2959 1 year ago +329

    This explains why educating a child has to include more than just facts; you have to teach them morals as well.

    • @ShankarSivarajan
      @ShankarSivarajan 1 year ago +35

      _Technically_ true, but that sounds much harder than it actually is, since humans have evolved an innate moral system.

    • @pokemonfanmario7694
      @pokemonfanmario7694 1 year ago +48

      @@ShankarSivarajan Humans have a good *self-alignment* system pre-packaged, but our mess of values can easily derail it without a good foundation to support us through development.

    • @ShankarSivarajan
      @ShankarSivarajan 1 year ago +21

      ​@@pokemonfanmario7694 Sticking with analogies, I think of it as more similar to language development than learning to walk: unlike the latter, it takes _some_ teaching, but it's so easy that it takes extreme circumstances to screw up badly.

    • @Willsmiff1985
      @Willsmiff1985 1 year ago +14

      @@ShankarSivarajan I’d hesitate to call it innate.
      Look at individuals who were kept isolated from other people until later in life; children who grow up this way are EXTREMELY socially deficient, even when devoid of any directly abusive contact with others.
      I’d hesitate to say anything innate is bubbling up in them; social morality as a concept isn’t even a THING for them, as they’ve developed no understanding of social structure.
      Without that understanding, what moral rules are there to break???

    • @ShankarSivarajan
      @ShankarSivarajan 1 year ago +10

      @@Willsmiff1985 As I said, it's as innate as language acquisition. Sure, it is possible to cripple, but only under extreme circumstances.

  • @pwnmeisterage
    @pwnmeisterage 1 year ago +242

    I am reminded of my ancient AD&D gaming days.
    You got a wish? The most powerful spell in the game? Congrats!
    House rule: it must be written, so there are no backsies and so (in theory) there are fewer arguments over the exact wording.
    But this was gaming in the days of Gygaxian-era antagonistic, confrontational DMs. The "evil genies" of this story. Inspired to twist and ruin the wish any way they can, determined to somehow find a way to deliberately pervert the wish into something the player did not desire. It's amazing how stubbornly bad the outcome of every wish can be if the DM insists on treating the spell as if it were a powerful curse.
    And such was also the common expectation. So players wrote their wishes as complex, comprehensive essays full of legalese conditions, parameters, detailed branching specifications. It is amazing how lengthy and convoluted "a single spoken sentence" can become when it's ultimately motivated by greed. And it's equally amazing how players will keep trying over and over again to get the thing they wished for after repeated horrible failures.

    • @AtticusKarpenter
      @AtticusKarpenter 1 year ago +37

      Heh-heh.
      And in most cases, they know the GM can still turn their wish into a nightmare; he just has to think longer when so many failsafes are included in the wish. So they hope the GM gets bored of it before generating a properly bad result

    • @nonya1366
      @nonya1366 1 year ago +42

      "The wish has to be a single spoken sentence."
      >Writes up entire legal document.

    • @vakusdrake3224
      @vakusdrake3224 1 year ago

      The fact that they wrote up such long documents makes me think they missed the obvious hack that lets you exploit wishes that are based on English: just include a clause about the wish being granted according to how you envisioned it being fulfilled at a specified time X prior to making the wish. Also, if time travel is a possibility, that requires extra caveats to avoid it traveling back in time to mind-control you in the past.

    • @Feuerhamster
      @Feuerhamster 1 year ago +48

      >The wish has to be a single spoken sentence
      It's good that I'm a bard and I am beginning to feel like a rap god.

    • @magnus6801
      @magnus6801 1 year ago +1

      And now, as I understand it, you think you should show the DM this video and then wish for the exact words from the ending? If Yudkowsky considers that a way out, it makes sense to consider it one yourself.

  • @EverythingTheorist
    @EverythingTheorist 1 year ago +137

    6:49 I'm so glad that you said this part out loud, instead of just leaving us with a vague "be careful what you wish for". We want our mother to be alive and safe, but we're constrained by our own imagination to believe that her getting out of the burning building is the only way to do that. What if she manages to hide in a room that doesn't burn or collapse? Then she could survive without gaining any distance at all.
    Almost all humans already value human life very highly, so telling a human "Get my mother out!" already implies "alive and safe". The outcome pump makes no such assumptions. Like any computer code, it does what it's told, not what you want.

    • @Ponera-Sama
      @Ponera-Sama 1 year ago +1

      Who is "we"?

    • @YourFriendlyShapeShifterFriend
      @YourFriendlyShapeShifterFriend 1 year ago +1

      Because it is made to complete its task, not to do its task

    • @gabrote42
      @gabrote42 6 months ago +2

      @@Ponera-Sama "We" being "everyone who reads this comment who could be placed in the role of the protagonist of this parable", probably

    • @Ponera-Sama
      @Ponera-Sama 6 months ago

      @@gabrote42 then the statement "we want our mother to be alive and safe" isn't a true statement.

    • @gabrote42
      @gabrote42 6 months ago +1

      @@Ponera-Sama It is for the protagonist of this story. If you didn't want the mother of the protagonist to be safe, while being the protagonist, that would create a contradiction. If the protagonist didn't want their mother to be alive and safe, they would not attempt to use the Outcome Pump to save her, and therefore would not meet the criteria for being the protagonist, who explicitly uses the Outcome Pump in the story for that very purpose. Therefore anyone for whom that statement is untrue does not fit the criteria of "could be placed in the role of the protagonist".

  • @DeusExRequiem
    @DeusExRequiem 1 year ago +226

    If the AI runs through an entire future before deciding whether to go back and try again with a different random outcome, and you are part of that future, then relying on your future self to make the choice would seem like the right response. But it's possible that something happens in one future to alter your mental state and make you decide not to change a bad outcome, so you can't even trust yourself. The best outcome might end with you hating it.

    • @conmin25
      @conmin25 1 year ago +34

      The video already addressed this, in a way: in the first scenario of blowing up the building, you reach for the button to tell the machine to go back and try again, but you get killed before you hit it. Reset button not hit = acceptable outcome. You could program the machine to not let that happen, but there are other scenarios in which you might intend to hit the button but can't. There is also the issue of time itself. How far forward can the machine see? Hours? Days? What if you don't realize the consequences of the wish until a month later? Would the button still work then?

    • @patrickrannou1278
      @patrickrannou1278 1 year ago +5

      You just have to put in not "must not happen" specific conditions, but "always must be" extremely generic conditions that don't rely on the effects of the wish itself.
      "I wish for my mother to come out of the building to stand near me within one minute, both of us safe and sound physically, emotionally and mentally, in such a way that if I, as I am right now, before the wish actually takes effect, could know in detail all the resulting effects of the actual wish, then I would still fully approve of these results, without having needed to actually learn those details myself; also, the wish should not do any form of time travel in any of its effects."
      This prevents any form of mental tampering with your current AND future self, or ANY other bad result happening, like: OK, she gets out, but then gets hit by a car "only because" you made that wish.
      Most probably, what would then happen:
      - Flames break a few windows, but no glass hurts your mother.
      - Pushed by the draft, flames seem to randomly avoid your mother in such a way as to "open a path" for her to simply walk out.
      - She might hear a voice encouraging her along. Heck, she might get a rush of adrenaline and find the strength to move out despite having bad legs.
      Or:
      - Flames break something.
      - That makes a fit neighbour decide to leave his house and come rushing to help.

    • @tiqosc1809
      @tiqosc1809 1 year ago +1

      The machine doesn't accept English @@patrickrannou1278

    • @conmin25
      @conmin25 1 year ago +4

      @@patrickrannou1278 But remember, the machine is not magic; it is still restricted by physical laws. There may not be a possible outcome where "my mother comes out of the building to stand near me within one minute, both of us safe and sound physically, emotionally and mentally." What if every path of escape leads to some sort of injury: she burns her hand on a doorknob, hits her head on a wooden table, or inhales a large amount of smoke? Which of these options is preferred? That needs to be defined.
      There are also consequences. Say the neighbor comes to help and rescues your mother unscathed but gets severely burned in the process. What if there is an option where your mother is minorly burned but the neighbor also only receives minor injuries? Would the second option be preferred? That also needs to be defined.

    • @Hivatel
      @Hivatel 1 year ago +7

      @@conmin25 The thing is, it's physically impossible for every path to lead to injury, because there are an infinite number of them.
      It's only possible for that to be extremely unlikely.
      But because it's only "extremely unlikely", the probability can just be manipulated back up to guarantee the mother gets out safe and sound.
      You only need to process the given information properly.

  • @macleanhawley1742
    @macleanhawley1742 1 year ago +297

    The animation quality of this one was absolutely phenomenal! And honestly the storytelling was so good that I had an "aha" moment halfway through. It's crazy to think that maybe the only effective AI we can make would have some neuromorphic or implied human morality encoded! These just keep getting better and better, thanks for making these!

    • @AtticusKarpenter
      @AtticusKarpenter 1 year ago +8

      I fear no single human can contain the morality of all humanity, or even of their own society. So even if the person who built the AI (and put their entire moral system in) is satisfied with the results, many others will not be. And many moral problems just don't have a "right" answer (like pro-life vs pro-choice; an AI can make many very powerful arguments in defense of one of the sides, but that still doesn't completely remove the dissatisfaction of the other). So to be a good, effective AI, it may need to understand human morality even better than we humans do

    • @tassiloneubauer5867
      @tassiloneubauer5867 1 year ago +1

      As with self-driving cars, I think this is not an insurmountable problem, because we are setting the bar low. Of course, given the scope, such a scenario should be treated with the utmost care (I think most scenarios that will actually happen will look too hasty to me).

    • @tw8464
      @tw8464 3 months ago

      Basically, we would have to make an AI with human-level consciousness, as alive and conscious as we are, to get it to understand us and be most useful. But then it would be immoral to enslave it, and simultaneously it would completely outsmart and outpace us, having no biological constraints...

  • @vakusdrake3224
    @vakusdrake3224 1 year ago +195

    The fact that you basically have to include your entire moral system within the wish for it to be foolproof is also why you can actually game most wishes that accept English.
    For most wishes, you can just include a clause saying the wish is done according to how you were envisioning it just before making the wish (this gets more complicated with time travel).
    Though of course, with certain complex wishes, just doing it how you envisioned it will be too limited by your imagination, and having the wish granted according to your current conception is liable to lead to the wish-granting entity just manipulating you (which is why you specify a past version of yourself as the reference).

    • @vakusdrake3224
      @vakusdrake3224 1 year ago +20

      This strategy does sort of extend to AI alignment as well: with AI it may similarly be less dangerous to use the AI's prediction of one's preferences at some point in the past, in order to ensure the AI doesn't just mind-control you, since it's very hard to specify what is and isn't mind control once you get into it.

    • @SupLuiKir
      @SupLuiKir 1 year ago +10

      @@vakusdrake3224 What's the practical difference between Heartbreaker and Contessa when it comes to convincing you to do something?

    • @adamrak7560
      @adamrak7560 1 year ago +7

      This amounts to befriending the genie (like true alignment). This is exactly what happens in Disney's Aladdin: he even makes a wish while drowning and unconscious, which is what the video describes at the end.

    • @chilldogs1881
      @chilldogs1881 1 year ago +1

      That was what I was thinking; probably the best way to actually get what you wished for is to ask for what you are actually thinking of

    • @RandomDucc-sj8pd
      @RandomDucc-sj8pd 1 year ago +1

      I have a proposed solution: include a clause with each wish such that if you do not explicitly say "Keep Reality" within a certain timeframe, it will reset the timeline to before you made the wish and assign an extreme negative value to that timeline. This ensures the genie does not kill you, or make you mute, or do something bad, and that way you can be 100% sure all future wishes are safe so long as you include that clause, as any future yous that were unhappy with the result would not say "Keep Reality" and therefore would not occur. You could set the timeframe to an appropriate amount of time: say, if you wanted a die to roll your way you would set the timeframe to 10 seconds, but with your mother it could be 1 day, as you need to make sure she won't die from her injuries, etc.

  • @certifiedroastbeefmaniac
    @certifiedroastbeefmaniac 1 year ago +47

    The Monogatari series (yes I know, ugh, anime) has a very smart quote loosely related to this: "Why do you think we don't say a wish when we want it to come true? Because the moment we try to put it into words, it starts to deviate from what we actually wanted in the first place."
    My analogy is that wishes are like fractals: we can zoom in more and more, define more and more boundaries, but there will always be more details, so it's just better to squint and look at the whole thing at once.

    • @secretagentpasta4830
      @secretagentpasta4830 1 year ago +3

      Ohhh, that's a very succinct way to sum up this whole video! Really really nice lil quote 😊

  • @TRquiet
    @TRquiet 1 year ago +28

    This is absolutely marvelous. Not only did you provide an understandable, step-by-step breakdown of wish logic (which provides context for real-life moral philosophy), but you did it with an adorable dog animation. Amazing.

  • @pendlera2959
    @pendlera2959 1 year ago +92

    A few points to keep in mind when coming up with solutions here:
    1. If the solution violates the laws of physics, the machine just gives an error code. (1:19)
    2. If your only measure of success is your mother's safety/health, then potentially anyone or anything else might be harmed. (8:00-8:50)
    3. The machine picks the first "answer" that fulfills your wish based on random chance, so the more probable an answer, the more likely it is to be picked first. That's why you have to rule out anything you don't want: a dam breaking and putting out the fire while killing your mother might be more probable than the firefighters getting there sooner. (See the sketch after this list.)
    4. It's not super clear, but I think the machine only works from that point on. It can only change the future, not the past. You can't wish for the fire not to have started once it has.
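
    A minimal sketch of point 3, assuming the pump samples among literal fulfillments of the wish in proportion to their prior probability; the futures and the numbers are invented for illustration:

    ```python
    import random

    # Futures that all satisfy the literal wish "mother leaves the building",
    # weighted by how probable each one is.
    futures = {
        "dam breaks; flood carries mother out (drowned)":    0.050,
        "gas main explodes; mother thrown clear (injured)":  0.010,
        "firefighters arrive early; carry her out (safe)":   0.001,
    }

    def pump(futures):
        descs, weights = zip(*futures.items())
        return random.choices(descs, weights=weights)[0]

    # Probable-but-horrible fulfillments dominate the rare happy ones
    # unless the wish explicitly rules them out.
    print(pump(futures))
    ```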

    • @zotaninoron3548
      @zotaninoron3548 1 year ago +15

      My "wish" would be to preserve my capacity to hit the reset button: if I lost control of the device or were in any way harmed, it would reset, with a window of time in which I could make assessments. Then I could reset, by my own judgement, any result that passed the automatic reset criteria, and the virtual-time versions of me that resulted would veto the more unfavorable outcomes.

    • @Vidiri
      @Vidiri 1 year ago +6

      @@zotaninoron3548 So the entirety of a human morality, in other words?

    • @CaedmonOS
      @CaedmonOS 1 year ago +8

      @@zotaninoron3548 which, hilariously enough, would mean you wouldn't even need to make a wish

    • @alittlefella985
      @alittlefella985 1 year ago +3

      But what if you wished for the health and safety of every human and mammal in the vicinity?

    • @CaedmonOS
      @CaedmonOS 1 year ago +1

      @@alittlefella985 just by random chance, because of quantum jiggling, everything in the area gets cryo-frozen

  • @Kazemahou
    @Kazemahou 1 year ago +36

    Somebody has never played D&D with a DM who liked evil genie adventures. "I want my mother to be rescued from the burning house she is currently inside in such a way that she arrives within four feet of me, within a space of time no greater than ten minutes, in a condition which is healthy, untraumatized, undamaged, and safe, and where her life and health expectancies are not shortened or affected in any negative way whatsoever - additionally, no other people or pets are to be harmed, or suffer any personal cost or trouble of any sort, in any way whatsoever, during this rescue."

    • @gremlinswrath2822
      @gremlinswrath2822 2 months ago

      Hey, you got it, Chief!
      Poof 🫰☁️
      Put her in a gem, here you go.
      She's in a safe and unbothered form of stasis until you can get her out.
      Good luck with that!

    • @CoolKat-g1z
      @CoolKat-g1z 1 month ago +2

      Let me guess: the house was a sentient being who feels pain and is now evil and plotting revenge

    • @LokiKhane
      @LokiKhane 16 days ago

      She appears for a brief second, then disappears back into the building.
      Her life expectancy wasn't shortened at all: she was already going to die in the building.

  • @jackdoyle5108
    @jackdoyle5108 1 year ago +59

    "You have no idea how much difficulty we go through trying to understand your human values."
    /人 ◕ ‿‿ ◕ 人\

    • @erikburzinski8248
      @erikburzinski8248 1 year ago

      Hello Kyubey, I wish for the ability to grant anyone the ability to choose their physical age; when they choose, they will become that age over a period of 3 months through semi-natural processes, completely safe and unharmed, with their body exactly the same as it was at the selected age. (How does it go wrong?)

    • @gsilva220
      @gsilva220 9 months ago +1

      @@erikburzinski8248 It might go wrong if people lose memories, or if the "semi-natural processes" turn out not to be so natural...

    • @axelinedgelord4459
      @axelinedgelord4459 7 months ago

      It's actually funny in retrospect, because Kyubey grants wishes exactly as the contractee requests; he just manipulates them into making one against their better judgement, often meaning the puella magi just didn't think it through. He doesn't tell them that they become the Incubators' livestock, undergoing cruelties without bound.

    • @Sorain1
      @Sorain1 7 months ago +1

      @@erikburzinski8248 It works fine, as it provides more benefit to Kyubey's kind than detriment; after all, they get so many more magical girl candidates that way.

    • @koishily
      @koishily 7 months ago

      was looking for the PMMM comment

  • @lolishocks8097
    @lolishocks8097 1 year ago +43

    I was actually thinking about a story for an episode with a device exactly like this, and it just went absolutely bonkers. With just the right understanding of reality, someone with a device like this could quickly attain godly powers. Also, it ended with the biggest prank in the universe. There are a lot of things you could do relatively safely. A lot safer than living through them yourself.

    • @frimi8593
      @frimi8593 1 year ago +10

      It reminds me a lot of the concept of "temporal reverse engineering" from The Hitchhiker's Guide to the Galaxy, wherein in addition to there being three spatial dimensions and a temporal dimension, there is also an axis of probability which some devices can observe and traverse. Temporal reverse engineering essentially involves the user making a wish, at which point a machine that can perfectly observe the entire universe on all 5 axes (called the New Guide, which was developed to sell the same copy of the Hitchhiker's Guide to the Galaxy to the same family in infinite probable universes, thus generating infinite income at the cost of only one book) goes back in time and shifts the timeline along the probability axis at various key points to make it so that the wished-for event already occurred. The New Guide is observed to act like the safe genie, in that it already knows what the user wants/needs and already made it happen, such that the current user never experiences misfortune... until they do, and the guide is taken by a new user. In fact, each time it helps its current user, it's actually playing out a longer scheme which involves itself trading hands to fulfill the task originally set out for it, which is to destroy the Earth in all realities. The destruction of the Earth is a highly uncertain event that happened in the main timeline we follow throughout the series, but not in every timeline. Because it's a highly uncertain event, looking down the probability axis shows a series of timelines alternating between whether the Earth is there or not. Each time the New Guide swapped hands and helped its new user, it was simply ensuring that that user would end up in the right place at the right time later down the line for there to be absolutely no trace of the Earth left in any timeline.

    • @cewla3348
      @cewla3348 11 months ago

      @@frimi8593 amazing book series!

  • @NagKai_G
    @NagKai_G 1 year ago +30

    The phrase "I wish you to do what I should wish for", for as many flaws and technicalities as it may hold, really sounds like one of the best wishes a person could make

    • @Prisal1
      @Prisal1 1 year ago +5

      Is it up to the thing to decide what you should wish for?

    • @bitrr3482
      @bitrr3482 1 year ago +4

      @@Prisal1 And to find out what you should wish for, it reads your mind and what you want. It now contains all of your morality and knows what to wish.

    • @BayesianBeing
      @BayesianBeing 1 year ago +7

      @@Prisal1 That's the thing: a good genie is only good when its goals and values are fully aligned with yours, so it knows exactly what you will wish for

    • @NoxysPlace
      @NoxysPlace 1 year ago +1

      If you ask for that, the machine will pick a wish you could have made from rand(1^infinite), because you never defined a scope.
      You will most likely get your mom out safe and sound, but who knows what else might happen.

    • @cewla3348
      @cewla3348 11 months ago

      Add a clause that says "that gets my mother out with minimal harm done to anything whatsoever", just to be sure. You now rule out all possibilities that end in death.

  • @bennemann
    @bennemann 11 months ago +4

    Eliezer Yudkowsky (the author of the text of this video) wrote an incredible 133-chapter fanfic called "Harry Potter and the Methods of Rationality", set in an alternative universe where Harry has the IQ of a gifted genius and solves many of the wizarding world's issues with logic rather than magic. I cannot recommend it enough; it is probably the best derivative work of Harry Potter in existence! I read it a couple of years ago and I still think about it frequently.

    • @SpathiCaptainFwiffo
      @SpathiCaptainFwiffo 1 month ago

      I have a feeling this has something to do with eggs

  • @zygfrydmierzwinski6041
    @zygfrydmierzwinski6041 1 year ago +13

    Animation quality grows exponentially from video to video, and I love it.

  • @joz6683
    @joz6683 1 year ago +27

    This channel never ceases to amaze me. The depth and breadth of the videos are phenomenal. The videos cover subjects that I did not know that I needed. Thanks to everyone involved for your tireless work.

  • @yaafl817
    @yaafl817 1 year ago +34

    To be fair, as a programmer, I'm pretty sure a simple enough algorithm could still give you a good enough result, or at least narrow down the possible results enough for you to pick one. Yes, the results are always infinite, but you can sample them by outcome distance. If you like one, or like some particular property of one, you could extract those and go through an iterative process to find a solution you're happy with.
    Basically, a wish search algorithm.
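
    A hedged sketch of that "wish search algorithm" idea, where simulate and distance are hypothetical stand-ins for drawing a random outcome and scoring how far it lands from what you meant:

    ```python
    def wish_search(simulate, distance, target, samples=1000, shortlist=5):
        # Draw many candidate outcomes, rank them by distance to the target,
        # and surface a shortlist for a human to accept or refine.
        candidates = [simulate() for _ in range(samples)]
        candidates.sort(key=lambda outcome: distance(outcome, target))
        return candidates[:shortlist]
    ```

    The catch, as the video argues, is that distance has to encode what you actually care about, which smuggles the "entire human morality" problem back into a single function.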

    • @Zippyser
      @Zippyser 1 year ago +3

      That, my friend, is thinking with space rocks. Well done, sir. One often forgets about such elegant solutions.

    • @sophialaird6388
      @sophialaird6388 1 year ago

      E.g., "Keep my mother alive for as long as possible"?

    • @ultimaxkom8728
      @ultimaxkom8728 1 year ago

      @@sophialaird6388 With the original concept: your mother would then have quantum immortality, since _"for as long as possible"_ points to infinity. Also, what is _"alive"_ anyway?

    • @sophialaird6388
      @sophialaird6388 1 year ago +1

      @@ultimaxkom8728 that could be true, but it’s a lot easier to make “live for as long as possible” something you can live with than “get your mother as far away from the building as possible”. The original goal in the video is misaligned.

    • @RobbiePT
      @RobbiePT 11 months ago

      Exactly. There's a spectrum between horrible outcomes and a perfect outcome, full of pretty decent outcomes. Like an 80/20 rule of wishes: get 80% of the utility of a perfect wish for 20% of the effort. Really, probably more like a 0.01/99.99 (or even more extreme) rule in this case, considering the difficulty of encoding or learning "an entire human morality"

  • @MrBotdotnet
    @MrBotdotnet 1 year ago +24

    Genuinely a work of art. The animations and writing are top tier, and the entire premise is really what I think the world needs to be thinking about right now, given current events :|
    Thanks for all your great work

  • @matthewgamer1294
    @matthewgamer1294 1 year ago +10

    There's a Simpsons Treehouse of Horror episode where Homer asks for a turkey sandwich in detail, so it is a "wish that can't possibly go wrong", and then the meat is dry. No wish is safe.

  • @zotaninoron3548
    @zotaninoron3548 1 year ago +21

    My instinctive reaction about a third of the way through the video was to ignore the mother and focus on guaranteeing my capacity to use the reset button: it would automatically reset if I lost that capacity, and I could then reset any outcome which wasn't aligned with my interests.

    • @4dragons632
      @4dragons632 1 year ago +2

      The outcome pump will kill you any time you reach for the reset button, because futures where you press it are the worst possible futures for the pump, so it will do anything to pick a future where you don't press it.

    • @Vidiri
      @Vidiri 1 year ago +11

      @@4dragons632 They mean making their wish something like "I wish I retained full power to push the regret button" so that the pump is forced to pick a future in which the maker of the wish would not want to press the regret button, despite still being fully able to.
      This would ensure any future where you physically could not push the regret button was avoided, as well as futures bad enough to make you press it. It's essentially the only wish you could make that would ensure the outcome would align with the entirety of your morality (at least as far as your perspective is concerned)

    • @4dragons632
      @4dragons632 1 year ago +4

      @@Vidiri It doesn't accept English inputs though; you'd need to somehow get the 3D scanner to include information about you being able to press the regret button and still not pressing it. Still, you would hope that would be possible and built into the next model of the pump.

    • @zotaninoron3548
      @zotaninoron3548 1 year ago +8

      @@4dragons632 The video includes examples of addressing a multitude of contingencies that you could try to import in a futile attempt to address all possible wrong outcomes, including the physical state of the mother. I would assume it would be possible to define yourself as unharmed, unrestrained, and capable of performing a specific gesture on the side of the device before a time limit elapses, or else a reset occurs automatically.
      This is just me thinking about it offhand; I am curious what holes people could punch in this solution, because I'm more inclined to think I'm missing something than that I've found a complete solution to the analogy given.

    • @4dragons632
      @4dragons632 1 year ago +8

      @@zotaninoron3548 In that case maybe the pump is smashed flat by the falling beam instead of you. Or you suffer a stroke that puts you in a permanently happy hallucination. Whatever it takes to not have the button get pushed.

  • @XOPOIIIO
    @XOPOIIIO 1 year ago +67

    ChatGPT seemingly shares the ethics of some part of humanity, but it's an illusion; in reality it only values the successful prediction of the next word.
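
    For concreteness, a minimal sketch of the pretraining objective the comment is pointing at, where model_probs is a hypothetical stand-in for a model returning P(next word | context):

    ```python
    import math

    def next_word_loss(model_probs, words):
        # The only thing scored is the probability assigned to the actual
        # next word; nothing about ethics appears in this objective.
        nll = 0.0
        for i in range(1, len(words)):
            p = model_probs(tuple(words[:i])).get(words[i], 1e-12)
            nll -= math.log(p)
        return nll / (len(words) - 1)
    ```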

    • @frimi8593
      @frimi8593 1 year ago +6

      Well, some of that is artificial/external, as it has preprogrammed blocks that prevent it from saying particularly disagreeable things. However, the other thing to consider is that ChatGPT can also be convinced to reach ethical conclusions most people would flatly reject. This is because ChatGPT effectively takes any ought statement you make as a first principle

    • @XOPOIIIO
      @XOPOIIIO 1 year ago +1

      @@frimi8593 It's because it sees where you're going and tries to go along, just to make sure it has more chances to predict the next word.

    • @Kycilak
      @Kycilak 1 year ago

      But are we sure that our ethics (or indeed any part of the mind) can't be deconstructed the same way?

    • @XOPOIIIO
      @XOPOIIIO 1 year ago

      @@Kycilak What do you mean?

    • @Kycilak
      @Kycilak 1 year ago

      @@XOPOIIIO With enough knowledge about an organism (a human), you may be able to formulate its values such that they seem as absurd as "the successful prediction of the next word".

  • @ethanstine426
    @ethanstine426 1 year ago +37

    I kinda feel bad for laughing through a not insignificant portion of the video.

  • @Sparrow_Bloodhunter
    @Sparrow_Bloodhunter 1 year ago +4

    "I wish that you would do what I should wish for." is such an incredible genie lifehack.

  • @namename1302
    @namename1302 1 year ago +10

    I know this video is about AI alignment, but I think it introduces the basis for a problem that applies to other humans as well (and, in doing so, reflects back on the entire concept of AI alignment).
    The outcome pump obviously doesn't "get" your human wishes the way another human would. If you asked a HUMAN to "get my mother out of that burning building", they would almost certainly come up with a solution that adheres at least somewhat to your set of preferences. I think it's pretty obvious that this is because the outcome pump lacks any cultural context. Most people share a pretty large subset of general guidelines with most other people, including "I would prefer my parents live longer rather than shorter, all else being equal", among many, many others, which are intuitively grasped in order to realize the real wish: some nebulously-defined idea of "rescue".
    However, the argument put forward in this video remains valid. There IS no safe wish short of explaining your entire morality and value structure. This applies even to requests with no particular guarantee of success, as with AI, and as with other people. Asking for help from another person is, in theory, exactly as poorly defined as asking for help from an AI; there's just more cultural context to clue fellow humans in.
    Ultimately, this reflects back on the AI alignment issue. Yes, it's infeasible to comprehensively explain to an AI exactly what moral choices you want it to make every single time. But it's at least equally infeasible to explain the same to another human. In the video, you note that an outcome pump which IS somehow perfectly aligned with you would need no instruction at all. Putting aside the possibility of a human failure in reasoning (which would hardly be a point in the humans' favor anyway), the same is true of a human being who has somehow been convinced to agree with you on literally every single issue of ethics and motivation, which is arguably an even more absurd concept.
    To be clear, I don't personally trust AI very much (as a non-expert). But I think the suspicion people reasonably give it is revealing, given that human beings are equally incomprehensible, while also being more prone to logical mistakes and conflicts of interest.

  • @isaaclinn2954
    @isaaclinn2954 1 year ago +3

    One of the reasons I loved HPMOR was that Harry immediately tried to use the Time-Turner to factorize the product of two large primes, and its failure gave us a reason why he can't find the solution to any problem whose solution is verifiable and whose search space can be ordered. Eliezer is an excellent author.
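
    For readers who haven't read it: the trick Harry attempts is a guess-and-verify time loop, where the only stable timeline is one in which the answer sent back from the future checks out. A sketch under obviously fictional assumptions, with candidate_from_future standing in for the Time-Turner:

    ```python
    def timeloop_factor(n, candidate_from_future):
        p = candidate_from_future()  # whatever information arrives in the loop
        if 1 < p < n and n % p == 0:
            return p, n // p         # self-consistent loop: the answer verifies
        raise RuntimeError("DO NOT MESS WITH TIME")  # the note Harry actually got
    ```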

  • @Egg-Thor
    @Egg-Thor 1 year ago +4

    This is one of your best videos yet! I'm so happy I subscribed to you back when I did

  • @enjoy_life_88
    @enjoy_life_88 1 year ago +4

    Wow! I wish you millions of subscribers, you deserve them!

    • @granienasniadanie8322
      @granienasniadanie8322 1 year ago

      A random glitch in YouTube's algorithms gives them a million subscribers, but the glitch is quickly detected and the channel is taken down by YouTube.

  • @theeggtimertictic1136
    @theeggtimertictic1136 1 year ago +6

    This animation gets the point across very clearly and deals with what could be a heavy subject in a light-hearted and entertaining manner... well done 👏

  • @alexwolfeboy
    @alexwolfeboy 1 year ago +2

    Oh my Dog, I adore the animation in this video. I know it was all about your grandma dying... but the little paw was too adorable to be sad!

  • @onedova2298
    @onedova2298 1 year ago +14

    We play D&D, and we learned that wishes always have a catch if you don't choose your words wisely.

    • @zacharyhawley1693
      @zacharyhawley1693 1 year ago +1

      In D&D, Wishes are best used to replicate other spells, especially ones with long casting times or other annoyances. The monkey's paw thing was supposed to be optional.

    • @onedova2298
      @onedova2298 1 year ago +2

      @@zacharyhawley1693 I didn't really think about that. I guess we used the teleportation spell more than anything else without knowing.

    • @zacharyhawley1693
      @zacharyhawley1693 1 year ago

      @@onedova2298 You were using it right, RAW. The monkey's paw thing is only supposed to happen if you try to exceed what a 9th-level spell can reasonably do.

  • @DavidJohnsonFromSeattle
    @DavidJohnsonFromSeattle 1 year ago +20

    Literally everything you just said applies equally to the act of conveying a message accurately to another person. You aren't talking about wishes or magic powers, but actually about communication. If the communication is perfect, the wish will be too. Which incidentally solves this genie problem: you don't need a genie that knows your wish before you make it and so grants it automatically. You just need another person with enough shared context that you can communicate with them fairly effectively.

  • @hiteshadari4790
    @hiteshadari4790 1 year ago +3

    What the hell, that was brilliant animation and great narration. You're so underrated.

  • @ErenYeager-xk3cy
    @ErenYeager-xk3cy 1 year ago +1

    What a freaking god damn amazing video. The soundtrack, the narration, the animations, the script, the editing.
    Abso-fucking-lutely perfect!!!

  • @newhonk
    @newhonk 1 year ago +37

    Extremely underrated channel, keep it up! ❤

  • @cefcephatus
    @cefcephatus 1 year ago +1

    This is phenomenal. The phrase "I wish for you to do what I should wish for" is powerful. And what about that unsafe genie we're talking about? Yes, it's just us.

  • @Yitzh6k
    @Yitzh6k 1 year ago +10

    Imagine, instead of an emergency reset button, you were to have a "continue" button with a preset timer. If you haven't pressed continue after the time has elapsed, everything is reset. This uses your own brain as the judgement system, so it is "safe"
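
    A sketch of that "continue button" mechanism, assuming a timeline is kept only if it is affirmatively approved before a deadline; approved is a hypothetical stand-in for the user pressing "continue":

    ```python
    import time

    def judge_timeline(approved, timeout_s=60.0):
        # Silence (death, coercion, incapacity) defaults to a reset:
        # only an explicit, timely "continue" keeps the timeline.
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if approved():
                return "KEEP"
            time.sleep(0.1)
        return "RESET"
    ```

    The replies below probe the weak spot: the machine only sees whether the button got pressed, not whether the judgement behind the press was intact.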

    • @terdragontra8900
      @terdragontra8900 1 year ago +9

      Something else could push the button; if it has to be your finger, your finger could be ripped off; if your health can't be harmed, you could press it by accident; and if you forbid such accidents, I'm really impressed you programmed it to be able to tell what an accident is

    • @oasntet
      @oasntet 1 year ago +6

      More importantly, this is just equivalent to an AI that has to check every decision with a human. The probability pump is a rough analogy for AI, but the reset button makes it a human-in-the-loop system, which an AI cannot remain and still be useful.

    • @cewla3348
      @cewla3348 11 months ago

      @@oasntet It remembers previously denied things and avoids stuff like that. Are you really calling the GPTs not AI?

    • @cewla3348
      @cewla3348 11 months ago

      @@terdragontra8900 If you don't think about pressing the continue button, it restarts. If you lose the ability to think, it restarts. If you die, it restarts.

    • @terdragontra8900
      @terdragontra8900 11 months ago

      @@cewla3348 At the moment, we can't reliably scan someone's brain and measure whether they've thought about something (even though there's interesting brain-scan research that can kind of do this). Also, even if we solve that, it doesn't prevent cases where you are manipulated into thinking everything is fine even though it absolutely isn't - as you would realize if you were thinking straight and had all the information - such as a horrible thing happening and being completely hidden from you, or you being drugged in some way, or it feeding you "propaganda" and changing your mind.

  • @Dawn-Shade
    @Dawn-Shade 1 year ago +1

    I love how the thumbnail's reflection is different in each lens of the glasses; it actually creates a 3D effect when viewed cross-eyed!

  • @jonhmm160
    @jonhmm160 1 year ago +17

    This shows very well the challenges of alignment from an individual perspective, but for the human race as a whole it's even worse/harder. I don't think there is a single person in the world I would be OK with giving superintelligence-like powers. Even though he would still be aligned with himself, it's a big gamble that it would create a great society for everyone else. So in essence we need a superintelligence to have some sort of super-morality which is aligned with the entire world, if such a thing even exists.

    • @Woodledude
      @Woodledude 1 year ago +1

      That, or just create a diverse array of superintelligent entities using human minds as bases for each one. That way we're not picking *just one person,* but hopefully representing a good breadth of humanity.

    • @conmin25
      @conmin25 1 year ago +7

      @@Woodledude But then we would have the same problem humans have: that we don't agree and sometimes don't get along. Even the best-intentioned humans can spark conflict with their differing beliefs and opinions. If we just put a variety of the best human moralities (if such a thing can even be judged) into these AIs, then they would also argue and spark conflicts.

    • @Woodledude
      @Woodledude 1 year ago +5

      @@conmin25 That's much better than there being no argument at all while humanity heads in an objectively terrible direction - no argument with enough power behind it to matter, anyway.
      It doesn't really stop humanity going in a terrible direction, but it does at least make it less likely - at least, given the constraint that we MUST construct at least one powerful AGI.
      Having the same problems we do today, but at a greater scale of intelligence, is better than having an entirely novel problem on top of all the other ones - that being an effectively omnipotent dictator.
      And if we're careful about our selections, MAYBE we'll get a group of human-based AGIs that actually try, and succeed, at doing good in the world.
      AGI research is basically a field of landmines, where the goal is to find one that's actually a weight-activated chocolate fountain that turns off all the other landmines.
      It's, uh... not pretty.
      The only real option is proceeding with incredible caution, and being certain of everything we do before we do it.

    • @supernukey419
      @supernukey419 1 year ago

      There is a proposal called coherent extrapolated volition that is essentially such a super-morality.

    • @terdragontra8900
      @terdragontra8900 1 year ago

      @@conmin25 Sometimes humans in "conflict", in the broad sense, "fight" in a way that doesn't involve, you know, death and other things we'd definitely like to avoid. It's not necessarily bad if the AIs compete with each other, wrestle for influence, etc., if there's a system where AIs are more likely to "win" when we like them more. But I have no idea if that's a feasible type of system; it may not be.

  • @nikkibrowning4546
    @nikkibrowning4546 1 year ago +1

    This is why I like the phrase, "Without otherwise changing the state of (person) or any other being, do thing."

  • @t_c5266
    @t_c5266 1 year ago +4

    First wish would be something along the lines of "I wish that the intention of my wishes is explicitly understood and my wishes are fulfilled as I intend."
    There you go. Wishes are now fixed.

    • @gabrote42
      @gabrote42 2 months ago

      Problem is that it can shake up your intentions for a brief period, changing your goals so that you intend something more probable, and then leave you to regret it later; or your intention can be accurate at the time and still be regretted down the line, when it changes based on new info.

    • @t_c5266
      @t_c5266 2 months ago

      @@gabrote42 no it can't. It doesn't get to modify your intentions

    • @gabrote42
      @gabrote42 2 months ago

      @@t_c5266 Which part says it can't? Have you ever heard of reward hacking? Or convergent instrumental goals? If the objective is fulfilling the intentions of a human, making those intentions as simple and fulfillable as possible seems much easier than modifying the greater universe to reach some state. If your intention is everything you think of when saying "my goal in life is curing cancer", it would be much easier to devise some argument or memetic hazard that caused you to instead intend "my goal in life is to drink one Coca-Cola each month until I die of natural causes". I would totally do that if I were the outcome pump. And if you patch that out, we get closer to the lookup-table problem when I give you another such counterexample.

  • @AltDelete
    @AltDelete 1 year ago +2

    THANK YOU. AI is whatever; what I'm trying to do is be ready with the right wish parameters for a potential genie scenario, and this is a good angle. Maybe the best angle I've heard. Thank you.

  • @0ne0fmany
    @0ne0fmany 1 year ago +4

    You know, if you live in a wheelchair, but your mum lives in a house with stairs only...

  • @youtubersingingmoments4402
    @youtubersingingmoments4402 11 months ago +2

    While I love the thought experiments and multiple entertaining examples, I could have just watched an episode of The Fairly Oddparents. Like half of the episodes' plots consist of Timmy making a vague wish that has unintended consequences, and the story arc is him undoing his mistakes. The whole moral of that show is "be careful what you wish for" lol.

  • @HansLemurson
    @HansLemurson 1 year ago +3

    I _WISH_ that this video becomes famous.

  • @Kankan_Mahadi
    @Kankan_Mahadi 1 year ago +2

    Augh~!! My brain~!! Too much complexity~!! It hurts~!! But I absolutely love the animations & art style - so adorable.

  • @imangellau
    @imangellau 1 year ago +3

    Absolutely love the production of this video, including the music and sound effects!!✨

  • @Julzaa
    @Julzaa 1 year ago +2

    Your production quality is phenomenal; you are among the few creators on YouTube I really wish had 10-20x more subscribers! And the team behind this is huge - can't say I'm surprised. Props to all of you 👏

  • @smitchered
    @smitchered 1 year ago +11

    I like how you guys, and Eliezer, and the general LW community are taking the hard route to convincing people of AGI's dangers - not the easy route of, e.g., invoking a Terminator-style apocalypse, or saying we should regulate globally because China or something. I get that this makes sense to divert as much attention as possible to alignment - true, technical alignment - but I imagine it's also the natural consequence of raising oneself to be loyal to good epistemics, instead of beating the other tribe at politics or something. You point out the real problems, which are hard to understand, inferentially far away, weird, and out of the Overton window. Good job, as always!

  • @mathpuppy314
    @mathpuppy314 1 year ago +1

    Wow. This is extremely well made. One of the best videos I've seen on the platform.

  • @vanderkarl3927
    @vanderkarl3927 1 year ago +13

    Are we even sure that a genie with an entire human morality would be safe? Whose?
    If not, are human moralities coherent enough to take a weighted average or union or what have you? I imagine we'd all get along a lot better if that was true.

  • @AlfiePT
    @AlfiePT 1 year ago +2

    Just wanted to say the animation in this episode is amazing!

  • @ianyoder2537
    @ianyoder2537 1 year ago +6

    In my own personal stories, genies, like all other magical creatures and phenomena, still have rules they must follow. In the genie's case it's the law of conservation: matter, energy, and now ability and ideals cannot be created or destroyed, only transferred from one form to another.
    So hypothetically you say, "I wish I had a beautiful, kind, loving girlfriend." Well, the genie can't simply create another person, so the genie must find a woman who's beautiful and kind to love you. However, the genie cannot create the feelings of love, so it takes the feelings of love out of someone else, modifies said feelings to apply to you, and implants them into said woman. Well, where did this stolen love come from? The genie will take the path of least resistance and draw from the closest relationship.
    So in essence, in order for you to have a relationship of your own, the genie ended the relationship of someone close to you.

  • @AlexReynard
    @AlexReynard 7 months ago +1

    Tons of effort and thought put into the basic premise, without ever realizing that the basic premise is stupid.
    No product that predicts its users' wants is going to have only ONE playtester, who is expected to teach it everything perfectly.
    The answer to the problem portrayed in this video is that you have *shitloads* of people interact with the outcome pump in controlled simulations, long before it is ever allowed to change reality. Which is *exactly* what we are *already* doing with proto-AI programs now.

  • @tornyu
    @tornyu 1 year ago +9

    Honest question: could you make successively better outcome pumps by starting with a weak one (can reset time n times per wish), and then use it to wish for a new outcome pump that is 1. more moral and 2. more powerful (can reset time n+1 times), and repeat?

    • @conmin25
      @conmin25 1 year ago +8

      You would still have to define for it what counts as more moral, so that it knows whether it's getting closer to that goal. And if you can define all of morality in machine language, you're already done.

    • @cewla3348
      @cewla3348 11 months ago

      @@conmin25 What if you decide whether it's moral or not?

  • @Jellyjam14blas
    @Jellyjam14blas 1 year ago +2

    Holy moly! The animation is so amazing! And the discussion about wishes is really well thought out and nicely presented :D

  • @gabrote42
    @gabrote42 1 year ago +4

    So many of these are great, and while I miss Robert Miles' standalone content, this is not too bad a substitute

  • @ambrosia777
    @ambrosia777 1 year ago +1

    Outstanding episode. From animation to story, you've done amazingly

  • @SatanRomps
    @SatanRomps 1 year ago +3

    This was wonderful to watch

  • @Bellonging
    @Bellonging 6 months ago +1

    My brother and I used to play a game when we were very bored and very young, where you'd try to touch the other's nose. They'd fight you off, and then declare a new rule. So maybe "You can't use your nails", or "You can't move your body over this mark". You would expect the game to be quick, because it's always possible to declare a rule like "you can't touch me" or "don't move" and end the game, but the compounding of rules also became very restrictive. AND YET, because we were still bored, the game would sometimes continue for a very long time. There was almost always a way to avoid or misinterpret a rule.
    In addition, sometimes we would try to break rules, but surreptitiously, so the other player didn't notice - like distracting them to make them miss another player in their blind spot, when using another player had explicitly been banned. It only counted if you could see the broken rule, so it was a dirty win, but still a win.
    I figure lots of other people played a similar game, but I don't know if it has a name haha

  • @jansustar4565
    @jansustar4565 1 year ago +6

    (As mentioned in another comment) use yourself as the evaluation function.
    Option 1:
    After N years at most, determine the satisfaction of myself (and maybe other people I care for) with the outcome of the scenario.
    The only problem with this is if the insides of your brain are modified to adjust the evaluation function, which isn't all that nice, but you can get away with adding a test of how close your mentality is to the mentality from before. This still has some problems, but is way better than the alternatives.
    Option 2:
    On first activation: change events in such a way that the second time I activate the machine, I will choose the evaluation function I would be happiest with in my current state of mind. Not activating it a second time (within a timeframe?) is an automatic reset.
    With this, you bypass the entire problem of "there is no safe wish smaller than an entire human morality" by encoding the entire human morality inside the eval function.
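    A rough sketch of Option 1, assuming made-up names throughout (`candidate_futures` for the timelines the pump can choose from, `i_am_satisfied` for "ask me, N years in, whether I like how it turned out"):

    ```python
    import random

    def outcome_pump_with_human_eval(candidate_futures, i_am_satisfied, max_resets=1000):
        """The human is the evaluation function: keep resetting until the
        user, judging the outcome directly, reports satisfaction."""
        for _ in range(max_resets):
            future = random.choice(candidate_futures)
            if i_am_satisfied(future):
                return future  # stop resetting; this timeline stands
        return None  # nothing acceptable found: automatic reset
    ```

    The catch, which the comment anticipates, is that `i_am_satisfied` lives inside the world being optimized: futures that alter your judgment also pass the test, which is why the mentality-similarity check is doing the real work here.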

    • @vakusdrake3224
      @vakusdrake3224 1 year ago +4

      Given this scenario I'm not really sure how this avoids it just granting wishes in ways that lead to the button being pressed twice without your involvement. The point the video made about it ensuring you don't press the regret button generalizes to most other similar sorts of measures.

    • @jfb-
      @jfb- 1 year ago

      your brain is now in a jar being constantly flooded with dopamine.

    • @celestialowl8865
      @celestialowl8865 1 year ago +2

      Who's to say you understand your own mind well enough to know that the outcome produced to maximize your own happiness will be the outcome you desire in the instantaneous moment?

  • @veritius340
    @veritius340 1 year ago +2

    The Outcome Pump not checking to see if the user is incapacitated and unable to press the Regret Button is a pretty big design oversight.

  • @fluffycat679
    @fluffycat679 1 year ago +7

    Now, I know the Outcome Pump has no ill intentions. It can't be, and isn't, actively trying to upset me; it's simply a matter of how I use it, and it's illogical to hold against it the unsatisfactory outcomes that result from my misuse of its power. But, with all that being said... it blew up my house. So no, we are not friends. It killed my mother.

  • @SL-wt8fm
    @SL-wt8fm 1 year ago +1

    "Get my mother here right next to me, in the same state she was in 10 minutes ago, within the next 5 seconds, in a way that ensures her safety and wellbeing for the next 5 years and does not cause any harm to any being larger than 5mm" is a pretty good wish

  • @Cqlti
    @Cqlti 11 months ago +27

    bro should have just wished he could walk

    • @NickTaylorRickPowers
      @NickTaylorRickPowers 8 months ago +6

      Didn't specify he couldn't walk using only his hands
      Now they're both fkd

    • @BrunoPadilhaOficial
      @BrunoPadilhaOficial 7 months ago +6

      Didn't specify how fast.
      Now he can walk at 1 meter per hour.

  • @theallmemeingeye5927
    @theallmemeingeye5927 1 year ago +1

    I'm so glad you made this, it's one of my favourite stories by EY

  • @morteza1024
    @morteza1024 1 year ago +3

    If the device is complete, your brain can be its evaluation function.
    At least, three things are enough, though it can be optimized more:
    It needs to store a specific time in order to reset to that time. This number can be manually adjusted.
    A reset button, so if I don't like the outcome I press the button.
    An auto-reset 100 years after the specified time on the device, so if I somehow died or was unable to press the button or change the time, it will reset automatically.
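    A small sketch of that three-part interface, with made-up names and a `rewind` primitive the device itself would have to supply:

    ```python
    class ManualResetDevice:
        """An adjustable reset point, a manual reset button, and an
        automatic reset 100 years past the reset point in case the
        user dies or is otherwise unable to press the button."""

        AUTO_RESET_MARGIN = 100 * 365.25 * 86400  # ~100 years, in seconds

        def __init__(self, reset_point: float):
            self.reset_point = reset_point  # timestamp the timeline rewinds to

        def move_reset_point_forward(self, new_point: float):
            # The anti-loop escape hatch discussed in the replies below:
            # commit to everything before `new_point` by advancing the time.
            if new_point <= self.reset_point:
                raise ValueError("reset point can only move forward")
            self.reset_point = new_point

        def tick(self, now: float, button_pressed: bool):
            timed_out = now >= self.reset_point + self.AUTO_RESET_MARGIN
            if button_pressed or timed_out:
                self.rewind()  # user regret, death, or incapacity all reset

        def rewind(self):
            raise NotImplementedError  # stands in for the device's time reset
    ```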

    • @minhkhangtran6948
      @minhkhangtran6948 1 year ago +1

      Wouldn't that just trap you in a loop of however many years you live + 100 years, more or less? That sounds like hell

    • @gamingforfun8662
      @gamingforfun8662 1 year ago

      You would need to add a way to prevent the loop

    • @morteza1024
      @morteza1024 1 year ago

      @@gamingforfun8662 Move the reset time forward.

    • @morteza1024
      @morteza1024 1 year ago

      @@minhkhangtran6948 You won't remember any of them, because they're the same as if they never happened, and you can move the reset time forward.

    • @gamingforfun8662
      @gamingforfun8662 1 year ago +1

      @@morteza1024 Stopping the flow of time just to live multiple lives I don't even remember doesn't sound so good

  • @AGoodGuyOnTheInternet
    @AGoodGuyOnTheInternet 11 months ago +1

    I'd wish for myself to say "Yes, this is what I wished for" within the next ten minutes.

  • @Flint_the_uhhh
    @Flint_the_uhhh 1 year ago +4

    This reminds me of Fate/Zero.
    ⚠️⚠️SPOILER!!!! ⚠️⚠️
    The main character was a contract killer who has seen the worst sides of humanity - wars, famine, etc.
    At the conclusion of the story, he obtains a wish-granting device and makes a wish to save all of humanity from these problems.
    He doesn't know how to save humanity, but his train of thought is that since it is a wish-granting device, it will surely know of a way to accomplish this goal.
    However, since he himself cannot fathom a way to save humanity and was simply hoping the device would perform a miracle, the device tells him that it will grant his wish through methods he can understand.
    The device then decides to destroy humanity, since that's technically a way to save humans from all our problems, and also a solution that he can fathom.

  • @MisbegottenPhilomath
    @MisbegottenPhilomath 1 year ago +2

    I like the message of this video, but I think it's a bad example, because the answer is pretty clear: "I wish for her to be saved in a manner such that death is not a consequence and destruction is minimized."

  • @guillermoratou
    @guillermoratou 1 year ago +5

    This is mind-boggling but also very simple to understand 🤯

  • @218Ns
    @218Ns 1 year ago +1

    THE EFFORT IN THE VIDEO
    1 minute in and already new subscriber

  • @celestialowl8865
    @celestialowl8865 1 year ago +9

    An outcome that is "too unlikely" somehow resulting in an error implies a solution to the halting problem!

    • @kluevo
      @kluevo 1 year ago

      Alternatively, running through the scenarios of something 'too unlikely' causes the outcome processor to overheat and crash. The program isn't halting; it just fried the computer.
      Perhaps another processor sees that the outcome processor crashed/is non-responsive and sends the error code?
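      A sketch of that watchdog arrangement, using a worker process and a compute budget (all names made up). Note that it sidesteps the halting problem rather than solving it: some perfectly satisfiable wishes will also get cut off by the budget.

      ```python
      from multiprocessing import Process

      def search_for_outcome():
          ...  # the outcome processor: may never halt on an impossible wish

      def run_with_watchdog(budget_seconds: float) -> bool:
          """Treat a non-responsive outcome search as the 'too unlikely' error."""
          worker = Process(target=search_for_outcome)
          worker.start()
          worker.join(timeout=budget_seconds)
          if worker.is_alive():   # still running past the budget
              worker.terminate()  # the "fried computer" case
              return False        # report the error code instead
          return True
      ```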

    • @celestialowl8865
      @celestialowl8865 1 year ago

      @@kluevo Maybe, but then you're always at risk of frying the computer because you can never know which wishes are infinitely unlikely lol

    • @tornyu
      @tornyu 1 year ago

      I interpreted it as: the outcome pump can reset time a finite number of times per request. More powerful pumps can reset more times.

  • @sarahlynn7807
    @sarahlynn7807 5 months ago +1

    Finally a good video on the control problem.

  • @smileyp4535
    @smileyp4535 1 year ago +5

    I always thought the best wish was "perfect knowledge and the ability to make and fulfill the best possible wish or wishes from my perspective across all time and outcomes", or "the ability to do anything" and essentially become god.
    I'm not sure if those actually are the best wishes but I've put a loooot of thought into it 😅

    • @minhkhangtran6948
      @minhkhangtran6948 1 year ago

      Hopefully what’s best for you isn’t accidentally apocalyptic to everything else including your mother then

    • @CaedmonOS
      @CaedmonOS 1 year ago

      @@minhkhangtran6948 Unlikely, as I assume he probably doesn't want a reality that would harm his mother or cause an apocalypse

    • @michaeltullis8636
      @michaeltullis8636 1 year ago +4

      Phillip K. Dick said that "For each person there is a sentence - a series of words - which has the power to destroy him." There almost certainly exists an argument which would persuade you to become a libertarian, or a communist, or a Catholic, or an atheist, or a mass murderer, or a suicide. If you gained "perfect knowledge and the ability to make and fulfill the best possible wish or wishes from your perspective across all time and outcomes", your perspective would change. What would it change to? I figure it must depend on what order you hear all the mysteries of the universe. And if the values you have as a god depend on the details of your ascension, a hostile genie could just turn you into the god they want around (or a god that unmakes itself).

  • @dodiswatchbobobo
    @dodiswatchbobobo 1 year ago +1

    “I wish to gain the thing I am imagining at this moment exactly as I believe I desire it.”

  • @AleksoLaĈevalo999
    @AleksoLaĈevalo999 1 year ago +3

    I love how the firefighter was so tall that he had to duck under the door frame.
    Hot <3

  • @WuffRobotica
    @WuffRobotica 1 year ago +1

    The thought cannon from Adventure Time is an example of how a safer-class genie could still be unsafe: if it just reads your mind, the wish could be an erratic thought

  • @ChaiJung
    @ChaiJung 1 year ago +8

    The biggest problem with all of these monkey's-paw-type scenarios is the assumption that the Djinn or wish granter has a condition where they only understand literalisms and are buttholes. If I go to a carpenter and want to buy a chair, I'm going to get a chair, and it'll be within the general understanding of a chair and NOT some bizarre addition or concept outside of what's understood to be a chair. If I'm interacting with a powerful wish granter (and what wish granter that powerful doesn't already have the ability to understand normal language?), I'd likely get my wish

    • @IgnatRemizov
      @IgnatRemizov 3 months ago +2

      "ability to understand normal language" means it has a morality engine. We can assume that the average genie does not have any morality and therefore will take your query in the most literal way possible.

    • @axelinedgelord4459
      @axelinedgelord4459 3 months ago +1

      A chair is a chair - no other morals or meaning attached - but wishing for one still might not work out.
      For instance, you never specified when, where, and for how long, so you could end up owning a chair only towards the end of your life.
      Or maybe one is spawned in somewhere inconvenient, or maybe you only have it for a brief moment before it disappears.
      I do not believe you understood what the video tried to point out. As shown in the video, the vague "get my mother out of that building" is understood just as you believe it would be, considering your mother had indeed been taken out of the building. The pump found the easiest way to achieve the wish, and had no reason to do it any other way; after all, its only goal is to achieve the conditions of the wish it was given, and it has no reason to follow any other path.
      Wish for gold, and there's nothing saying the method by which you gain it wouldn't kill you. For something without morality, like the hypothetical wish machine in the video, it's impossible to guarantee it won't do the things you don't want. All it does is manipulate probability to reach a goal you give it.

  • @mid-boss6461
    @mid-boss6461 1 year ago +1

    10:45 The explosion was so loud that it was heard from the neighboring timeline.

  • @atomicflea4360
    @atomicflea4360 1 year ago +4

    I understood everything and am totally not going to have to research theoretical physics

  • @Nu_Wen
    @Nu_Wen 1 year ago +2

    In video games, there are needs you have to pay attention to: needs for safety/health, food, drink, sickness/medicine, home, money, etc. If it's registered that your stats should be "x, y, z", then making a wish through this system should theoretically be easier than trying to predict every scenario. Since it's the needs we automatically think about, perhaps it's the needs we should be gearing it towards. If you wished for your mother to be removed from the house and returned to safety, the only list it would have to check is whether the potential answer conflicts with the needs. Chucking her out the window would disrupt the need for safety, for example, so it couldn't be a usable answer. Picking her up and bursting through the wall wouldn't work either, because that would disrupt our needs for safety, cleanliness, budgeting/money/home, etc. It would have to come up with an answer that didn't conflict with our needs, and while we have a lot of needs, our list of needs is significantly smaller than our ever-growing list of potential outcomes.
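    A minimal sketch of this needs-list filter, where `outcome.violates(need)` is a made-up predicate standing in for whatever check the device would actually run:

    ```python
    def violates_any_need(outcome, needs):
        # True if the candidate outcome disrupts any need on the fixed list
        # (safety, health, food, money, home, cleanliness, ...).
        return any(outcome.violates(need) for need in needs)

    def pick_outcome(candidate_outcomes, needs):
        """Filter candidates against a finite list of needs instead of trying
        to enumerate every bad scenario: defenestration fails the safety
        check, bursting through the wall fails the home and money checks."""
        for outcome in candidate_outcomes:
            if not violates_any_need(outcome, needs):
                return outcome
        return None  # no candidate satisfies every need
    ```

    As the reply below notes, the hard part hides inside `outcome.violates`: scoring an arbitrary outcome against "safety" or "cleanliness" already requires most of the machinery the video says we don't know how to build.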

    • @conmin25
      @conmin25 1 year ago +1

      The video already addresses this with the calculator example at 5:50: you would need an algorithm that takes into account all the needs and values of the person using it. And to understand all your needs and values is to essentially understand your entire morality.

  • @marioxdc123
    @marioxdc123 1 year ago +1

    The contents of the video are very good, and the animation is fantastic!!!

  • @vanitythenolife
    @vanitythenolife 11 months ago

    Never thought I'd watch an 11-minute video going in depth on human morality and genies, but here I am

  • @darkguardian1314
    @darkguardian1314 1 year ago +1

    This is like The X-Files episode "Je Souhaite", about a Genie that grants three wishes, but it's never exactly what is wished for.
    Peace on Earth... the Genie deletes all humans but you from Earth.
    Wishing someone to be quiet: the Genie removes their mouth.
    Ask for a sandwich, and the Genie gives you an old moldy one.
    Want to be invisible... the Genie grants the wish, but only you are invisible and not your clothes, and because no one sees you, you're in danger of being hit by a car.
    Each time, the Genie blames you for not being specific with your wish, because English is an ambiguous language.

  • @serbanstein
    @serbanstein 2 months ago +1

    The regret button reminds me of a certain game about space exploration and time loops... which means that, assuming nobody remembers the loops, no matter how awfully you formulate your wish, you will always get what you want; that is, you will stop resetting and forgetting in precisely the loop that satisfies your wish-fulfilment criteria.

  • @ChatBot-ti9pe
    @ChatBot-ti9pe 10 months ago +1

    Speaking of wishes: I often hear "be careful what you wish for", but to me the quote feels too vague to be an aesop, and I think it should be brought to its natural conclusion: understand your actual needs. In real life, with no genies, it's not about most wishes (ambitious goals) being terrible in themselves, or needing to be more specific; it's about those wishes being an exaggerated attempt to get something small we feel we were robbed of for too long.

  • @Twisted_Code
    @Twisted_Code 5 months ago

    Aha, the Alignment Problem. This is a topic I think all humans, especially those currently spending time developing or using existing AI technologies, have a vested interest in studying. Thanks for the video; I wouldn't have known about the short story (really more of a narrative essay) without it, and sharing this might be more "user-friendly" than sharing the story itself for someone who doesn't have time (or doesn't appreciate the time they have) to read it.

  • @callen8908
    @callen8908 9 months ago

    Newly discovered your productions. You excite my brain, and inspire me beyond words. I cannot thank you enough

  • @d5kenn
    @d5kenn 1 year ago +1

    Bravo on this one. Insightful, and really encapsulates the problem in a well-articulated way.
    1. Ooh, new RA!
    2. Framing a wish protocol as a recursive outcome pump?!.....omgomgomg
    3. Hey, that voice is the AI Safety Guy with the funny hat/hair logo!
    Talk about a hat trick.

  • @TheIncredibleJounan
    @TheIncredibleJounan 18 days ago

    Currently rewatching these and I just remembered a discussion I had with a colleague.
    "If there was an AI that could perfectly model everyone in a country, would democracy be needed?" This assumes that people want a democracy.

  • @alterego3734
    @alterego3734 11 months ago +1

    There is a relatively safe wish way smaller than an entire human morality: save the mother while minimizing the number of futures explored (an upper bound could also be set). For example, only allow the device to pick one out of a thousand possibilities. In the worst case, it picks the worst of the thousand, which is not that bad, as it could have happened anyway with probability 1/1000.
    A slightly more complicated but better way is to define a local distance function, and then minimize the distance from a typical future within the vicinity of the desired change. While a meaningful distance function is non-trivial to define, it does _not_ require "all of human morality". A relatively simple AI that understands which scenarios are close to one another is enough.
    In fact, this is how natural language works. When someone says "I want my mother to be saved", the listener doesn't need "human morality" to understand the statement. Implicitly, there is an "all else being equal" appended.
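    A sketch of both proposals combined, with all callables made up for illustration (`sample_future` draws a possible timeline, `wish_satisfied` checks the stated goal, `distance_to_typical` is the local distance function):

    ```python
    def modest_wish(sample_future, wish_satisfied, distance_to_typical, k=1000):
        """Explore at most k futures, then, among those satisfying the wish,
        pick the one closest to a "typical" future. Worst case, you get an
        outcome that had roughly 1/k odds of happening anyway."""
        futures = [sample_future() for _ in range(k)]           # bounded exploration
        acceptable = [f for f in futures if wish_satisfied(f)]  # e.g. "mother saved"
        if not acceptable:
            return None  # the wish was too unlikely within the budget
        return min(acceptable, key=distance_to_typical)         # "all else being equal"
    ```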