AI DESTROYS The Centre Of The Universe | Super Mario Galaxy
- Published on 1 Jun 2024
- #ai #mario #reinforcementlearning
AI uses a variant of the Reinforcement Learning algorithm Rainbow DQN to learn how to play the Final Level of Super Mario Galaxy.
0:00 Intro
4:05 Attempt 1
6:23 A bit further
8:54 Nearly there
10:45 Training
16:34 Final AI
Also thank you to my editor for editing this video!
@benji.botterill
www.benji-bott.com - Science & Technology
that lil wall jump at 15:47 was cleeaaaannn
Again at 19:21 🔥🔥👏
Super cool to see your AI beat such a complex level. It was already pretty mindblowing seeing it beat the simpler one before, but this takes it to a whole new level! I wonder if it would eventually be able to play the whole game? It would probably need years of training, but I don't think it would be impossible with where AI is getting
Really glad you liked it! Maybe eventually, as you mention that would take an incredibly long time! To get this to work however I did have to do a lot of handcrafting of rewards, so the time spent making that system would require a tonne of work
@@aitango It might be possible to train a 'reward network' on gameplay footage of the game, and have it output how far through the level a particular frame of gameplay is.
This is a greater testament to your ingenuity and patience than it is of what can be done with AI. You somehow painstakingly created a decent reward function for a game that was never meant to have AI playing it. Mad props man, most have given up before reaching this point.
Thanks, that really means a lot!
"How I made Mario fight through the (fiery) pain? with MOAAR PAINNNN!" 😆
ahahah
Your AI plays Mario better than I do. I have always seen Mario(and many other) AIs that try to reproduce a speed run. I would like to see one that looks good while doing it. Picks the path because it takes them right over the coins, bounce off the enemy, grab the mushroom, shoot the fireball. Can you make this happen?
You could prob reward it for picking up coins and mushrooms but it's still gonna look unnatural as it's taking the path of least resistance.
Reinforcement learning is not actually a great way to make a TAS. You would think it could be great, and you can even encourage it to speed up by decreasing the reward every time step, effectively incentivizing it to finish the episode as quickly as possible to save as much reward as possible. But in reality, in an environment as complex as this, there are billions of local optima that the model will fall into instead of choosing an optimal path, and you end up with something that clearly has a better route visible to a human. You’re much better off using a modified Monte Carlo Tree Search to essentially brute force the best path, as it would be much faster.
@@yourmomsboyfriend3337 there is a sort of hybrid approach called Curiosity-Driven Reinforcement Learning. A reward is given for checkpoint progress, but it is also given for encountering a previously unknown state. The main downside is that it's significantly more memory intensive than both approaches.
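The two ideas in this thread (a small cost on every time step, and a bonus for reaching a state never seen before) can be combined into one shaped reward. The following is a minimal sketch, not the reward used in the video: the class name `ShapedReward`, the parameter values, and the idea of a hashable `state_key` are all assumptions made for illustration. The dictionary of visited states is the memory cost the comment above refers to.

```python
from collections import defaultdict


class ShapedReward:
    """Progress reward, minus a per-step cost, plus a first-visit novelty bonus."""

    def __init__(self, step_penalty=0.01, novelty_bonus=0.5):
        # Both values are illustrative assumptions, not tuned numbers.
        self.step_penalty = step_penalty
        self.novelty_bonus = novelty_bonus
        # One entry per distinct state ever seen; this table is what grows.
        self.visit_counts = defaultdict(int)

    def __call__(self, state_key, progress_delta):
        self.visit_counts[state_key] += 1
        first_visit = self.visit_counts[state_key] == 1
        bonus = self.novelty_bonus if first_visit else 0.0
        # Every step costs a little, so finishing sooner keeps more reward.
        return progress_delta - self.step_penalty + bonus
```

In practice `state_key` would have to be some coarse summary of the game state (for example a rounded position), since raw frames almost never repeat exactly.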
I've been waiting let's goooooo.
Honestly I think this is my favourite channel right now for actual demonstrations of AI learning. So few people are actually working on smaller-scale things like this for YouTube and the like and I love seeing their progress over time.
Thanks, really great to hear! Was great to finally upload again, it's been a while
After watching this AI struggle for over 100 hours on a single level, I can safely say we've found the new 3rd Game Grump
LOL this is brutal
Are there any random factors in Super Mario Galaxy or will the AI succeed by just learning one fixed sequence of inputs?
I'm not actually sure if the game has any randomness or is completely deterministic. During training I force the AI to take random actions, partially to help explore but also to prevent it from just learning a fixed sequence of inputs
It isn't particularly likely that the AI would be able to learn a fixed sequence of inputs - it doesn't really have the architecture necessary to store a bunch of data exactly and then read from that memory at specific times.
@@aitango if you use dolphin's save-state system to reset the level, then the rng is deterministically the same as the previous save-state
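Forcing random actions during training, as described in the reply above, is often done with an epsilon-greedy rule. The thread does not say which scheme the video uses, so the sketch below is an assumption: `epsilon_greedy` and its arguments are hypothetical names, and `q_values` stands for whatever per-action value estimates the network outputs.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, else the best-valued one."""
    if rng.random() < epsilon:
        # Exploration: ignore the value estimates entirely.
        return rng.randrange(len(q_values))
    # Exploitation: index of the largest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Because the random actions perturb the trajectory, the agent keeps landing in slightly different situations even when every episode starts from the same deterministic savestate, which makes memorising one fixed input sequence less useful.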
Amazing, absolutely. I wonder how it might perform on a level that is a bit more open, maybe even including a boss.
Or the trial stars.
Open levels are something I really don't know how it would perform on since the AI tends to be pretty reliant on having a strong reward signal, whereas open levels would be hard to have this. Trial stars are something I have strongly considered, seems like it would be fun ahah
@@aitango what might work for open areas exploration? I can't imagine an algorithm rewarding for linear progress would be compatible.
@@nobafan7515 a curiosity driven approach may be possible, but this would require significantly more computer resources because of the need to maintain a database of explored states, and significantly more training time as the algorithm may sometimes need to exhaustively explore everything before it knows how to progress
There's some great lessons to learn with this one. For example, if you think the most rational thing to do is to jump in a black hole instead of facing what's ahead, you should get your reward functions looked at. If you sort things out, it will get better and you can make it to the other side.
this is the most insane thing I have ever seen, no joke. Please never stop making Videos.
Really glad you liked it, thank you so much!
But can it beat Mario Galaxy 2's Perfect Run?
But can it beat Goku?
The quality of your videos has really gone up, I can't believe you don't have more views
This is dope, it shows how complex AI programming really is and you’ve clearly put a ton of work into it here.
Thanks, really great to hear
poor ai saw the lava tunnel and just thought "yeah, no"
i'm wondering if it was having a hard time on the moving platforms and other moving objects because it has no visual memory, and therefore can't determine if an object is moving or not when analyzing the frame it's currently on? maybe have a bit of extra lower res data with the last frame stored in it could fix that?
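The suggestion above, giving the network the previous frame as well as the current one, is usually called frame stacking. Whether the AI in the video already does this is not stated in this thread, so the following is only a sketch of the idea; the class name `FrameStack` and the default of four frames are assumptions.

```python
from collections import deque


class FrameStack:
    """Keep the most recent frames so motion can be inferred from differences."""

    def __init__(self, num_frames=4):
        # A bounded deque discards the oldest frame automatically.
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame):
        # Fill the stack with the first frame so the input size is constant.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return list(self.frames)

    def push(self, frame):
        self.frames.append(frame)
        return list(self.frames)
```

The list returned by `push` would be fed to the network in place of a single frame, so a platform that moves between frames shows up as a difference between the stacked images.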
Great video! Love the new end card
Thanks!
Lets go this is the best channel ever
Maybe it would be cool to initialize the weights of the agent by first training the model with behavior cloning on, like, your own gameplay of the game for example, maybe it's blasphemous to say ahah
I always love getting AI to learn from scratch, however learning from demonstrations does have quite a bit of research behind it and is something I've looked into since it has quite the potential upside. One problem tends to be that they need a lot of input for that to work (probably 100s of hours), so that has discouraged me thus far
Very cool! Do you think it's just memorizing the level, or does it generalize to new content?
It’s hard to say since it did have to learn to deal with lots of different styles of levels for this one. I think it would struggle on a brand new level, but with an hour or so of training could easily pick it up
Do you know what the jump in progression at ~80 hours training was caused by on the graph at 17:00? Also have you considered using transfer learning from another model which has already been trained on a video feed from a game? I assume by training your model from scratch that it would have to learn the low level attributes of what makes up an image before it can learn how those attributes relate to game progression, though idk much about your architecture. Interesting video as always 👌
the lil swagger at 19:20 was so good lol
The tracks you choose for the Training section are absolute bangers and fit the theme really well.
Would be cool to know how they're called tho
I was wondering, wouldn't giving all the items in a level a unique identifier then giving the AI a reward whenever it's loaded the first time be a way to simplify the programming of the AI in the long run?
I wonder if the AI would have gotten through this level quicker if it had learned to spin attack mid-air to delay landings.
I would love to see another video that goes into deeper detail about the algorithm and model. For example: what model is it (a simple deep perceptron network?), what is the output layer (a softmax?), and what is the graph thing on the right of the video (predicted reward as the output of the last layer?). If so, is the action chosen just the max of it? Also, are you using PPO? And most importantly, how do you control the game, get the screen values, and run the game alongside your model (I'm assuming every tick of the game waits for the model to compute an action)? Or did you dive into these details in a different video? That would be greatly appreciated!
I actually have a video talking a bit more about my setup and the network called “the evolution of my Mario kart AI”. It’s about a different game, but the network is the same. At some point I might do something even more detailed though
Can you do a video on how to learn reinforcement learning like this? I want to create my own ANNs instead of using eg Tensorflow. I have looked at many resources on RL and it’s overwhelming I don’t know where to begin. Maybe a small intro tutorial series on the topic would be cool.
I absolutely love your videos!
I think a fun yet very time consuming idea would be to throw the ai into the game and let it try to finish the game without or at least minimum input. but i don't want to imagine how long that would take
i love the models getting even nuttier you're like the vedal of AI playing games and it's underappreciated
11:59 I see the cheeky little subtitle there
Now let’s see Galactus beat Champion’s Road.
This AI be pulling out the speedrunner strats
4:30 if you do an AI on Mario 64 where it takes advantage of its de facto speed I'll say that pannenkoek better watch out cuz Galactus will probably start building up so much speed that it'll start hopping QPUs before we know it
What are your (training) pc specs? great vid as always
I am running this on a desktop pc, with an rtx 4090, intel i9-14900k and 64gb ram
Damn, that intro was fire 🔥🔥🔥
Glad you liked it haha
this is just straight up cool
Glad you like it!
Surely the ai seeing colour would be useful to know where it’s going?
Potentially, however it does massively increase the amount of information the AI needs to learn, meaning the speed I could run it at would be much slower. Might be something I look to try though.
What's the name of the song at the beginning?
14:14 I mean this is basically how I play this part
Do you think that an extremely sophisticated AI with basically all of the possible moves in a game would be able to find a new glitch/exploit if it ran for way too long?
Try adding memory inputs, it starts off with 0 and depending on itself and other inputs it changes.
should really try some Tiny series language datasets and go back to the self-supervised q learning (meta's recent paper about A^* is a cool take on language search space methods). talk it through a game like pokemon.
Do you think you'll ever release one of these neural network setups?
I'm definitely going to be releasing the AI learning algorithm since I'm hoping to publish it in a conference sometime this year
It would be really cool to see the ai attempt to learn to fight bosses
I don't know if this is possible (or if you tried this) but rather than having dying be a set value of -reward, if it were to lose all reward I feel like it would stop Galactus from trying to die
plus it kinda makes sense if you die you gotta restart, you lose all progress (unless you got a checkpoint but still)
but again I don't know if you did this or if it's even possible
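The difference between the two death penalties in this comment can be written down directly. This is a sketch of the commenter's proposal, not of anything confirmed in the video: both function names and the default penalty of 1.0 are assumptions, and `episode_return` is assumed to be the reward accumulated so far in the current attempt.

```python
def death_reward_fixed(episode_return, penalty=1.0):
    """Dying always costs the same amount, however far the agent got."""
    return -penalty


def death_reward_forfeit(episode_return):
    """Dying cancels everything earned so far, so the attempt nets zero."""
    # Only positive earnings are forfeited; a negative total is left alone.
    return -episode_return if episode_return > 0 else 0.0
```

One thing to watch with the forfeit version is that the penalty grows with progress, so a death late in the level is punished much harder than an early one.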
This is really interesting. Makes me really want to try this as well with some other games. Is there any source code?
Hey Tango, just curious but what was the size of training data after tens of hours of training?
The AI experienced a total of 50 million frames of the game (these were each taken four frames apart though, meaning the AI played 200 million frames of the level)
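The arithmetic behind those two numbers is a frame skip of four: the AI observes one frame, then three more are played before the next observation. A short check, with the function name `emulated_frames` chosen here for illustration:

```python
def emulated_frames(observed_frames, frame_skip):
    """Total game frames played when only every `frame_skip`-th frame is observed."""
    return observed_frames * frame_skip


total = emulated_frames(observed_frames=50_000_000, frame_skip=4)
print(f"{total:,}")  # 200,000,000
```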
Do you have source code for the environment?
Love seeing the learning progress from just coding, once they can see through eyes to make judgments, they will vastly improve... also, you cut the part where it finished the level lol.
I've wanted to see someone try one of these except the only reward function is "does this screen look familiar," rewarding it for finding novel areas in the game, and then training it against multiple games without alteration to see where it can and cannot make progress.
What you describe is called curiosity driven reinforcement learning. It is used in instances where the rewards are very sparsely distributed, and it helps encourage the algorithm to explore more aggressively.
@@PixlRainbow Know of any videos on this?
I've looked into these algorithms quite a lot... sadly in reality they are quite underwhelming. Not as in they don't work, but they take a crazy amount of time to train. One of the first papers to use this (Never Give Up) tried to play some Atari games, however it trained for 10 billion frames (that's not an exaggeration, it's the exact number).
@d6sp There is one titled "Curiosity-Driven Learning of Joint Locomotion and Manipulation Tasks" by Robotic Systems Labs at ETH Zürich. It appears that the key is to limit the curiosity to just specific elements or traits within the environment to limit the search space. In the case of the robot, the position of the robot is ignored and only the position of the box is tracked. Care also has to be taken not to give too much reward to exploration, or the algorithm will simply spend all its time exploring. However while this helps keep the model from getting stuck, it still takes much longer to train a model this way compared to a hand-tuned reward that closely matches the task.
One day there will be an ai% speedrun where you have to program the best ai
I hope to be the first one making some entries in that category haha
Make it do the perfect run
Seems like it's pretty similar to generative AI, the output quality is highly dependent on the human giving it the right prompts/rewards^^
i was so lost but the video was interesting :)
I wonder if it would be possible to get an ai to play portal
I'd take more of a Toriel approach, holding its proverbial hand with some human gameplay.
Title: "AI DESTROYS The Centre Of The Universe"
Video: The Center of the Universe destroys AI
Maybe I missed it in the video but why isn’t the twirl/spin jump allowed? Seems like it’d make the game a lot easier for the ai
I was still having some trouble with using motion controls for the AI when I set this up. I think I’ve got it now though so might do some motion stuff in future videos
The fact that we didn't get to see the AI beat the boss really triggers me.
Why did you cut it at the end?
It's not over until it's over. You should make an AI beat Bowser.
Nice!
Thanks!
Idk why people would be terrified about AI getting their job. Take my job. i got more time to do the things i like to do.
How do you spawn the AI in a random location?
I just went through the level and made savestates at a bunch of different points, then randomly choose one to spawn the AI at
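The random spawn described above comes down to a uniform choice over a list of savestates at the start of each episode. A minimal sketch follows; the savestate labels are hypothetical, and actually loading the chosen state into the emulator is left out because the thread does not describe how that is done.

```python
import random

# Hypothetical labels for savestates made at different points in the level.
SAVESTATES = ["start", "lava_tunnel", "moving_platforms", "final_climb"]


def pick_spawn(savestates, rng=random):
    """Choose the savestate to load at the start of the next episode."""
    return rng.choice(savestates)


spawn = pick_spawn(SAVESTATES)
```

Starting from many points means the later sections get practised even while the agent still cannot reach them from the beginning of the level.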
obviously harder than it sounds, if only there was a reward just for exploring to encourage experimenting. I'm sure I'm way naive to the mysteries of the neural network a learning computer.
It exists, it's called Curiosity-Driven Learning/ Curiosity-Driven Exploration. A key difficulty with this approach is that you have to maintain a database of all previously explored states, and this database can become quite large and sluggish for complex worlds or control schemes.
make it fight king whomp
Make it play the whole game on stream
I would love to see an AI like this be fed a full resolution, live feed of the game. Of course this would require either a ludicrously powerful set up or a huge advancement in image reading speeds and image comprehension. It would be so cool to have the AI be allowed to run for as long as possible going through the full game. Its end goal would probably be "Get to 100% as fast as possible", causing it to understand the game engine more deeply than we as humans currently do
Imagine the AI being given years of run time
You can sorta do this much more cheaply by first training a separate AI to compress the footage into a more meaning-dense form.
I wonder if this could be trained to play the whole game and even beat speedrunners times?
Amazing
Thanks
Is there somewhere I can view the code for this ?
Not yet, but one day. The AI learning algorithm however is something I built myself and am hoping to publish soon
@@aitango oh I was just wanting to look at it from a student point of view lol
I just realized that the AI can't spin
having 100% the game 3 times, I know the final level is no easy thing
I honestly forget how hard some of these games were when making AI for them
1:02 I thought you said calculus 2.0 lol
Love me some calculus haha
You forgot to beat bowser
I’m high right now and this is insane, there's no way you made an AI that learned to kill itself because it was less painful than trying. They're gonna put you in the I Have No Mouth and I Must Scream chamber because you put them in the Mario Galaxy lava torture chamber
if i'm being honest, I want to see the AI play reckless as they can create funny moments. Also, cheating the wall by using the fire bars to jump up wasn't really an issue but fun. It should have only been when the AI dies they lose reward (or don't get reward yet I don't know AI better than you so...)
You should go into the actual tools you're using.
A bit mean to punish speed running tricks
I would’ve let it continue, but it was mostly dying as a side product of the speed runner strats
Who the fuck is AL and how did he get so good!?
Ai tango can you please teach the ai to play Mario kart 7 I would really appreciate it if you have the time ❤🙏
It would probably take a while to set up, but if I can get in contact with some of the devs of a Nintendo Switch emulator I could probably do a bunch of newer games
Can you make an AI play Splatoon?
Could it play call of duty zombies?
Guess you’ll have to wait and find out :)
Hydra 👍🏻
😊
1:26 "but instead of gobbling up candy, it's calculating rewards." LOL nerd.
Never heard an AI be referred to as a nerd before hahaha
Do you think AI could complete Rain World?
It's a very hard survival platformer few people even complete.
I'm going to unsubscribe from the channel to tidy up my subscriptions
:(
@@aitango sorry〜
Why say that though?
You didn't even show it beating the level. Wtf are you talking about?