AI Plays Trackmania - Map5 2:04:91

  • Published 26 Sep 2024
  • The AI is trained via reinforcement learning.
    Game: Trackmania Nations Forever (TMNF)
    Map: tmnf.exchange/...
    Replay (.gbx file): drive.google.c...

Comments • 66

  • @lordnoom4919 · 1 year ago +41

    Nice that it even figured out a rammstein hit to start a drift. Good work right here

  • @wazthatme · 1 year ago +4

    This makes me so happy to see. I could watch AI learning to play games all day

  • @m.i.c.h.o · 1 year ago +7

    It could, in fact, neo slide.
    Writual.

  • @gugus8081 · 1 year ago +4

    This is impressive, I'm not even sure I can beat that RTA... Keep it up!

  • @exlpt2234 · 1 year ago +4

    This is insane, great work!

  • @okty8372 · 1 year ago +4

    Is the AI able to generalize its "driving skills" to other maps? Amazing work btw! (I'm really interested in AI and love TM, so it's perfect content for me :) )

    • @linesight-rl · 1 year ago +4

      We'll find out soon enough :)

  • @Metcoler · 1 year ago

    Nice work! It must have taken a lot of effort. It is very impressive that the car can initiate a drift, drive very close to walls, and hit apexes like nothing. Big respect for this piece of work. Keep it up!

    • @linesight-rl · 1 year ago

      Thanks a lot! It has indeed been a lot of work, and we're still working on it! Next steps include training on more varied and less boring maps. We'll post progress videos along the way 🙂

  • @heavysaur149 · 1 year ago +6

    I wonder what the inputs are. Is it based on field of vision (like what it sees via the camera) or on coordinates (it already knows the whole map and can see its position on it)?
    And do you feed in what it output the frame before? (e.g. to know whether it continues its drift or not)
    And speed? Rotation of the car + where it's going (to know if it's drifting)?
    I have so many questions

    • @linesight-rl · 1 year ago +11

      Inputs contain a screenshot of what is displayed by the game, the relative position of a few checkpoints on the centerline of the circuit in front of the car, the agent's previous action, the car's speed, and the direction of the gravity vector.
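
As a rough illustration, an observation like the one described could be assembled as follows (field names and shapes are assumptions, not the project's actual code):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    # Fields mirror the inputs listed above; names and shapes are guesses.
    screenshot: np.ndarray    # grayscale game frame, e.g. (H, W)
    checkpoints: np.ndarray   # (N, 3) upcoming centerline points, relative to the car
    previous_action: int      # index of the action taken on the previous step
    speed: float              # car speed
    gravity_dir: np.ndarray   # (3,) gravity direction in the car's frame

def flatten(obs: Observation) -> np.ndarray:
    """Concatenate the non-image features into one vector for the network."""
    return np.concatenate([
        obs.checkpoints.ravel(),
        [float(obs.previous_action), obs.speed],
        obs.gravity_dir,
    ])

obs = Observation(
    screenshot=np.zeros((64, 64)),
    checkpoints=np.zeros((8, 3)),
    previous_action=3,
    speed=412.0,
    gravity_dir=np.array([0.0, -1.0, 0.0]),
)
print(flatten(obs).shape)  # (8*3 + 2 + 3,) = (29,)
```

The image would feed a convolutional branch while the flattened vector feeds a dense branch; the split shown here is only one plausible layout.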

  • @Arcsinx · 1 year ago +1

    Crazy! Yosh should see this

  • @gaiekkurvanov1841 · 1 year ago +6

    Which algorithm is used?

    • @linesight-rl · 1 year ago +5

      This is value-based reinforcement learning.
      We use Implicit Quantile Networks combined with n-step returns and dueling networks. We also implemented Prioritized Experience Replay, Persistent Advantage Learning, noisy layers for exploration, and quantile options (QUOTA), but those components are currently not used.
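
For readers curious about the quantile-regression core of Implicit Quantile Networks: it minimizes an asymmetric quantile Huber loss between predicted and target return quantiles. A minimal numpy sketch (illustrative only; the real training code would run inside a deep-learning framework):

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_quantiles, taus, kappa=1.0):
    """Quantile Huber loss used in IQN-style distributional RL.

    pred_quantiles:   (N,) quantile estimates at fractions `taus`
    target_quantiles: (M,) quantile samples of the TD target
    """
    # Pairwise TD errors u[i, j] = target_j - pred_i
    u = target_quantiles[None, :] - pred_quantiles[:, None]
    # Element-wise Huber loss with threshold kappa
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric quantile weighting |tau - 1{u < 0}|
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()

taus = np.array([0.25, 0.5, 0.75])
loss = quantile_huber_loss(np.array([0.0, 1.0, 2.0]),
                           np.array([0.5, 1.5]), taus)
print(round(loss, 4))
```

The asymmetric weight is what makes each output track a specific quantile of the return distribution rather than its mean.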

  • @Linck192 · 1 year ago +1

    Why did you make the AI output these groups of inputs instead of 4 values, one for each direction?

    • @linesight-rl · 1 year ago +1

      This is a requirement of DQN-like methods: each action is associated with a single value, and you pick the action with the highest value. DQN does not handle picking multiple actions at the same time.
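
Concretely, key combinations can be folded into one discrete action set so a DQN-style agent still outputs a single value per action. A sketch (the project's actual action set may differ, e.g. it may exclude contradictory presses like left + right):

```python
from itertools import product

# Enumerate every combination of the four keys as one discrete action.
KEYS = ("accelerate", "brake", "left", "right")
ACTIONS = [frozenset(k for k, pressed in zip(KEYS, combo) if pressed)
           for combo in product([False, True], repeat=4)]

def pick_action(q_values):
    """DQN-style selection: one Q-value per discrete action, take the argmax."""
    best = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best]

print(len(ACTIONS))  # 16 combinations of 4 keys
```

This is why the network outputs "groups of inputs": each output head corresponds to one full keyboard state, not one key.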

  • @jorishenger1240 · 1 year ago +5

    What if you let this AI loose on E02

    • @linesight-rl · 1 year ago +1

      I guess we'll have to try :)

    • @jorishenger1240 · 1 year ago +1

      @@linesight-rl would be amazing to see, would the small jumps be a problem?

    • @linesight-rl · 1 year ago +2

      @@jorishenger1240 We're currently testing on more complex maps. Neither jumps, slopes, nor borderless roads seem to be a problem.

    • @jorishenger1240 · 1 year ago +2

      @@linesight-rl Amazing to see that tech has come so far that this is done by a person, not even a company or smth. So cool

  • @corbanizer7376 · 1 year ago

    Keep on going dude. This is sick

  • @Gryffins90 · 1 year ago +1

    Excellent project that I always wanted to try myself. I've seen your response to the other comment asking about contributing. I'm also interested in helping (data scientist myself), so get in touch if you're willing to extend the team. I have a 2080 Ti available at home.
    One suggestion for a future video: also show the keyboard inputs (just the 4 keys) in addition to the input tree, as that is more similar to how humans display their inputs.

  • @lucacu3587 · 1 year ago +2

    any wirtual vid watchers here???

  • @Stunde0Null0 · 1 year ago +4

    Wirtual taking a L. kekw

  • @livingroom5899 · 1 year ago

    Better than I will ever be.

  • @PassiveIZ · 1 year ago

    That's crazy after just 2700 runs and 30 hrs

  • @eddyreising6567 · 1 year ago

    very impressive work!

  • @pixelmalfunction1772 · 11 months ago

    Is your AI the one on the leaderboard with the sub-1:50? That would be impressive if it found a cut; if it didn't, then I beat the AI by 2 sec, but it probably did

  • @vjproject · 1 year ago

    Why aren't the tire marks fully visible? Modified, or low quality? 😅

  • @rFey · 1 year ago +1

    Idk if this would be possible, but I would love to see another angle to take ML/AI with Trackmania: feed it thousands of TASes or WRs on a bunch of maps with lots of different turns, block combinations, drifts and whatnot, and then see if it can get good times on real maps. My layman brain sees this as way more complicated, so it probably is, but y'know, a man can dream

    • @linesight-rl · 1 year ago +4

      What you are describing is called "supervised learning", where an AI is fed expert information and tries to reproduce the behavior of that expert.
      In this video, we use another technique called "reinforcement learning", where the AI does not need to be given good runs; it is able to learn on its own.
      Supervised learning is generally easier, but has the drawbacks that it requires huge amounts of replays and that it will never become better than the expert it tries to mimic.
      Reinforcement learning may be more difficult, but it can theoretically find strategies that were never shown to it.

    • @rFey · 1 year ago

      @@linesight-rl My idea was to use the information from supervised learning on random maps the AI hasn't "seen", but then I realized that wouldn't work unless you could also feed it block information or make some wild machine vision solution 🤔

    • @RadiantDarkBlaze · 1 year ago

      @@linesight-rl Is it possible to do something like starting a training run for a map as supervised learning, then switching the same training run to reinforcement learning once it reaches a certain fitness on the supervised part, so that it can surpass the player who provided the replays as it goes about the reinforcement part?

    • @ryans3979 · 1 year ago +1

      @@RadiantDarkBlaze The idea you have does exist; it's typically called pre-training or sometimes bootstrapping. You train a model with one method (supervised learning could work), and it then has something of a baseline behavior. In the case of supervised learning, it might learn to imitate some of the various tech that TASes use. Then you can further train it using a different method, allowing it to refine itself and improve past its current level.
      The issues with that strategy are that, like linesight mentioned, you'd have to feed it a massive amount of replays. You likely don't have thousands upon thousands of TAS runs for a single map, so you'd need to feed it random TAS runs of other maps. If you do that, you have to deal with negative transfer, where the tech and skills it learns from other maps might interfere: you don't want it trying to use glitches that are impossible or useless on a simple map like this one. It's harder to make a generalized AI than a specific one, and that's what you'd be doing with the supervised learning; that's a broader task than this AI, which is just running on a very simple map. It could work in theory, though; it's just more time-consuming and more computationally expensive to implement.
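
The pre-train-then-fine-tune idea discussed in this thread can be sketched as a two-phase loop. Everything below is a placeholder skeleton: the stub classes only count updates, where a real model would minimize an imitation loss in phase 1 and run e.g. DQN/IQN updates in phase 2:

```python
class StubModel:
    """Stand-in for a policy/value network so the skeleton runs end to end."""
    def __init__(self):
        self.supervised_updates = 0
        self.rl_updates = 0
    def supervised_step(self, state, action):   # would minimize imitation loss
        self.supervised_updates += 1
    def act(self, state):                       # would pick a greedy action
        return 0
    def rl_step(self, state, action, reward):   # would apply a DQN/IQN update
        self.rl_updates += 1

class StubEnv:
    """Tiny episodic environment: 10 steps per episode, zero reward."""
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return 0, 0.0, self.t >= 10

def pretrain(model, expert_dataset, epochs=3):
    """Phase 1: behavior cloning on expert (state, action) pairs."""
    for _ in range(epochs):
        for state, action in expert_dataset:
            model.supervised_step(state, action)
    return model

def finetune(model, env, steps=50):
    """Phase 2: reinforcement learning starting from the cloned baseline."""
    state = env.reset()
    for _ in range(steps):
        action = model.act(state)
        state, reward, done = env.step(action)
        model.rl_step(state, action, reward)
        if done:
            state = env.reset()
    return model

model = finetune(pretrain(StubModel(), [(0, 1)] * 5), StubEnv())
print(model.supervised_updates, model.rl_updates)  # 15 50
```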

    • @RadiantDarkBlaze · 1 year ago

      @@ryans3979 Would something like taking a single good human replay and putting it through the brute-forcer tool, saving every single tiny improvement to eventually gather 10k+ technically-unique replays, work for generating a supervised learning set for a map? Or is there an express reason 10k+ (human or TAS) replays of a specific map are needed? I do think it's necessary to only train a specific net on a single specific track; I was never thinking the idea could be used for making a generalized all-rounder net.

  • @OPEK. · 1 year ago +1

    I’m interested to see how it handles random ramsteins and landing bugs tbh

    • @pinipilla · 1 year ago

      Those are not random; Trackmania physics are deterministic. The outcome just changes a lot with a small input change, which is not a problem for a machine

  • @barakeel · 1 year ago +1

    What was the reward when it was not able to finish the track yet?

    • @linesight-rl · 1 year ago +2

      Simple question, simple answer: nothing. Neither a reward nor a punishment.
      This will likely trigger the question "what's the reward, then?". It's mostly progress along the track.
      I think we'll start to add voice-overs or some explanations in the next videos, look out for them :)
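
A "progress along the track" reward of the kind described could be sketched as the advance of the car's projection onto the track centerline between two frames. This is an illustration only, not the project's actual reward code:

```python
import numpy as np

def progress_reward(centerline, prev_pos, cur_pos):
    """Reward = arc length advanced along the centerline between two frames.

    Snaps each position to its nearest centerline point; no reward or
    punishment beyond that, matching the description above.
    """
    # Cumulative arc length at each centerline point
    seglens = np.linalg.norm(np.diff(centerline, axis=0), axis=1)
    arclen = np.concatenate([[0.0], np.cumsum(seglens)])

    def along(pos):
        i = np.argmin(np.linalg.norm(centerline - pos, axis=1))
        return arclen[i]

    return along(cur_pos) - along(prev_pos)

# Toy L-shaped track: two 10-unit segments
line = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]])
print(progress_reward(line, np.array([1.0, 0.0]), np.array([9.0, 1.0])))
```

In practice the projection would need to be finer than nearest-point snapping (e.g. interpolating between checkpoints), but the structure of the signal is the same: reward what moves the car forward along the route.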

  • @curcodes · 1 year ago

    really good work

    • @curcodes · 1 year ago

      I got a challenge: next time, test your future AI on this. And see if reach 2.04 in

  • @ibozz9187 · 1 year ago +2

    Are those neoslides or normal drifts?

    • @lordnoom4919 · 1 year ago

      looks to me like most are release drifts

    • @fontur5119 · 1 year ago

      most of them are neoslides

    • @pekatour · 1 year ago

      @@lordnoom4919 Aka neo drift

    • @lordnoom4919 · 1 year ago

      @@pekatour Nope, you don't need to release during a neo, since neo = steering --> stop steering --> start braking --> steer again, all while holding down acceleration

    • @pekatour · 1 year ago

      @@lordnoom4919 mb

  • @xtraz9814 · 1 year ago

    Hello people from Wirtual videos

  • @Sagosmurfen · 1 year ago

    Neo slide god!! 😮

  • @11DowningStreet · 1 year ago

    how does this work? it looks really cool

  • @201pulse · 1 year ago

    Hi Linesight, I'm an experienced data scientist and I would be interested in helping and contributing to this project. At some point I actually wanted to do the same thing, so I might have some cool ideas. Are you interested?

    • @linesight-rl · 1 year ago

      Hi, thank you for your interest. While it is always helpful to have another person's perspective, this is a rapidly evolving 2-person project. At least in the short term, we prefer to keep it small.
      We will probably have a more open approach in the future and welcome contributions. You're welcome to ask again in a few videos' time!

    • @linesight-rl · 1 year ago

      How should we contact you when we are more open to contributions?

  • @ArrakisMusicOfficial · 1 year ago

    What GPU? :)

    • @linesight-rl · 1 year ago +1

      Nvidia 3060

    • @ArrakisMusicOfficial · 1 year ago +1

      @@linesight-rl How did you manage to get it to learn so quickly? 2900 runs is a ridiculously low number for how good it got. You must have used very good priors; how did you do it? Careful reward modelling? A really good initial policy? A really good exploration policy? What RL method did you use? :)

  • @zillion8954 · 1 year ago

    now train it on an actual map

  • @Queen_Elizabeth249 · 1 year ago +1

    I wonder if KarjeN could defeat this AI

    • @КириллКалейс-ъ6ч · 1 year ago +1

      At first I thought a human would be faster, but the length of the map...

    • @Queen_Elizabeth249 · 1 year ago

      @@КириллКалейс-ъ6ч true

    • @mk-ej3cz · 1 year ago +1

      For sure he could

    • @hayabusa10055 · 1 year ago

      @@mk-ej3cz As of now, yes, easily, but there's no telling how far it can be pushed, maybe even to the point that the AI makes TAS runs itself without your help

  • @ozzehh · 1 year ago

    wirtual sucks compared