How I Made the Best AI Snake

  • Published on Oct 23, 2024

Comments • 36

  • @bikramjeetdasgupta
    @bikramjeetdasgupta 7 months ago +8

    Bro I really thought I was watching a youtuber with at least 300K subs... This is surely gonna blow up if the youtube algorithm supports you.. keep it up👌

    • @IDontModWTFz
      @IDontModWTFz 7 months ago +2

      I didn't even notice the subs... That's mental!

  • @TwoPointCode
    @TwoPointCode  4 months ago +3

    Something I want to address due to some comments:
    While working on this project I watched both Code Bullet's and AlphaPhoenix's videos. Code Bullet first tried to create an AI using Q learning but ran into many issues with the snake not eating the apple, so he decided to turn this problem into a searching/maze-finding problem. He then used only searching algorithms and had the snake follow these paths. Because he turned to pathfinding, I don't consider the snake he made to use AI and instead consider it to fall under the category of automated snake games. For example, he brings up an interesting use case of maze solving in his video: GPS. Even though a GPS gives you directions, most people wouldn't consider it an AI. I see it the same way with the snake he made. The searching algorithm gives directions, and the snake automatically follows this path, hence why I consider it automated.
    In AlphaPhoenix's video, he also approaches this as a pathfinding problem. At 2:50 in his video, he states that "[His] goal is to make an algorithm that fills the board every single game in as few moves as possible." He solves it in a brute-force way: whenever a problem pops up, he adds more conditions and rules for that specific case. The final solution he came up with is very impressive, but again I don't consider the snake he made to be an AI, as it did not learn from past experiences to correct its decisions; it was instead made to make certain moves based on different rules or conditions. Because of this, I believe it should fall under the category of automated snake games.
    Both of the videos mentioned above are very interesting and were very helpful in giving ideas and insight into what would and wouldn't work, but the final solutions to this problem were very different.

  • @ramsprojects8375
    @ramsprojects8375 4 months ago +3

    Nice content! Keep it up. Please enable subtitles for users like me who prefer subtitled videos.

    • @TwoPointCode
      @TwoPointCode  4 months ago +1

      Thank you for the compliment and the tip! I didn’t realize the subtitles were disabled. I just enabled the auto generated ones and will go through them soon to make sure they’re accurate.

  • @BeanDev1
    @BeanDev1 7 months ago +4

    Great work

  • @Rasil1
    @Rasil1 7 months ago +1

    nice one man, this vid is in the algorithm, that's why I was recommended this

  • @Me-0063
    @Me-0063 2 months ago +1

    I thought for a second about reward systems and came up with this:
    1) Reward for eating the apple
    2) Reward based on distance to the apple
    3) Small punishment for each move
    4) Bigger punishment for each move past the minimum number of moves it would take the snake to eat the apple. Could possibly grow the punishment exponentially
    5) Punishment for death. Should always be the biggest number
    This prompts the snake to get closer and eat apples as fast as it can, acting a bit like a pathfinding algorithm thanks to point 4, whilst not compromising how long the snake lasts. I understand that this might not work if the AI is too shortsighted, preferring small rewards now instead of big rewards later, but I think there was a way to counteract that.
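
    As a rough illustration of the reward scheme described in this comment, here is a minimal Python sketch; the function name, inputs, and constants are assumptions made for the example, not anything from the video:

    def compute_reward(ate_apple, died, prev_dist, dist,
                       moves_since_apple, min_moves_to_apple):
        # Sketch of the five-point scheme above; all constants are made up.
        reward = 0.0
        if ate_apple:
            reward += 10.0                      # 1) reward for eating the apple
        reward += 0.1 * (prev_dist - dist)      # 2) reward for getting closer to the apple
        reward -= 0.01                          # 3) small cost for every move
        overtime = moves_since_apple - min_moves_to_apple
        if overtime > 0:
            reward -= 0.05 * (1.5 ** overtime)  # 4) growing penalty past the shortest path
        if died:
            reward -= 100.0                     # 5) death is always the largest penalty
        return reward

    The usual way to counteract the shortsightedness mentioned at the end of the comment is the discount factor (a gamma close to 1), which makes rewards far in the future still count toward the value of a move.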

  • @sergioaugustoangelini9326
    @sergioaugustoangelini9326 7 months ago +1

    That's a wonderful solution! Great job!

  • @sotofpv
    @sotofpv 7 months ago

    Amazing, so happy to have found you, looking forward to your future videos given how well done your first video is :)

  • @rodrigorila1605
    @rodrigorila1605 7 months ago +3

    really cool AI

  • @ArminPaslar
    @ArminPaslar 7 months ago +1

    You deserve so much more

  • @krysdoran
    @krysdoran 7 months ago

    You really explained this in a simple, comprehensible way!!

  • @maxencehm1764
    @maxencehm1764 2 months ago +2

    hi, the video was great, hope you will make others. I have a few different questions:
    - how is the snake evolving? The only way I know is by using a genetic algorithm, but here there is only one snake
    - how did the snake choose which direction to take?
    - what did you use to make the graphics?

    • @TwoPointCode
      @TwoPointCode  months ago

      Thank you!
      The reinforcement learning algorithm I used is PPO. PPO has two networks. One network is called the value network and it predicts the sum of future rewards that will be earned from an observation. The other is called the policy network and it outputs probabilities for each move given an observation. So, in this case, probabilities for the moves up, down, left, and right are output by the policy network. While training the AI, the current observation, selected move, reward, and updated observation after the move are saved. The value network is then used to predict the future rewards of both saved observations. If the predicted new observation’s future reward plus the reward gained for the action is less than the predicted future reward of the original observation, then the action must not have been the best, so the probability of selecting that action given that observation is decreased. If it’s larger, then it was a good move and the probability for that move increases. If it’s the same, then it stays the same. This is normally done after a set number of steps and in batches of steps to avoid large, unusual updates. This is all repeated until you stop the training.
      After training, the policy network is given an observation and the action with the highest probability is used.
      For the graphics, while showing the training progress I used OpenCV in Python, but towards the end of the video, when I was showing the final model, I actually created an in-browser environment using JavaScript, CSS, and HTML.
      I tried to keep this comment a reasonable length while keeping the important information in it so I didn’t want to go too far into the details. If you have any questions, feel free to ask!
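
      For readers curious what the update described above looks like in code, here is a minimal sketch. PyTorch, and every name, constant, and network interface below, are assumptions for the example, not the author's actual code: the value network's prediction for the new observation plus the earned reward is compared to its prediction for the old observation, and PPO's clipped objective nudges the chosen move's probability up or down by a limited amount.

      import torch
      import torch.nn.functional as F

      def ppo_step(policy_net, value_net, optimizer,
                   obs, action, reward, next_obs, done,
                   old_log_prob, gamma=0.99, clip_eps=0.2):
          # Value network: predicted sum of future rewards for both observations.
          v_obs = value_net(obs)
          v_next = value_net(next_obs).detach()

          # Reward gained plus the new observation's predicted future reward,
          # compared against the prediction for the original observation.
          target = reward + gamma * v_next * (1.0 - done)
          advantage = (target - v_obs).detach()

          # Policy network: probabilities for up, down, left, and right.
          probs = policy_net(obs)
          dist = torch.distributions.Categorical(probs=probs)
          log_prob = dist.log_prob(action)

          # Clipped objective: raise the probability of better-than-expected moves,
          # lower it for worse ones, but never by too much in a single update.
          ratio = torch.exp(log_prob - old_log_prob)
          policy_loss = -torch.min(ratio * advantage,
                                   torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage).mean()
          value_loss = F.mse_loss(v_obs, target)

          optimizer.zero_grad()
          (policy_loss + 0.5 * value_loss).backward()
          optimizer.step()

      In practice this update runs on batches of saved steps after a set number of moves, exactly as the reply describes, which keeps any single update from being too large.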

  • @desredes519
    @desredes519 7 months ago +3

    Great video

  • @gchd1232
    @gchd1232 7 months ago +2

    Love the video! Could you teach an AI how to play something like billiards next time?
    Also: How long did it take to program this AI without the training part?

    • @TwoPointCode
      @TwoPointCode  7 months ago +2

      Thank you! It would be interesting to see two AIs learn to play billiards against each other….. I’ll have to look into it. If I were to strictly talk about the time I spent programming this, it would be multiple days' worth of straight coding, but the final code is actually quite small. The reason it took so long is the lack of information around this and the amount of trial and error I had to go through to get this AI to the level it’s at. If I include the time I’ve spent training different models and testing different things, then I’ll tell you that I started this project mid-November, so 4 months straight. Again, the reason for that is how long it takes to train new models.

  • @Peledcoc
    @Peledcoc 7 months ago +1

    Great video!

  • @palmero7606
    @palmero7606 7 months ago +1

    Amazing Video.💪🏼

  • @sotofpv
    @sotofpv 7 months ago +2

    Oooh just thought of a genuine question. When you change the grid size, are you adding more inputs to the network? Do you need to retrain it?

    • @sotofpv
      @sotofpv 7 months ago

      I think you answered my question at around minute 25:20 hehe

    • @TwoPointCode
      @TwoPointCode  7 months ago

      Yeah, good question! That final snake I showed with the changing grid sizes was actually a single model trained on a 20x20 input. The observation space was basically the same, but during training a random board size under 20x20 was picked each game, and 1’s were used to fill the rest of the board so it appeared to be that random size.
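
      A tiny sketch of that padding idea (hypothetical names, not the actual code): the network always receives a fixed 20x20 observation, and everything outside the randomly chosen playable area is filled with 1's so it reads the same as a wall.

      import numpy as np

      MAX_SIZE = 20  # the full observation size the model was trained on

      def pad_board(board, fill_value=1.0):
          """board: 2D array of shape (h, w) with h, w <= MAX_SIZE."""
          obs = np.full((MAX_SIZE, MAX_SIZE), fill_value, dtype=np.float32)
          h, w = board.shape
          obs[:h, :w] = board  # playable area; the remainder looks like wall
          return obs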

  • @stevencowmeat
    @stevencowmeat 7 months ago

    Aw man, went to find another vid by you and this is the only one. RIP, hope you continue making vids. Solid music choice as well. Also, is your logo generated by DALL-E? Idc if it is, it just looks like its style.

  • @lucasgaperez
    @lucasgaperez 7 months ago +4

    3 subs what the fuck

  • @petr-heinz
    @petr-heinz 7 months ago +1

    How can the AI know when it's gonna get punished if it doesn't get previous frames as input? Does it have a dedicated counter input for the limit?

    • @TwoPointCode
      @TwoPointCode  7 months ago +1

      Good question! There were some things I had to brush over/simplify for the sake of the video, and that was one of them. In the observation space the AI is also given a value for how many remaining moves it has, and once that value hits 0 the game ends and it is punished for running out of moves. This is done so the AI can learn why it is being punished and learn to avoid letting that value reach 0.
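
      A small sketch of that idea (the class name, limit, and penalty below are hypothetical, not from the video): the remaining-move counter is part of the observation, it refills when an apple is eaten, and the episode ends with a punishment when it reaches 0.

      class MoveBudget:
          def __init__(self, limit=100, starve_penalty=-10.0):
              self.limit = limit
              self.starve_penalty = starve_penalty
              self.moves_left = limit

          def on_apple_eaten(self):
              self.moves_left = self.limit  # eating an apple refills the budget

          def tick(self):
              """Call once per move; returns (extra_reward, episode_done)."""
              self.moves_left -= 1
              if self.moves_left <= 0:
                  return self.starve_penalty, True  # ran out of moves: punish and end the game
              return 0.0, False

          def observation_value(self):
              return self.moves_left / self.limit  # normalized counter the network sees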

  • @mafiawerbung
    @mafiawerbung 7 months ago +2

    64x64 when?

    • @TwoPointCode
      @TwoPointCode  7 months ago +1

      Maybe….. but it would be quite some time from now…..

  • @gabboman92
    @gabboman92 7 months ago

    hey do you have a fedi presence? mastodon n stuff

    • @TwoPointCode
      @TwoPointCode  7 months ago

      No, I don't at the moment

  • @montageofchips
    @montageofchips 4 months ago +1

    A little too slow, but overall a good video
