Reinforcement Learning - My Algorithm vs State of the Art

  • Published on 21 Jan 2025

Comments • 336

  • @chris-graham
    @chris-graham 2 months ago +484

    I think you would be interested in network pruning. This is something that's typically done periodically during training to thin networks. If you examine the weights in your PPO-optimized network, you'll find that many are very small, while others are larger. If some near-zero weights are set to zero, networks will often become more stable after fine-tuning. You'll find that the connections in the network begin to look sparse and very similar to networks generated via evolutionary methods. PPO is just an optimizer and will work with whatever network configuration you want. The evolutionary networks shown in the video are all differentiable, so PPO would be able to optimize them. That would be an interesting comparison if you wanted to pursue it!

    • @nodrance
      @nodrance 2 months ago +23

      i smell a part 4

    • @w花b
      @w花b 2 months ago +8

      ​@@nodrance I smell you smelling something

    • @Firestorm-tq7fy
      @Firestorm-tq7fy 2 months ago +5

      They only become sparse if you use regularization methods like L1

    • @chris-graham
      @chris-graham 2 months ago +5

      @@nodrance Do you smell it? That smell. The kind of smelly smell. The kind of smelly smell that smells... smelly.

    • @bitblit
      @bitblit 2 months ago +1

      @@chris-graham Right you are, Mr. Krabs.
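
The pruning suggestion that opens this thread can be sketched in a few lines. This is a minimal illustration of magnitude pruning on a toy weight matrix, not the video's network or any library's API: `magnitude_prune`, `sparsity`, and the example weights are hypothetical names and values chosen for the demo. In practice the pruned network would then be fine-tuned, which this sketch does not do.

```python
def magnitude_prune(weights, fraction):
    """Return a copy of a 2-D weight matrix with the smallest-magnitude
    `fraction` of entries set to zero (hypothetical helper for illustration)."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * fraction)      # number of weights to zero out
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]            # largest magnitude that still gets pruned
    # Ties at the threshold are pruned too, so slightly more than k may be zeroed.
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


# Toy layer: a few large weights and several near-zero ones.
layer = [
    [0.80, -0.01, 0.02],
    [-0.03, 1.20, 0.00],
    [0.04, -0.90, 0.05],
]
pruned = magnitude_prune(layer, 0.5)
print(pruned)
print(f"sparsity: {sparsity(pruned):.2f}")
```

As the reply about L1 points out, a network trained without a sparsity-encouraging penalty may not have many near-zero weights to begin with, so how much can be pruned safely is something to measure rather than assume.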

  •  2 months ago +421

    triple pendulum next?

    • @dongyulee2095
      @dongyulee2095 2 months ago +4

      Impossible...

    • @sumitbiswas164
      @sumitbiswas164 2 months ago +2

      How to get the solution for dynamic (n) chain of pendulums? Is it possible now?

    • @alxklgn364
      @alxklgn364 2 months ago +13

      I think I've read a paper explaining why the triple pendulum is totally chaotic and impossible to solve. But I would also like to see an attempt.

    • @elie_
      @elie_ 2 months ago +1

      @@dongyulee2095 "Source: lol"...
      th-cam.com/video/cyN-CRNrb3E/w-d-xo.html
      And all possible unstable equilibrium states th-cam.com/video/I5GvwWKkBmg/w-d-xo.html

    • @elie_
      @elie_ 2 months ago +1

      @@dongyulee2095 "Source: lol"
      th-cam.com/video/cyN-CRNrb3E/w-d-xo.html (13 years ago)
      th-cam.com/video/meMWfva-Jio/w-d-xo.html
      th-cam.com/video/I5GvwWKkBmg/w-d-xo.html (even more impressive)

  • @imanuelbaca2468
    @imanuelbaca2468 2 months ago +18

    The quality and education of these videos is unmatched, please keep making stuff like this!

    • @PezzzasWork
      @PezzzasWork  2 months ago +2

      Thank you :)

    • @Ibloop
      @Ibloop 17 days ago

      @@PezzzasWork 13:45 How intensive is stuff like this to run on a computer?

  • @fluffsquirrel
    @fluffsquirrel 2 months ago +41

    Thank you so much for this demonstration and adding the links! I didn't know of Isaac Lab and was wondering how it was possible to control the mechanics. Great video!

  • @Waffle_6
    @Waffle_6 2 months ago +93

    Getting that sort of aid from NVIDIA is super nice. Super cool, my school just got an AI accelerator, the "AGX Orin": a very cool piece of computing hardware and fantastic for AI training and research. Also, as someone who is more hardware oriented, it has a super fascinating architecture (shared CPU and GPU global memory!)

    • @meronamsamho
      @meronamsamho 2 months ago +4

      security be damned I want faster training!

    • @conorstewart2214
      @conorstewart2214 2 months ago

      They definitely are cool but I would not class the AGX orin as an AI accelerator, not in the same way GPUs are. Or at least not just an AI accelerator. The AGX Orin and the whole Jetson lineup is meant for embedding in things, like robots, cars, etc. It is a full system, CPU, RAM, GPU.
      It is also not very powerful for the cost, at least in terms of raw compute performance. Even a 4060 gets 242 TOPS whereas the AGX Orin only gets 275 TOPS. If you don't need the portability and embeddability of a Jetson system then you are far better just buying GPUs. I can get a 4060 for £250 (yes this is without a CPU and only has 8 GB VRAM) but the AGX orin costs £1992, so just going with desktop PC hardware your money goes much further. For the price of an AGX Orin you could likely build a 4080 or possibly 4090 PC and get much more performance. If RAM is that much of an issue then you should probably look at enterprise or data centre level systems.

    • @jibcot8541
      @jibcot8541 2 months ago

      When you are the most valuable company to have ever existed, I guess you can give a bit of money away to teachers and researchers, still nice of them I guess.

    • @erinlucassen
      @erinlucassen 1 month ago

      @@conorstewart2214 Agreed; for embedded applications nothing gives you the same TOPS/watt as a Jetson, but for training the 40XX series is most cost-effective (in our setup we use this to train end-to-end control for drones, and an Orin NX for deployment)
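
The price/performance argument in this thread comes down to simple arithmetic. The sketch below only re-derives TOPS per pound from the figures quoted in the comment above (242 TOPS at £250 and 275 TOPS at £1992); those numbers are taken from the comment and not verified here, and `tops_per_pound` is a hypothetical helper name.

```python
# Figures as quoted in the comment thread (not independently verified).
devices = {
    "RTX 4060": {"tops": 242, "price_gbp": 250},
    "AGX Orin": {"tops": 275, "price_gbp": 1992},
}


def tops_per_pound(spec):
    """Raw compute per unit cost; ignores power draw, memory, and form factor."""
    return spec["tops"] / spec["price_gbp"]


for name, spec in devices.items():
    print(f"{name}: {tops_per_pound(spec):.3f} TOPS per £")

ratio = tops_per_pound(devices["RTX 4060"]) / tops_per_pound(devices["AGX Orin"])
print(f"The 4060 delivers about {ratio:.1f}x more TOPS per pound")
```

This metric deliberately leaves out TOPS per watt, which is the axis on which the last reply says the Jetson line is strongest.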

  • @kubstoff1418
    @kubstoff1418 2 months ago +12

    I've been looking for a subject for my engineering degree and this video might be exactly it! Thank you for the inspiration, your videos are always a blast!

  • @max_me_is
    @max_me_is 2 months ago +383

    We got Pezzza's work X NVIDIA collab before GTA VI 😭

    • @Djellowman
      @Djellowman 2 months ago

      Shut up

    • @harriehausenman8623
      @harriehausenman8623 2 months ago +12

      I prefer *this* 😄

    • @CAGonRiv
      @CAGonRiv 1 month ago

      Breh
      💀

  • @_nemo
    @_nemo 2 months ago +63

    17:06 That's so similar to what the timescales of evolution in nature, and a human learning a skill are like. That's kinda crazy. Really makes it look like the algorithms successfully mimic real counterparts.

    • @TheRealZitroX
      @TheRealZitroX 2 months ago

      And still, some Human doesn't learn at all.

    • @raspberryjam
      @raspberryjam 2 months ago +5

      @@TheRealZitroX mean

    • @0osk
      @0osk 2 months ago

      @@TheRealZitroX *some humans don't learn
      :)

  • @briandeanullery
    @briandeanullery 2 months ago +6

    This is just brilliant. I verbally gasped at those numbers. I am so grateful to be living in a world with this sort of stuff, it's truly amazing!

  • @PatrickHoodDaniel
    @PatrickHoodDaniel 2 months ago +7

    Oh my god, a video from Pezzza!! I'm so excited!!

  • @poketopa1234
    @poketopa1234 2 months ago +5

    PPO and gradient-based policy learning in general is amazing. I will still say that your struggle to get an evolutionary algorithm to learn this problem led to some really creative and impressive curriculum learning ideas which also apply to PPO :)

  • @drhxa
    @drhxa 2 months ago +14

    Have you considered adding physical parameters from motor torque and motor weight? This would help you get much more realistic sim and difficulty level. Also, realistic response times (based on inference speed + connection latency). Also, you can either have a motor at the base and one at the middle joint or both at the base.
    You may also consider adding a battery's weight, so you have the voltage required to power those two motors for some period (say 5 min). This will be an awesome challenge and help you connect simulation to reality much more closely, which sounds super exciting. Looking forward to see if you end up working on it!

    • @lorem9587
      @lorem9587 2 months ago +3

      I like these suggestions. Where are the two motors, though? I thought there was only one, the one driving the carriage.

    • @drhxa
      @drhxa 2 months ago

      @@lorem9587 oops, haha, you're absolutely right!

    • @drhxa
      @drhxa 2 months ago

      The hinges have to be free, that's the whole point of the control problem! My bad haha

  • @kiaranr
    @kiaranr 1 month ago

    Instant like and sub. I could watch these all day. Great work!

  • @requestfx5585
    @requestfx5585 2 months ago +2

    Thanks for this high quality video and comparison of those algorithms, very nice. Keep it up

  • @Gabonidaz
    @Gabonidaz 2 months ago +10

    1:08 what are this dashboard? How did you builded? I need to try ...

    • @Afkmuds
      @Afkmuds 1 month ago +1

      Agree

    • @revimfadli4666
      @revimfadli4666 6 days ago

      did you had builded?

    • @Gabonidaz
      @Gabonidaz 5 days ago

      @@revimfadli4666 no, but i have finded the library that he uses, its SFML, he has a video on his channel on how to start rendering with C++ and SFML.

  • @sutsuj6437
    @sutsuj6437 2 months ago +6

    Do note that Evolutionary algorithms are usually better than pure RL agents for problems with very sparse rewards (Which is not the case here). For these problems, a hybrid approach might work best.

  • @nexttonic6459
    @nexttonic6459 2 months ago +8

    Now you have to add flex to the materials, a small gap to the rollers and the beam. Then add a slack in the bearings...

    • @rcnhsuailsnyfiue2
      @rcnhsuailsnyfiue2 2 months ago +6

      Don’t forget to account for the acoustic energy of a squeaky pendulum hinge… And a gentle breeze from a robot farting nearby…

    • @nexttonic6459
      @nexttonic6459 2 months ago +6

      @@rcnhsuailsnyfiue2 Agreed. Nvidia talks about the real physical world, yet farts and acoustic energy are probably not accounted for... though that is a simulation thing; I don't think the video maker can affect that.

  • @J3R3MI6
    @J3R3MI6 1 month ago +1

    Please more videos like this 💎 this was so cool.

  • @conorstewart2214
    @conorstewart2214 2 months ago +1

    This is very impressive and makes me want to look into RL for robotics again.
    I really don't think you can make much comparison about network size though if you only tried one network configuration that you chose randomly. A followup video seeing how small you can make it would be very interesting. It would also be interesting to see you try and take it from simulation to real life.

  • @FIT7Y
    @FIT7Y 2 months ago +1

    I would love to see you tackle other kinds of equilibrium positions. Where one of the pendulums is up while the other is down. And maybe even efficiently switching between the different equilibriums.
    Something like Embedded Control Lab's videos about switching between the different equilibriums for a triple pendulum.

  • @FoXMaSteR001
    @FoXMaSteR001 2 months ago +3

    Awesome :D Try applying the same method to learning pen spinning. The fact that the brain can coordinate all the fingers to use the momentum of the pen in complex figures is amazing. The time you need to learn the tricks is probably linked to the sense of touch rather than to seeing the figures, as a pro can perform tricks without watching their hand. With time the brain can adjust the position of the hand and fingers depending on the rotation of the pen, to save the figure or to smoothly trigger a new one; that becomes very automatic at some point. Switching to a different pen with another balance leads to faster adaptation once the person is a pro. The only way to learn it is trial and error, which looks like this video. The movement tends to get optimized with time: once you manage to do a trick with the pen, it seems your brain remembers what happened, which helps you do it again. When this happens it's like riding a bicycle: you can spend a week trying with no success and suddenly reach a very high success rate in a few hours once you've done it once. That's a very weird feeling.

  • @optozorax
    @optozorax 2 months ago +16

    I'm solving a similar task: I'm trying to teach an AI car to drive, with realistic physics. I was struggling with learning as you did in the previous video, so, inspired by your solution, I tried another approach: I started from simple physics (no inertia, no wheels, just rotations + offsets), then gradually interpolated between this simple physics and the hard physics. And my NN was able to learn how to drive perfectly. But then I tried an energy-based model: basically it's an NN that receives the current state and a desired action and outputs just a single number, the energy. You need to find the action that outputs the minimum energy. I iterated over 9 possible actions, and that NN was able to learn how to drive in complex physics without any hacks, and very fast.
    So, here's what I think: first try CMA-ES, as a superior zero-order optimization method. I think that NEAT is trash, and one day I will test it out. Then you should try an energy-based model. Then it will be a somewhat fair comparison. Right now it's not fair at all, and I'm slightly disappointed with this video.

    • @vastabyss6496
      @vastabyss6496 2 months ago

      what's the difference between energy and the loss? Also, your method sounds a lot like a DQN if I understand you correctly, and vanilla DQNs are much worse than PPO

    • @optozorax
      @optozorax 2 months ago

      @@vastabyss6496 energy is minimized during inference (to find the best action for an agent), while the loss is minimized during training. So, to train an energy-based model you need to minimize the energy at every step of a simulation while minimizing the overall loss. Many minimizations inside a big one.
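
The inference step described in this thread (score every candidate action, keep the one with the lowest energy) can be sketched as follows. Everything here is an assumption made for illustration: the commenter does not say what the 9 actions are, so a 3x3 grid of steering and throttle values is assumed, and `toy_energy` is a hand-written stand-in for the trained network, not the commenter's model.

```python
from itertools import product

# Assumed action set: (steering, throttle), each in {-1, 0, 1}, giving 9 actions.
ACTIONS = list(product((-1, 0, 1), repeat=2))


def toy_energy(state, action):
    """Stand-in for the trained network: lower energy means a better action.
    Here the 'state' is simply the ideal (steering, throttle) pair."""
    target_steer, target_throttle = state
    steer, throttle = action
    return (steer - target_steer) ** 2 + (throttle - target_throttle) ** 2


def select_action(state, energy_fn=toy_energy, actions=ACTIONS):
    """Inference: evaluate every candidate action and return the lowest-energy one."""
    return min(actions, key=lambda a: energy_fn(state, a))


best = select_action((0.8, -0.3))
print(best)
```

With only 9 discrete actions the inner minimization is an exhaustive search; a continuous action space would need an actual optimizer at every step, which is where the "many minimizations inside a big one" cost comes from.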

  • @tomsterbg8130
    @tomsterbg8130 8 days ago

    17:30 reminds me of when Technoblade was like "Some people man, they go on bragging about how they killed Technoblade while it wasn't even a final kill!", it just gives such a similar, yet distinct energy haha!
    Great work man and keep going!

  • @smokeydude3
    @smokeydude3 2 months ago +56

    Why not try testing a more compact PPO network?

    • @miran248
      @miran248 2 months ago +13

      Silently hoping for a part 4 and a triple pendulum :)

    • @PezzzasWork
      @PezzzasWork  2 months ago +38

      I tried but I couldn't manage to find a good solution (they were very unstable)

    • @stephaneduhamel7706
      @stephaneduhamel7706 2 months ago +8

      @@PezzzasWork Maybe you could try distilling the working network and see how small you can make it before it breaks?

    • @cagedgandalf3472
      @cagedgandalf3472 2 months ago +2

      @@PezzzasWork Try compacting only the actor network (and also lowering the learning rate) and keeping the critic network to default. That is what I did, although I use TD3 with auxiliary networks.

    • @vincentverbergy9816
      @vincentverbergy9816 2 months ago

      ​@@cagedgandalf3472 PPO is not an actor critic network? In general with RL size comes at the cost of computing time and risk for over fitting is not necessarily that big meaning that bigger network size isn't really a drawback given enough compute.

  • @waity5856
    @waity5856 2 months ago +3

    It's amazing to see it temporarily give up on balancing when it gets too close to the edge of the rail, so it can try again later in a more favorable position

  • @marcserraortega8772
    @marcserraortega8772 2 months ago

    Thanks a lot for the high quality video! I would love to see more videos related to RL in the future. Keep it up!

  • @tom-et-jerry
    @tom-et-jerry 1 month ago

    This is the most fabulous video i have ever seen since a long time ! Evolutionary vs reinforcement learning waooooo i love it ! Please could you make more videos ???

  • @dottedboxguy
    @dottedboxguy 2 months ago +43

    well, sure it's only a few minutes of training, but just how much computational power (or just electricity) was used during these few minutes ? i think it's much much more than your simpler approach. it's cool, but it would be interesting to do a test with power usage normalization to do a fair efficiency comparison

    • @sirynka
      @sirynka 2 months ago +5

      Still, 8 h of CPU time, even in single-core mode, would draw around 40 W, so 320 Wh total. A fully utilized 4090: 450 W * 5 min ≈ 40 Wh.
      Units were edited according to @somedudewillson.
      Thanks for the explanation.

    • @dottedboxguy
      @dottedboxguy 2 months ago +2

      @@sirynka what tells you it's a 4090 though ? as it stands, it seems more like a cloud GPU compute approach within a large GPU bay, which consume a tremendous amount of power, though only pezzza could confirm that

    • @PezzzasWork
      @PezzzasWork  2 months ago +39

      I didn’t specify it in the video as I thought the difference in time was large enough. My algorithm consumes around 120 Wh for around 5 hours when the 4090 consumes around 150 Wh for 3 minutes. I agree that it would have been a nice addition to the comparison.

    • @dottedboxguy
      @dottedboxguy 2 months ago +2

      @@PezzzasWork thanks for the precision ! this is indeed good to know, and does change things around a little as to which solution is better, especially considering the resulting NN depths

    • @somdudewillson
      @somdudewillson 2 months ago +3

      @@sirynka A Watt is a rate of energy transfer - specifically a Joule per second. A 40W CPU does not consume 40 Joules per second per hour, on account of how that doesn't make sense in this context (If the rate of energy usage was changing it _would_ make sense as a unit, however).
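
The unit distinction made here (watts are a rate, watt-hours are an amount of energy) is easy to check with the figures from the first reply in this thread: roughly 40 W for 8 hours on the CPU and 450 W for 5 minutes on the GPU. The helper names below are hypothetical and the power figures are the commenter's estimates, not measurements.

```python
def energy_wh(power_watts, minutes):
    """Energy in watt-hours for a constant power draw over a duration in minutes."""
    return power_watts * minutes / 60


def wh_to_joules(wh):
    """1 W = 1 J/s, so 1 Wh = 3600 J."""
    return wh * 3600


cpu_wh = energy_wh(40, 8 * 60)   # 40 W for 8 hours
gpu_wh = energy_wh(450, 5)       # 450 W for 5 minutes

print(f"CPU run: {cpu_wh} Wh ({wh_to_joules(cpu_wh):.0f} J)")  # 320.0 Wh
print(f"GPU run: {gpu_wh} Wh ({wh_to_joules(gpu_wh):.0f} J)")  # 37.5 Wh
```

So the "40 Wh" quoted for the GPU run is a rounding of 37.5 Wh, and under these estimates the short GPU run uses less energy than the long CPU run despite the much higher power draw.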

  • @rcnhsuailsnyfiue2
    @rcnhsuailsnyfiue2 2 months ago +23

    Please consider a side quest to balance a double pendulum IRL?! 😱 You could (relatively) easily build a device for this with a single stepper motor, drive belt, and an arduino. Look at X/Y plotters like Axidraw, enthusiasts regularly build these things themselves with off-the-shelf parts. Hook the stepper motor up to your model, and you’ve got a scientific viral video just waiting to happen…

    • @firedeveloper
      @firedeveloper 2 months ago +4

      I would love to see it IRL but that's a serious task.
      1. There is a huge gap between model and real hardware.
      2. IRL you can't have x,y,z positions without camera. The most viable way would be with accelerometers and definitely rings with contacts for data transfer.
      Imagine how hard are some projects with a simple PID, this is 100x more difficult.

    • @rcnhsuailsnyfiue2
      @rcnhsuailsnyfiue2 2 months ago +3

      @@firedeveloper fair point, maybe not “easy”. I just think for a motivated novice it would certainly be achievable. If it were me I would use a rotational angle sensor on each pivot point, they’re very cheap and can be frictionless too. Mount it all on a sliding steel rail, pulled continuously along the long axis by a computer-controlled stepper motor. Then it’s simply a motion control system running in a feedback loop. Because the stepper motor is quantized, you can know the entire state of the system from just the 3 angle sensors.

    • @conorstewart2214
      @conorstewart2214 2 months ago +1

      @@firedeveloper it would need some way of sensing position, but if they can make the simulated model as accurate as possible to the real one, including the sensor data the model is fed, then it should be possible for it to work in real life.

    • @rsflipflopsn
      @rsflipflopsn 2 months ago

      @@conorstewart2214 same thought. maybe balancing a double pendulum (so I mean a pendulum with two moveable axis, could be a triple pendulum? sorry I am not that familiar with the nomenclature of these in the field of physics) is possible if you have two really precise sensors at both axis which respond with their positions accordingly and really fast. the bigger challenge (if you do something like that with ML) could even be the response time of the model plus the call to the actuator (?).
      nevertheless I really like your thought!

    • @rcnhsuailsnyfiue2
      @rcnhsuailsnyfiue2 2 months ago

      @@conorstewart2214 there’s no need to sense position, only the angle of the pendulum. The position is inherently measured by simply knowing the history of commands given to the stepper motor. The same technique is used by 3D printers, as long as they start from the “home corner”, their position will be known to the computer by simple addition/subtraction.
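
The open-loop position tracking described in this reply (home once, then add and subtract every commanded step) might look like the sketch below. `StepperCart` and the `mm_per_step` value are hypothetical, no real motor driver is involved, and the approach assumes the motor never misses a step, which is the same assumption a 3D printer without encoders makes.

```python
class StepperCart:
    """Tracks cart position purely from the history of step commands."""

    def __init__(self, mm_per_step):
        self.mm_per_step = mm_per_step
        self.steps = 0  # signed step count since the last homing

    def home(self):
        """Call after physically driving the cart to its reference corner."""
        self.steps = 0

    def move(self, steps):
        """Record a commanded move; positive is away from home, negative is back."""
        self.steps += steps

    @property
    def position_mm(self):
        return self.steps * self.mm_per_step


cart = StepperCart(mm_per_step=0.25)  # assumed belt/pulley resolution
cart.home()
cart.move(400)
cart.move(-150)
print(cart.position_mm)  # 62.5
```

If the motor stalls or skips under a fast reversal, the counted position drifts from the real one, so a physical build would likely re-home periodically or add a limit switch.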

  • @nodrance
    @nodrance 2 months ago +2

    I'd love it if you spent more time playing with this. Smaller network, triple pendulum, add random forces to the sim to increase stability, maybe make it target alternate configurations (for example first arm up second arm down or vice versa) and make it chooseable, make it not able to exert as much force. Really push it to the limits and see what it can accomplish

  • @R.B.
    @R.B. 2 months ago +2

    The next task is transitioning between states, of which there are four positions, both arms down, both arms up, and two positions with one arm up and the other down. After that you can move to three arms, where there are 8 states. At three arms you have a chaotic system, but this has been solved already with physical systems, so it would be interesting for a simulated system.

    • @jaiveersingh5538
      @jaiveersingh5538 2 months ago +1

      Isn't it already a chaotic system with just 2 pendulum arms?

  • @Rekklessss
    @Rekklessss 2 months ago +4

    How did you manage to create such a sleek looking dashboard for the model in the beginning of the video? 1:19

  • @Deniil2000
    @Deniil2000 2 months ago +1

    15:58 i really like how it knows not to chase the pendulum into the end of the rail, and makes a flip instead

  • @r.g.thesecond
    @r.g.thesecond 2 months ago +1

    11:20
    I'm a bit surprised. Is it not possible to use constraints or IK in Blender to also describe the joints, and export them as well?

    • @PezzzasWork
      @PezzzasWork  2 months ago +1

      It is certainly possible but I am not very familiar with all these tools, for my use it was simpler to rig the model directly into Isaac Sim

  • @rafa_br34
    @rafa_br34 2 months ago +21

    Well done. However, I feel like the video was a bit rushed. Primarily because you didn't test other network sizes, which would have made it more fair for the evolutionary algorithm. It also makes me wonder if the network really "learned" how to balance the pendulum or if it just memorized how to do it in the weights.

    • @NaifAlqahtani
      @NaifAlqahtani 2 months ago +9

      Agreed. This video contained no real information. Just an ad and a benchmark of an algorithm on dissimilar hardware

    • @PezzzasWork
      @PezzzasWork  2 months ago +23

      I didn't mention other architectures because I couldn't manage to get a satisfactory solution with smaller networks. Since I am not an expert with PPO, I preferred to only mention in the video that it is probably possible. Regarding the learning, I think the fact that the solution was able to recover from any perturbation means that there is no overfitting here.

  • @conorstewart2214
    @conorstewart2214 4 days ago

    To get PPO to centre you could probably include the distance from the centre in the reward function but much lower than other rewards so that it will prioritise the pendulum but then also try and centre.
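
The reward shaping suggested in this comment can be sketched as a weighted sum in which the centering term carries a much smaller weight than the balancing term. This is not the reward used in the video: the function name, the cosine-based upright score, the weights, and the rail length are all assumptions for illustration, with both angles measured from the upright vertical.

```python
import math


def shaped_reward(theta1, theta2, cart_x,
                  upright_weight=1.0, center_weight=0.05, rail_half_length=1.0):
    """Hypothetical reward: mostly about keeping both arms up, slightly about centering.

    theta1, theta2: arm angles in radians, 0 meaning pointing straight up.
    cart_x: cart offset from the rail centre, same units as rail_half_length.
    """
    upright = (math.cos(theta1) + math.cos(theta2)) / 2.0  # 1 when both arms are up
    centering = -abs(cart_x) / rail_half_length            # 0 at centre, -1 at the rail end
    return upright_weight * upright + center_weight * centering


balanced_centered = shaped_reward(0.0, 0.0, 0.0)
balanced_at_edge = shaped_reward(0.0, 0.0, 1.0)
fallen_centered = shaped_reward(math.pi, math.pi, 0.0)
print(balanced_centered, balanced_at_edge, fallen_centered)
```

Because the centering weight is small, a balanced pendulum at the end of the rail still scores far higher than a fallen one at the centre, so the policy should not trade balance for position.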

  • @gryphonvalorant
    @gryphonvalorant 2 months ago +2

    song name at 8:15?

  • @florianvanleeuwen6683
    @florianvanleeuwen6683 2 months ago

    Randomly seeing my physics lecture building on youtube, nice video :)

  • @StevenJAckerman
    @StevenJAckerman 2 months ago +2

    Very nice work. Thank you for sharing.

  • @99totof99
    @99totof99 1 month ago

    Great video! Very clear explanations!

  • @Thk10188965
    @Thk10188965 2 months ago +1

    I wonder if you can use PPO to get a solution fast, then evolution to slim it down (by adding some cost per node/connection I assume)

  • @weak7897
    @weak7897 1 month ago +2

    Next step is a 3D double pendulum with ball joints and a base with 2 degrees of freedom

  • @BananaDude508
    @BananaDude508 2 months ago +2

    just did a school based research paper on machine learning and pendulums using your other videos as reference, this video wouldve been perfect if it was 2 months earlier lol
    Either way thanks!

  • @marcelob.5300
    @marcelob.5300 2 months ago +3

    Would it be possible to include in the description the hardware specs, please?

    • @PezzzasWork
      @PezzzasWork  2 months ago +8

      I added them in the description

    • @marcelob.5300
      @marcelob.5300 2 months ago +5

      @@PezzzasWork thanks a lot!

  • @MarimeGui
    @MarimeGui 2 months ago +2

    Did this simulation include limits on acceleration to try to match real motors ?

  • @rudrajoshi674
    @rudrajoshi674 2 months ago

    How did you visualize the ann at 15:20

  • @steve_gatsis
    @steve_gatsis 2 months ago +1

    Is there a comparison of how "demanding" each method was in terms of computational resources and memory?
    What i mean is; after training, how much does your pc "struggle" to obtain the result it trained upon
    Do you think something like that matters in the end?

    • @PezzzasWork
      @PezzzasWork  2 months ago

      This is a tough question. On the one hand, PPO uses a much larger network than the evolutionary approach, but inference is performed on specialized hardware that is far more efficient for mass computation.

  • @harriehausenman8623
    @harriehausenman8623 2 months ago +1

    Beautiful and informative video! 🤗 So satisfying animations. thx 🙏

    • @PezzzasWork
      @PezzzasWork  2 months ago +1

      Thanks :)

    • @harriehausenman8623
      @harriehausenman8623 2 months ago

      @@PezzzasWork Wow! that was quick 😄

    • @harriehausenman8623
      @harriehausenman8623 2 months ago

      @@PezzzasWork I only heard of Isaac before, but wasn't aware it's *that* powerful! 😲 I wouldn't mind a follow-up video where you show the things addressed in this comment section. Like how the smaller layer sizes failed (blooper-time!!) and stuff like that.

  • @phrozenwun
    @phrozenwun 2 months ago +1

    For the single pendulum, is it possible to move the inverted "upper" node to any horizontal position as fast as the driven node can move?

  • @Bluelightzero
    @Bluelightzero 2 months ago +3

    Is it possible to analyse what these neurons are doing?

    • @PezzzasWork
      @PezzzasWork  2 months ago

      Probably, I don't know how though :D

  • @realzakariax
    @realzakariax 2 months ago +1

    7:50 I love how the base also returns to the middle of the field, so fascinating!

  • @byzantagaming648
    @byzantagaming648 1 month ago

    What is the benefit of Reinforcement Learning compared to Optimal Control? My guess would be that with optimal control you could directly obtain the optimal movements without the need for costly training.

  • @deniskhafizov6827
    @deniskhafizov6827 1 month ago

    In comparison with my own distant memories of computing liquid dynamics in Pascal on a 386sx, what I see people have now brings me tears of mixed joy, awe and envy.
    With a little horror of profligacy.

  • @untyperdm
    @untyperdm 1 month ago

    and what about all the deterministic model based control that work pretty well ? May be cool to compare !

  • @Build_the_Future
    @Build_the_Future 2 months ago +1

    Can you do more with Isaac Lab? I always run into problems when using it.

  • @thomas_c
    @thomas_c 2 months ago +2

    Amazing job ! I'm in love with PPO now :)
    What hardware did you need to train your ai ?

    • @PezzzasWork
      @PezzzasWork  2 months ago +3

      Thank you :) I added the PC spec in the description.

  • @Aeorthian
    @Aeorthian 2 months ago +1

    In this simulation you mention you need both the position and the velocity of each joint. Your model does not appear to have any rotary encoders modeled on it unless you have a point mass added to represent it that we can't see? You would have to retrain this if you actually wanted to use this in the real world as it would require rotary encoders to measure the angular velocity/position, no? Also, does your bottom motor have a rotary encoder built into it or does it also lack a rotary encoder? Still a great job with the proof of concept even if it's not actually usable in real life.

  • @louisdupont2126
    @louisdupont2126 2 months ago +1

    Great video man ! Is it possible to share your code you really motivated me to dive deeper into isaac lab !

  • @Jiorgos3D
    @Jiorgos3D 2 months ago +3

    Yay! New Video

  • @peterzsoldos3551
    @peterzsoldos3551 4 days ago

    I followed your instructions for the 1 joint pole learning task but only changing the reset parameters seems to lead to no movement at all for me with PPO, is there something else I should change?

  • @IsaiahSugar
    @IsaiahSugar 2 months ago

    would love to see you implement ppo yourself! i think that as a viewer i would learn a lot more from that

  • @goatknight777
    @goatknight777 2 months ago +1

    PPO really is incredible in all ways

  • @lMINERl
    @lMINERl 2 months ago +2

    Love your work im a big fan XD

  • @_ingoknito
    @_ingoknito 1 month ago +1

    great ad! - takes the fun out of the simplicity imho.

  • @wfpnknw32
    @wfpnknw32 1 month ago

    very interesting! Could you gain similar performance as ppo with a larger starting network for your evolutionary approach, so it's closer to ppo's starting point?

  • @2001herne
    @2001herne 12 days ago

    The only issue I have with this video - and it's a great video - is the phrase "nVidia helped me out with some hardware". This makes it seem like the real solution to reducing training time was just "Throw more compute at the problem", which appears to be the opposite of efficiency. I'd be interested in seeing how well PPO operates under the same simulation conditions and execution hardware as the previous tests. I'm pretty sure that the resulting network would be quicker to run, as conventional networks can take advantage of SIMD and parallelisation more effectively than NEAT-generated networks, but I almost wonder if, due to the way that NEAT only includes complexity when it is required, a NEAT network might be quicker to train?

  • @rakshitx1
    @rakshitx1 1 month ago

    which are the two previous videos?

  • @luke.perkin.online
    @luke.perkin.online 2 months ago

    Great video, can you do evolutionary distillation or pruning of the ~65536 parameter ppo model?

  • @JinKee
    @JinKee 2 months ago

    Omniverse kept on popping up on startup for me and then i couldn’t suppress or uninstall it when i was wanting to focus on other tasks and i think i broke my install.

  • @felixconrad9248
    @felixconrad9248 2 months ago +1

    great video as always, i am not excited to see a video by a lot of youtubers but you are surely one of them

    • @PezzzasWork
      @PezzzasWork  2 months ago +1

      Thank you :)

  • @galacticlava1475
    @galacticlava1475 2 months ago +1

    Can you please post your code in the description? We’d love to tinker with it.

    • @PezzzasWork
      @PezzzasWork  2 months ago

      I will make my fork of Isaac Lab public soon and add the model featured in the video

    • @galacticlava1475
      @galacticlava1475 2 months ago

      @@PezzzasWork Thanks! Your AI content is some of the best on youtube rn. And I really commend people like you who keeps code open source so that we can all learn together.

    • @PezzzasWork
      @PezzzasWork  2 months ago

      @@galacticlava1475 thank you !

  • @amirhm6459
    @amirhm6459 12 days ago

    I never thought robotics advancements had already reached this level

  • @ofabiolima
    @ofabiolima 2 months ago

    Please make a prediction of three body problem

  • @alejandromartinez-vp4sx
    @alejandromartinez-vp4sx 2 months ago

    Beautiful as usual.

  • @LinkLaine
    @LinkLaine 2 months ago

    if you expand the problem to full 3D, where you have a cart on a 2D surface and a pendulum that can fall in 3D, will that algorithm be as effective as in 2D?

  • @thor9000
    @thor9000 2 months ago

    Super nice video and explanation! Question, how much did you need to tune the reward, and how essential are the rewards with the low weights?

  • @Markus-r6g
    @Markus-r6g 2 months ago +1

    6:57 the "simple task" is the limit of humans because a double makes it impossible for a human to accomplish

  • @lefm_
    @lefm_ 2 months ago

    Yeah, I came from building a small evo AI class at home in C# using maybe 6 nodes, then stumbled upon ml-agents, where solutions involve 256, often 512 nodes. Looks like there's a need for PPO.

  • @yannsadowski8292
    @yannsadowski8292 2 months ago +1

    Hi, you say it took you 5 hours with the evolutionary technique. But was that with the RTX 4090 or another graphics card?

    • @PezzzasWork
      @PezzzasWork  2 months ago +1

      The evolutionary algorithm isn't GPU accelerated; it runs on the CPU (multithreaded). A big advantage of most RL algorithms is that they are able to run on GPUs very efficiently.

  • @sucim
    @sucim 2 months ago

    You might be interested in looking into RLtools / the "Learning to Fly in Seconds" paper!

  • @EricSundquistKC
    @EricSundquistKC 2 months ago

    That is seriously impressive!

  • @alesegdia
    @alesegdia months ago

    Hey awesome work! What do you use for the pendulum visuals and stats? They look beautiful

  • @michael_pio
    @michael_pio 2 months ago +1

    Great informative video

  • @devsquaaa
    @devsquaaa 2 months ago

    Love the content. Please keep it up.

  • @UonBoat
    @UonBoat 2 months ago

    Such a smooth live chart system in the initial part of the video. Does this come from a certain library, or did you write it yourself? Thanks.

    • @PezzzasWork
      @PezzzasWork  2 months ago

      Thank you :) It is a tool I wrote myself, I plan on doing a tutorial on the subject.

    • @UonBoat
      @UonBoat 2 months ago

      @@PezzzasWork Sounds cool! I know it's a bit off topic, but I’m looking forward to it whenever it’s out.

  • @n.lu.x
    @n.lu.x 2 months ago

    did you by any chance try the OpenAI-ES algorithm from their 2017 paper? it's quite simple yet powerful for (larger) neural networks. + you could also run it on the GPU in parallel
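
For readers unfamiliar with the idea mentioned above, here is a minimal sketch of a basic evolution-strategies loop: perturb the parameters with Gaussian noise, score each perturbation, and step along the score-weighted noise. It is a simplified illustration, not the paper's exact procedure and not the code used in the video. The function names, hyperparameters, and the toy objective are all hypothetical, and it uses only the Python standard library rather than a GPU.

```python
import random


def evolution_strategy(fitness, theta, sigma=0.1, lr=0.05,
                       population=50, iterations=200, seed=0):
    """Simplified evolution-strategies loop (illustrative assumption,
    not a reference implementation)."""
    rng = random.Random(seed)
    theta = list(theta)
    n = len(theta)
    for _ in range(iterations):
        # Sample one Gaussian noise vector per population member.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(n)]
                  for _ in range(population)]
        # Score each perturbed parameter vector.
        scores = [fitness([t + sigma * e for t, e in zip(theta, eps)])
                  for eps in noises]
        # Subtract the mean score as a baseline to reduce variance.
        mean = sum(scores) / population
        weights = [s - mean for s in scores]
        # Estimate the gradient as the score-weighted average of the noise.
        for i in range(n):
            grad_i = sum(w * eps[i] for w, eps in zip(weights, noises))
            theta[i] += lr * grad_i / (population * sigma)
    return theta


def toy_fitness(params):
    """Hypothetical objective with its maximum at (3, -1)."""
    x, y = params
    return -((x - 3.0) ** 2) - ((y + 1.0) ** 2)


result = evolution_strategy(toy_fitness, [0.0, 0.0])
print(result)
```

Each population member is scored independently, which is why this family of methods parallelises well; in a control setting, `toy_fitness` would be replaced by the return of a simulated episode.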

  • @ErickTakada
    @ErickTakada 2 months ago

    Got a question: what if you add a slower reaction time? Like a human handling the pendulum?

  • @RaaynML
    @RaaynML 2 months ago

    Seems like you can actually get surprisingly close with an order of magnitude less params if you are willing to train longer

  • @wordhydrogen
    @wordhydrogen months ago

    Hello, what do you use for simulating the cart pole and the neural network? It looks really good

  • @gpjedy7379
    @gpjedy7379 2 months ago

    Sick! Will you do videos on training multi-agent tasks?

  • @mateosanpedro9578
    @mateosanpedro9578 months ago

    Awesome video! Is it possible to have a link for the usd file?

  • @narpwa
    @narpwa 2 months ago

    WTF IT LOOKS SOOO CLEAN HOW DID YOU DO VISUALS LIKE THAT ???

  • @expired___milk
    @expired___milk 2 months ago

    Could you use a big network using PPO and then make it smaller using the evolutionary algorithm?

  • @cloudzero2049
    @cloudzero2049 2 months ago

    Any possibility of comparing TD3 (Twin Delayed Deep Deterministic Policy Gradient) to PPO for this? I'm curious because I am working with TD3. It's a little more complex than PPO from what I understand, and maybe overkill for this project if that holds true, but I was just curious.

  • @Nothingguy562
    @Nothingguy562 2 months ago

    Hey
    I am very starstruck by your work
    I would be very grateful if you could tell me how you learnt all of this. What would you recommend to a total beginner?
    Thanks

  • @FlashTheMusik
    @FlashTheMusik 2 months ago

    How do you make these awesome dashboards for your visualization?

    • @PezzzasWork
      @PezzzasWork  2 months ago +1

      I am using a tool I wrote myself

  • @biobuu4118
    @biobuu4118 2 months ago

    Very impressive, but what would happen if the task was not to balance the poles but to aim for, let's say, a 90° angle at their joint?

    • @minimon796
      @minimon796 months ago

      It is not an equilibrium

    • @biobuu4118
      @biobuu4118 months ago

      @minimon796 yes, so the AI cannot aim at a single fixed state of the system but rather has to come up with its own solution of oscillating between two (or more?) positions with the poles at 90°. It would be really cool to visualize 4000 agents trying to make something out of a stupid rule given to a rather simple system like this one

  • @ArnaudMEURET
    @ArnaudMEURET 2 months ago

    The cart motor seems extremely (unrealistically) capable. I wonder how the network would react with a more reasonable responsiveness of the cart.

  • @afdf96
    @afdf96 2 months ago

    What about n-pendulum with n > 2? Or n-body problem?

  • @GelloMello-j9z
    @GelloMello-j9z 2 months ago

    woah....the graphical interface is so gooood