Controlling Drones with AI (Python Reinforcement Learning Quadcopter)

  • Published Jun 12, 2024
  • Teaching a Reinforcement Learning agent to pilot a quadcopter and navigate waypoints using careful environment shaping.
    GitHub Repo github.com/AlexandreSajus/Qua...
    0:00 Intro
    0:22 Physics
    1:08 Control Theory
    2:04 Reinforcement Learning
    3:45 Training
    4:13 Results
    4:46 Conclusion

Comments • 53

  • @abisheksunil
    @abisheksunil 1 year ago +6

    Cool!! Would love to see more videos on RL, especially in an environment with a lot more parameters for the agent to control.

    • @alexandresajus
      @alexandresajus 11 months ago +1

      Hey! I finally did another video on RL where I trained a humanoid AI to pass obstacles from Total Wipeout in Unity 3D. I hope you'll like it: th-cam.com/video/_YXOLM2a41Q/w-d-xo.html

  • @thinkindude5566
    @thinkindude5566 1 year ago +3

    Great video, love it!

  • @EigenA
    @EigenA 1 year ago +1

    Cool, good job.

  • @imannabiyouni3006
    @imannabiyouni3006 1 year ago +2

    Keep up the good work 👏.
    Just curious, what platform did you use to make this YouTube video?
    The editing and timeline are impressive.

    • @alexandresajus
      @alexandresajus 1 year ago

      Thanks a lot! I use OBS for recording, Adobe After Effects for editing, and Adobe Media Encoder for exporting.

  • @manigoyal4872
    @manigoyal4872 8 months ago +3

    The stated disadvantage of PID, that it cannot reach higher speeds even when the target is far away, seems wrong though.
    The control output of a PID controller is directly proportional (P) to the error, so if the target is far away, the error is larger, and therefore the output of the PID controller will be larger too.

    • @alexandresajus
      @alexandresajus 8 months ago +2

      Yes, indeed, I should have phrased it better. What I meant is that PID coefficients are constants and do not change according to the distance to the target. When the target is really far, I would ideally like to increase the integral coefficient to start off with more aggressive behaviour, then reduce it to become more careful. In my mind that is the advantage of Reinforcement Learning: the behaviour has more freedom and can adapt to more situations.
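      In code, that gain-scheduling idea could look roughly like this (a hypothetical sketch; the threshold and boost factor are made-up values, not anything from the video):

          def scheduled_gains(distance, kp=1.0, ki=0.1, kd=0.5, far=5.0, boost=3.0):
              # Boost the integral gain while the target is far for more
              # aggressive behaviour; fall back to the base gain when close.
              if distance > far:
                  return kp, ki * boost, kd
              return kp, ki, kd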

  • @Alpha725_
    @Alpha725_ 1 year ago +1

    Video got me to sub. Now you just need to write a couple thousand papers and you will have a million subs 😋

  • @SP-db6sh
    @SP-db6sh 1 year ago +2

    Amazing! Could you make a video on a disaster-management drone trained with DRL?
    And how about designing aerodynamically efficient and battery-efficient drones with DRL in a virtual environment like VR?

  • @freddy_bsc
    @freddy_bsc 1 year ago +4

    Very cool video, keep up the great work! I wonder how fast the AI-controlled drone could have been with more training.

    • @alexandresajus
      @alexandresajus 1 year ago +3

      Thanks a lot! Unfortunately, at 500k training steps, the rewards had converged and continuing the training did not improve the agent's performance. It probably reached the optimal behaviour for this environment.

  • @jeremybertoncini6935
    @jeremybertoncini6935 1 year ago +3

    Hi,
    since the task to fulfill is path planning, did you think about comparing results using optimal control theory too?
    For example, Model Predictive Control algorithms may solve the presented scenario efficiently and in real time. Moreover, MPC is also robust for collision avoidance, in case you would like to investigate more developed scenarios.
    In any case, a very interesting framework you have!

    • @alexandresajus
      @alexandresajus 1 year ago +2

      Very interesting field I did not know about, thanks for sharing! Yeah, it would be interesting to try MPC and more complex scenarios; I might try it one day.

    • @manigoyal4872
      @manigoyal4872 8 months ago +1

      A good setup is actually one where the path planning is done by a NN or any other planner, and MPC is used to track that planned path as closely as possible.
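      To make the MPC idea concrete, here is a toy sketch of the receding-horizon loop (a naive random-shooting variant on a 1D point mass; real MPC uses proper solvers, and every name and constant here is illustrative):

          import numpy as np

          def mpc_step(pos, vel, target, horizon=10, candidates=256, dt=0.1, rng=None):
              # Sample candidate action sequences, roll out a point-mass model,
              # and apply only the first action of the lowest-cost sequence.
              if rng is None:
                  rng = np.random.default_rng()
              actions = rng.uniform(-1.0, 1.0, size=(candidates, horizon))
              p = np.full(candidates, float(pos))
              v = np.full(candidates, float(vel))
              cost = np.zeros(candidates)
              for t in range(horizon):
                  v = v + actions[:, t] * dt
                  p = p + v * dt
                  cost += (p - target) ** 2 + 0.01 * actions[:, t] ** 2  # tracking + effort
              return actions[np.argmin(cost), 0]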

  • @barissayin2985
    @barissayin2985 11 months ago +1

    You are awesome, dude!

    • @alexandresajus
      @alexandresajus 11 months ago

      Thank you!

    • @barissayin2985
      @barissayin2985 11 months ago +1

      @alexandresajus I am planning on working on multi-agent systems with quadcopter drones as my final project at university, too. I would like to keep in touch and follow you more :)

    • @alexandresajus
      @alexandresajus 11 months ago

      Good idea! Sure, we can keep in touch on LinkedIn if you want: www.linkedin.com/in/alexandre-sajus

    • @barissayin2985
      @barissayin2985 11 months ago +1

      @alexandresajus Right, sent a request.

  • @sanchaythalnerkar9736
    @sanchaythalnerkar9736 9 months ago +2

    Great video! Can you create a video where you explain the process step by step, for beginners?

    • @alexandresajus
      @alexandresajus 9 months ago

      This is not the first comment I have gotten about making a tutorial on this, so I think I will do a tutorial on this process. Meanwhile, there are a lot of resources on the internet about control theory and reinforcement learning. For RL, I really recommend following a tutorial on stable-baselines and gym; these are very easy-to-use RL frameworks.
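      For a taste of how little code those frameworks need, here is a minimal training sketch (assuming stable-baselines3 and gymnasium, the maintained successors of those libraries, are installed; the environment is just a placeholder):

          import gymnasium as gym
          from stable_baselines3 import PPO

          env = gym.make("CartPole-v1")           # any registered environment works
          model = PPO("MlpPolicy", env, verbose=1)
          model.learn(total_timesteps=100_000)    # train until the reward converges
          model.save("ppo_cartpole")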

  • @kennethporst4359
    @kennethporst4359 1 year ago +1

    I'm confused...how do you give a computer math and it equates that to moving forward?

    • @JohnDoe-rx3vn
      @JohnDoe-rx3vn 9 months ago +1

      PID is just three numbers you add together to tell the drone to get to the target without wasting time, but also to slow down so it doesn't overshoot the target. It's easy to use and only needs the distance to the target and the time in order to work. It's popular because it works really well for how simple it is, and it doesn't use a lot of computing power.
      P is the error (e), which is the distance to the target.
      I is e added up over time, which grows the longer you stay away from the target.
      D is how fast the error is changing: the current error minus the error the last time we checked, divided by the time that passed since we checked. This number is negative while you approach the target, so it slows the drone down by reversing the throttle as you get closer.
      All of these numbers are multiplied by their gains, Kp, Ki, and Kd respectively (which you manually change to "tune" the PID controller), and then added together. Whatever that number becomes is the throttle; in this video, it's the tilt of the drone. It's way simpler than the scary equations make it look.
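      Put into code, the whole controller just described fits in a few lines (a generic sketch of a PID loop, not the author's actual code):

          class PID:
              def __init__(self, kp, ki, kd):
                  self.kp, self.ki, self.kd = kp, ki, kd  # manually tuned gains
                  self.integral = 0.0
                  self.prev_error = None

              def output(self, error, dt):
                  # P: the current error.  I: error accumulated over time.
                  # D: rate of change of the error (negative while closing in).
                  self.integral += error * dt
                  derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
                  self.prev_error = error
                  return self.kp * error + self.ki * self.integral + self.kd * derivative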

  • @bignerd3783
    @bignerd3783 1 year ago +2

    Based Hotline Miami music

  • @rverm1000
    @rverm1000 1 year ago +1

    Can you make a video on that process? I sure would like to apply this to real-life stuff: autonomous machines.

    • @alexandresajus
      @alexandresajus 1 year ago

      I'll maybe do a more in-depth video on the process. Meanwhile, there are a lot of resources on the internet about control theory and reinforcement learning. For RL, I really recommend following a tutorial on stable-baselines and gym; these are very easy-to-use RL frameworks. Keep in mind that using RL for real-life machines is really complicated, as RL requires a lot of training steps to work, which is hard to do outside of a simulated environment. What do you have in mind in terms of real-life applications?

    • @rverm1000
      @rverm1000 1 year ago +1

      I'm taking a course on it now. But putting everything together is something they never teach.

    • @alexandresajus
      @alexandresajus 1 year ago

      @rverm1000 Yes! I had that same problem a few years ago. Every RL course is generally based on the Sutton & Barto book and is way too theoretical.
      If you want to learn the practical side of RL, I recommend you look up tutorials on how to create a custom gym environment and how to train a stable-baselines agent, on either TowardsDataScience or Medium. The tutorials there are generally very practical and easy to follow. A custom environment boils down to a small class; see the sketch below.
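      As a minimal hypothetical example using the gymnasium API (a toy 1D task, not the drone environment from the video):

          import gymnasium as gym
          import numpy as np
          from gymnasium import spaces

          class PointToTarget(gym.Env):
              # Toy task: nudge a point on a line toward the origin.
              def __init__(self):
                  self.observation_space = spaces.Box(-10.0, 10.0, shape=(1,), dtype=np.float32)
                  self.action_space = spaces.Discrete(2)  # 0 = move left, 1 = move right

              def reset(self, seed=None, options=None):
                  super().reset(seed=seed)
                  self.pos = float(self.np_random.uniform(-5.0, 5.0))
                  return np.array([self.pos], dtype=np.float32), {}

              def step(self, action):
                  self.pos += 0.1 if action == 1 else -0.1
                  reward = -abs(self.pos)            # reward shaping: closer is better
                  terminated = abs(self.pos) < 0.05  # reached the target
                  return np.array([self.pos], dtype=np.float32), reward, terminated, False, {}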

  • @underlecht
    @underlecht 1 year ago +1

    Waiting for full quad

    • @alexandresajus
      @alexandresajus 1 year ago

      You mean 3D with 4 propellers? Could be doable; I would need to use Unity for 3D rendering though.

  • @perfumedsea
    @perfumedsea 1 year ago +2

    So it's a game? Any plans to port it to a real drone?

    • @alexandresajus
      @alexandresajus 1 year ago +1

      Yeah, this is simulation only. There are no plans on my end to test it on a real drone, since the gap between simulation and the real world in reinforcement learning is very big, but a good example of competent people achieving drone control in real life using RL is the research paper A Zero-Shot Adaptive Quadcopter Controller ( arxiv.org/abs/2209.09232 ).

    • @perfumedsea
      @perfumedsea 1 year ago +1

      @alexandresajus Thanks. I feel it might be feasible, or maybe someone has already done it but not published it yet.

  • @user-sk4jp3ul4q
    @user-sk4jp3ul4q 4 months ago

    Running python -m quadai gave me /usr/bin/python: No module named quadai. This helped: pip install -e . Thank you so much!

    • @alexandresajus
      @alexandresajus 4 months ago

      Check this issue and let me know if it solves your case: github.com/AlexandreSajus/Quadcopter-AI/issues/2

    • @user-sk4jp3ul4q
      @user-sk4jp3ul4q 4 months ago

      @alexandresajus Thanks for your answers. That was solved, I think. Another error:
      ERROR: Could not find a version that satisfies the requirement numpy==1.26.0 (from quadai)
      ERROR: No matching distribution found for numpy==1.26.0
      I am trying, man.

    • @alexandresajus
      @alexandresajus 4 months ago

      @user-sk4jp3ul4q What Python version do you have (python --version)? I think numpy 1.26 only supports Python 3.9+

    • @user-sk4jp3ul4q
      @user-sk4jp3ul4q 4 months ago

      @alexandresajus Reinstalled Ubuntu 22 from 20, using VS Code. The same problem: ModuleNotFoundError: No module named 'quadai'. The other libraries installed fine.

    • @user-sk4jp3ul4q
      @user-sk4jp3ul4q 4 months ago +1

      @alexandresajus This helped: pip install -e . Thank you so much!

  • @TavoFourSeven
    @TavoFourSeven 1 year ago +2

    This needs to get to a point where nobody needs to tune a drone (only, like, a master multiplier in Betaflight). Just build it and go, slap any props on at any time, and the (AI flight controller name) just knows in real time what to do for the best propwash handling, safely. It could probably make GPS use easier too. So many reasons why I just looked this up, and I'm happy this is only 2 months old. Toroidal props = yawn.

    • @TavoFourSeven
      @TavoFourSeven 1 year ago +1

      I'm talking very adaptive P, I, and D.

    • @alexandresajus
      @alexandresajus 1 year ago +1

      Yeah, a team of researchers has a great paper on this subject: A Zero-Shot Adaptive Quadcopter Controller ( arxiv.org/abs/2209.09232 ). They used reinforcement learning to create a drone controller that didn't need tuning. They tried using that controller to hover real drones of different sizes and weights, and the success rate was quite impressive, but I'm guessing it'll be a while before something like this is production ready.

    • @TavoFourSeven
      @TavoFourSeven 1 year ago +1

      @alexandresajus Promising stuff indeed. Gonna be a great day when those are for sale. Maybe even ECS too. 😮

  • @hradynarski
    @hradynarski 7 months ago +1

    Cool experiment, but it may also just prove that the human used in the experiment sucks at the drone game and at PID tuning, right? ;)

    • @alexandresajus
      @alexandresajus 7 months ago +1

      Haha, hey, chill out, you're talking about me here (but yes, you are correct).

    • @hradynarski
      @hradynarski 7 months ago +1

      @alexandresajus I'd like to suggest human-vs-AI PID tuning contests. That would be not only entertaining but also useful.

    • @alexandresajus
      @alexandresajus 7 months ago

      @hradynarski Yeah, could be fun! Hard to organize, but fun.

  • @govynela4176
    @govynela4176 1 year ago +1

    Hi! I liked this video. I would like to read your paper. Can I have it?

    • @alexandresajus
      @alexandresajus 1 year ago

      Yeah, sure! In the description there is a link to the GitHub repo of the project; in this repo there is a Reinforcement_Learning_for_the_Control_of_Quadcopters.pdf file; that is the paper.
      Keep in mind that I am not a researcher and that the paper is not peer-reviewed, so take everything in it with a grain of salt.

    • @govynela4176
      @govynela4176 1 year ago +1

      @alexandresajus Thanks! I understand why I couldn't find it on Google Scholar. 😉

  • @pierrickbo
    @pierrickbo 1 year ago

    first