Introduction to Reinforcement Learning | DigiKey

  • Published on Jun 1, 2024
  • Reinforcement Learning (RL) is a field of machine learning that aims to find optimal solutions to control theory problems for various tasks. It employs an artificial intelligence (AI) “agent” that takes in observations, chooses actions, and learns from rewards. Modern RL algorithms train agents using trial-and-error approaches that involve directly interacting with the given environment.
    In the video, we cover the basic theory behind RL and demonstrate how to use Farama Foundation Gymnasium (gymnasium.farama.org/) and Stable Baselines3 (stable-baselines3.readthedocs...) in Python to train an AI agent to solve the classic cartpole (gymnasium.farama.org/environm...) control theory problem. At the end of the video, we encourage you to try applying the knowledge to solve the slightly more advanced inverted pendulum problem (gymnasium.farama.org/environm...).
    The solution to the challenge can be found here: www.digikey.com/en/maker/proj...
    Code for training RL agents to solve both the cartpole and pendulum problems can be found here: github.com/ShawnHymel/reinfor...
    In RL, the environment can be anything the agent interacts with, such as board games, video games, virtual settings, or the real world. We often use a code wrapper (e.g. Gymnasium) to observe this environment, perform agent-specified actions, and assign rewards. Note that rewards are considered part of the environment and are instrumental in training.
    The decision-making process for choosing actions based on observations is known as the “policy.” During training, the agent selects each action either at random (exploration) or according to its current policy (exploitation). The environment then returns a new observation and a reward, which the training algorithm uses to steer the agent toward actions with higher predicted total future reward.
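    As a rough illustration of the observe, act, reward loop, here is a minimal sketch in plain Python. The `CoinGuessEnv` class, `run_episode`, and `copy_policy` are hypothetical names invented for this example. The environment only imitates the shape of Gymnasium's `reset()` and `step()` return values (observation and info; observation, reward, terminated, truncated, and info), so it runs without any packages installed. It is not the code from the video.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment that imitates the Gymnasium reset/step shape.

    The observation is the current coin face (0 or 1). The agent earns a
    reward of 1.0 whenever its action matches the face it was shown.
    """

    def __init__(self, max_steps=10, seed=None):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        # Start a new episode and return the first observation plus an info dict.
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin, {}

    def step(self, action):
        # The reward is assigned by the environment, not by the agent.
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)
        terminated = False                       # this toy task has no failure state
        truncated = self.steps >= self.max_steps  # stop after a fixed time limit
        return self.coin, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode: observe, act per policy, collect the reward, repeat."""
    obs, info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward


def copy_policy(obs):
    """A fixed policy that maps each observation directly to an action."""
    return obs


if __name__ == "__main__":
    env = CoinGuessEnv(max_steps=10, seed=0)
    print("Total reward:", run_episode(env, copy_policy))
```

    With a real Gymnasium environment the loop body would look much the same; only the environment object and the policy would change.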
    The cartpole problem consists of a virtual pole balanced on top of a cart that can only move left and right. The goal is to design an AI agent that can keep the pole balanced by pushing the cart left or right. In the video, we use Deep Q-Learning (towardsdatascience.com/deep-q...) to train a Deep Q-Network (DQN) to solve the cartpole problem.
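    To give a feel for the update behind Q-learning, here is a small sketch of the tabular version, which stores one value per state-action pair in a table instead of approximating those values with a neural network as a DQN does. It is deliberately not the cartpole DQN from the video. The five-cell corridor, the function names, and the hyperparameter values are all assumptions chosen to keep the example short.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
LEFT, RIGHT = 0, 1


def corridor_step(state, action):
    """Hypothetical toy dynamics: move one cell, reward 1.0 on reaching the goal."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on the corridor; returns the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 200:
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = corridor_step(state, action)
            # Target = immediate reward + discounted best value of the next state.
            best_next = 0.0 if done else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            steps += 1
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print("Greedy action per state (0 = left, 1 = right):", greedy_policy(q_table))
```

    Cartpole observations are continuous, so a table like this is not practical there; that is one reason to replace the table with a network that estimates the same values.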
    We list some recommended reading and viewing materials below if you would like to dive deeper into reinforcement learning.
    Articles:
    Reinforcement Learning Algorithms - an intuitive overview - / reinforcement-learning...
    Which Reinforcement learning-RL algorithm to use where, when and in what scenario? - medium.datadriveninvestor.com...
    Q-Learning vs. Deep Q-Learning vs. Deep Q-Network - www.baeldung.com/cs/q-learnin...
    Deep Q Networks (DQN) With the Cartpole Environment - wandb.ai/safijari/dqn-tutoria...
    RL - Proximal Policy Optimization (PPO) Explained - / rl-proximal-policy-opt...
    Proximal Policy Optimization (PPO) - huggingface.co/blog/deep-rl-ppo
    Related Videos:
    Exploring Reinforcement Learning: Can AI Learn to Play QWOP?
    Intro to Edge AI
    Related Project Links:
    Intro to Reinforcement Learning Using Gymnasium and Stable Baselines3
    Related Articles:
    Teach an AI to play QWOP
    What is Edge AI? Machine Learning + IoT
    Learn more:
    Maker.io - www.digikey.com/en/maker
    DigiKey’s Blog - TheCircuit www.digikey.com/en/blog
    Connect with Digi-Key on Facebook / digikey.electronics
    And follow us on Twitter / digikey
    00:00 - Intro
    00:59 - History of reinforcement learning
    02:14 - Environment and agent interaction loop
    06:21 - Gymnasium and Stable Baselines3
    07:55 - Hands-on: how to set up a gymnasium environment
    26:57 - Markov decision process
    31:02 - Bellman equation for the state-value function
    34:12 - Bellman equation for the action-value function
    35:47 - Bellman optimality equations
    36:43 - Exploration vs. exploitation
    38:39 - Recommended textbook
    39:25 - Model-based vs. model-free algorithms
    40:27 - On-policy vs. off-policy algorithms
    41:19 - Discrete vs. continuous action space
    42:36 - Discrete vs. continuous observation space
    43:56 - Overview of modern reinforcement learning algorithms
    46:29 - Q-learning
    49:27 - Deep Q-network (DQN)
    51:59 - Hands-on: how to train a DQN agent
    01:12:36 - Usefulness of reinforcement learning
    01:13:26 - Challenge: inverted pendulum
    01:14:10 - Conclusion
  • Science & Technology

Comments • 7

  • @OMNI_INFINITY 3 months ago

    Glad Shawn is making tutorials again! Thanks! Congratulations on getting hired by DigiKey! Should make a “PCB art in KiCad” video, a “surface mount prototyping with only a desoldering hotplate” video, and a “how to make a DIY pick-and-place robot arm that has machine vision capability” video. And yes, would maybe be up for collaborating on designing that pick-and-place robot. Really don’t like soldering by hand or placing tiny components by hand.

  • @dave20874 10 months ago +4

    I watched thinking this would be a good refresher for material I'd learned over the last couple years. But I see a lot has changed with gymnasium. And the stable baselines material was all new to me. I learned more than I expected.

  • @PatrickHoodDaniel 9 months ago

    This is, by far, the best ad that I have ever seen!! Also, a great explanation. Thank you for pushing this to me.

  • @geekzombie8795 10 months ago +2

    Thank you for this informative video!

  • @OMNI_INFINITY 3 months ago

    If society were a meritocracy, both Shawn and I would already be millionaires or billionaires. On with the casing design of my new computer product. Wish I had millions to market it more properly.

  • @insanitygamer_vibing 10 months ago +1

    Why is this an ad on YouTube Music?? 😂😂😂

    • @geekzombie8795 10 months ago

      It’s education!