Coding Deep Q-Learning in PyTorch - Reinforcement Learning DQN Code Tutorial Series p.1

  • Published on 20 Sep 2024

Comments • 82

  • @ruantwice · 3 years ago · +4

    Your channel is a gem! Thank you for making such quality content!

  • @kafaayari · 1 year ago · +1

    Great tutorial. Everything is put together in the most basic form and it works very well. As feedback: inside the .act() call, and when calling forward on target_net, gradients could be disabled with a torch.no_grad context; it would run much faster since no computation graph would be created.
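
A minimal sketch of the speed-up suggested above, assuming the tutorial's online_net / target_net names and a gym-style env; wrapping action selection and the target-network forward pass in torch.no_grad means no computation graph is built for them:

    import torch

    def act(online_net, obs, epsilon, env):
        # Epsilon-greedy action selection; gradients are never needed here.
        if torch.rand(1).item() < epsilon:
            return env.action_space.sample()
        with torch.no_grad():
            obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
            q_values = online_net(obs_t)
        return int(q_values.argmax(dim=1).item())

    def compute_targets(target_net, new_obses_t, rews_t, dones_t, gamma=0.99):
        # The TD targets are constants with respect to the online network's
        # parameters, so the target-network forward pass can skip autograd too.
        with torch.no_grad():
            max_target_q = target_net(new_obses_t).max(dim=1, keepdim=True)[0]
        return rews_t + gamma * (1 - dones_t) * max_target_q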

  • @juleswombat5309 · 3 years ago · +2

    That was pretty cool, with a clear explanation and code. Many thanks for this.
    Albeit I don't think I will ever quite understand squeeze() and unsqueeze() removing and adding PyTorch dimensions.
    I guess in practice I would save the model (the PyTorch parameter dict) every so often, with early stopping etc., since running RL algorithms can be a painfully slow experience in practice.
    Also where/when to copy PyTorch tensors onto CUDA devices to speed up the algorithm on GPUs where available. But there are examples and code out there on how to do this, and they can easily be applied to your code (see the sketch after this thread).

    • @brthor1117 · 2 years ago

      Check out the next video in the series where we add gpu support: th-cam.com/video/tsy1mgB7hB0/w-d-xo.html
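
A minimal sketch of the points raised in this thread (the unsqueeze/squeeze batch-dimension dance, periodic checkpointing, and moving tensors to a CUDA device); the network here is only a stand-in for the tutorial's Network class, and the step counter and checkpoint path are illustrative:

    import numpy as np
    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Stand-in for the tutorial's Network class, moved to the GPU when available.
    online_net = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2)).to(device)

    obs = np.zeros(4, dtype=np.float32)          # stand-in for one CartPole observation
    obs_t = torch.as_tensor(obs, device=device)
    batched = obs_t.unsqueeze(0)                 # add a batch dimension: (4,) -> (1, 4)
    q_values = online_net(batched).squeeze(0)    # remove it again: (1, 2) -> (2,)

    # Periodic checkpointing: save the parameter dict every so often during training.
    step = 10_000                                # wherever the training loop currently is
    if step % 10_000 == 0:
        torch.save(online_net.state_dict(), "dqn_checkpoint.pt")
        # Resume later with: online_net.load_state_dict(torch.load("dqn_checkpoint.pt"))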

  • @Throwingness · 3 years ago

    Great video. This is the 4th I've watched and the only one that didn't have 300-plus lines of code in it. This is the hand up I needed.

    • @brthor1117 · 2 years ago

      😁glad it helped

  • @jhmerkaba8080 · 1 year ago · +3

    Very nice description of DQN. Do you have the code in a repository? I tried to write the code as you described in the video, but I am getting some errors.

  • @iliasp4275 · 5 months ago

    Great video, what you said at 30:00 really resonated with me. I have the same problem! My model learns fine, up to a point where it chooses to take a dive. The thing is that it does this about half the time. Sometimes it trains fine. Would you mind telling me what the bug was, because there is a high chance I have the same one!
    Cheers!

  • @RabeeQasem · 3 years ago · +3

    Maybe it's an easy task, but can you make a tutorial on finding the shortest path in a graph, using the graph as the environment for deep RL? All the tutorials on YouTube use the Gym library as the environment, so changing the environment to a graph would be nice. You are easy to understand and you have a way of explaining ideas simply.

    • @brthor1117 · 2 years ago

      I'm not sure what you're trying to accomplish here. An RL agent that learns Dijkstra's?

    • @manelmimi8197 · 1 year ago

      @@brthor1117 Create an environment from an MDP model and train the DQN agent on that environment.

    • @brthor1117 · 1 year ago

      @@manelmimi8197 Although I haven't done research on this, many games can be modeled as MDPs and it shouldn't be so different from a typical gym environment. If the action space changes between nodes, you may run into trouble there due to the limitations of neural nets. But I feel confident that there is research on this topic, and probably examples on github.
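
A minimal sketch of what such an environment could look like, assuming the gymnasium API (reset() returns (obs, info), step() returns a 5-tuple); the class name, adjacency dict, and one-hot observation encoding are all illustrative choices, not anything from the video:

    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces

    class GraphShortestPathEnv(gym.Env):
        """Toy MDP over a small directed graph: start at node 0 and reach `goal`
        in as few steps as possible. Action i means "move to my i-th neighbour"."""

        def __init__(self, adjacency, goal, max_degree):
            super().__init__()
            self.adjacency = adjacency                       # dict: node -> list of neighbour nodes
            self.goal = goal
            self.n_nodes = len(adjacency)
            self.action_space = spaces.Discrete(max_degree)  # fixed size; unused actions mean "stay"
            self.observation_space = spaces.Box(0.0, 1.0, shape=(self.n_nodes,), dtype=np.float32)

        def _obs(self):
            one_hot = np.zeros(self.n_nodes, dtype=np.float32)
            one_hot[self.current] = 1.0
            return one_hot

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            self.current = 0
            return self._obs(), {}

        def step(self, action):
            neighbours = self.adjacency[self.current]
            if action < len(neighbours):
                self.current = neighbours[action]
            terminated = self.current == self.goal
            reward = 0.0 if terminated else -1.0             # per-step penalty rewards short paths
            return self._obs(), reward, terminated, False, {}

    env = GraphShortestPathEnv({0: [1, 2], 1: [3], 2: [3], 3: []}, goal=3, max_degree=2)

Since the observation is a fixed-size vector and the action space is Discrete, a DQN agent could then be trained on this much like on CartPole.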

  • @VitorMartins-l1h · 1 month ago

    Good tutorial, though newer environment versions require some additional steps.

  • @davebostain8588 · 8 months ago

    Great video - I could only get it to run in Colab, not in VS Code or PyCharm. If I can just figure out how to render it in Colab...
    I will be viewing the next one too...

  • @ommagrawal4875 · 1 year ago · +1

    My env.render() is not working. I tried all the render modes, including rgb_array.
    Please help.

    • @brthor1117 · 1 year ago

      No idea, best to search for issues related to the gym package, as there is nothing non-standard related to its usage here.
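
One thing worth checking (an assumption about the commenter's setup, not something confirmed in the video): recent gym/gymnasium releases expect the render mode to be passed to gym.make rather than to env.render(). A minimal sketch:

    import gymnasium as gym

    env = gym.make("CartPole-v1", render_mode="human")   # or render_mode="rgb_array"
    obs, info = env.reset()
    for _ in range(200):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        # With render_mode="human" a window is drawn automatically each step;
        # with render_mode="rgb_array", frame = env.render() returns a numpy image.
        if terminated or truncated:
            obs, info = env.reset()
    env.close()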

  • @CHINNOJISANTOSHKUMARNITAP · 1 year ago

    Use ensembles and add auxiliary tasks to each deep Q-network in the ensemble, for any game.

  • @jahidchowdhurychoton3591 · 4 months ago

    Your website link is not working. Can you please provide the code as well?

  • @physicarium2139 · 7 months ago

    Thanks for this tutorial! One question: in the Nature paper they have one for-loop nested inside another, whereas in your code you do not do this. Was there any particular reason why?

    • @brthor1117 · 7 months ago

      I do not remember, perhaps it was a simpler way of writing the code.
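
For what it's worth, the structural difference the question refers to can be sketched as below; both variants visit the same transitions, the flat loop just folds the episode boundary into the step loop. The random-action agent_step helper is only a stand-in for "act, store, learn":

    import gymnasium as gym

    env = gym.make("CartPole-v1")

    def agent_step(obs):
        # Stand-in for acting, storing the transition, and learning; here it acts randomly.
        new_obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        return new_obs, terminated or truncated

    # Nature-paper style: an outer loop over episodes, an inner loop over steps.
    for episode in range(10):
        obs, info = env.reset()
        for t in range(500):
            obs, done = agent_step(obs)
            if done:
                break

    # Flat style (no nesting): a single loop over total environment steps,
    # resetting inline whenever an episode ends.
    obs, info = env.reset()
    for step in range(5_000):
        obs, done = agent_step(obs)
        if done:
            obs, info = env.reset()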

  • @cozziekuns4967 · 2 years ago

    Super helpful, GOAT

  • @bluedade2100 · 1 year ago

    Why don't we keep track of episode reward values while we are initializing the replay buffer? I am asking since it happens 1000 times. Isn't this value important?
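
For context, the 1000 iterations being asked about are the replay-buffer warm-up, where every action is random; a minimal sketch, assuming a MIN_REPLAY_SIZE of 1000 and a deque buffer (the rewards there only reflect random behaviour, which is presumably why they are not logged):

    from collections import deque
    import gymnasium as gym

    MIN_REPLAY_SIZE = 1_000
    replay_buffer = deque(maxlen=50_000)

    env = gym.make("CartPole-v1")
    obs, info = env.reset()

    # Fill the buffer from a purely random policy before any training happens.
    # Episode rewards here would only measure random play, so they are not tracked.
    for _ in range(MIN_REPLAY_SIZE):
        action = env.action_space.sample()
        new_obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        replay_buffer.append((obs, action, reward, done, new_obs))
        obs = new_obs if not done else env.reset()[0]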

  • @cheggmi3637 · 2 years ago · +1

    Very nice tutorial. It is good to see how you implemented this paper and made it as easy as you could. My question is: if you try to implement this on Colab, there is an error. How do you suggest we go about implementing this so it runs smoothly? I am really new to this topic and I am trying to learn as much as possible. Thank you.

    • @brthor1117 · 2 years ago

      This should work fine in colab, any error is most likely a bug in your program.

  • @joaobentes8391 · 3 years ago · +2

    nice
    keep the grind
    one day u will succeed

  • @yujisakabe4900 · 2 years ago

    Thank you for the awesome explanation, it was really helpful :)

  • @Throwingness · 2 years ago

    12:00 Deep Q Learning can only be used with agents that have a discrete number of actions.
    What Reinforcement Learning algorithms use continuous actions?

    • @brthor1117 · 2 years ago

      algorithms in the policy gradient space such as PPO, and others

  • @marcin.sobocinski · 2 years ago

    Very good clear code :) Thanks!

  • @GowthamRajVeeraswamyPremkumar · 1 year ago

    Hi, I wrote the code, but I don't understand why my output step starts from 11,000 and not from 0 like yours?

  • @valentinbouchentouf4626 · 3 years ago

    Very nice video, thank you !!

    • @brthor1117 · 3 years ago

      Thanks for the comment!

  • @Throwingness · 2 years ago

    Do you think this algorithm could be used to control a stepper motor? How many actions can this algorithm output? Because a stepper motor needs to be commanded to make a certain number of steps. Maybe between 0 and 100. I know that is a lot of actions to learn. Would you recommend PPO instead?

    • @brthor1117 · 2 years ago

      Assuming you are observing some environmental state between steps of the motor and that the motor at least probabilistically affects those observations, some form of reinforcement learning should be able to work. I recommend experimenting with different algorithms, you will probably want to use more sophisticated variants of Q-learning instead of this vanilla implementation.

    • @Throwingness · 2 years ago

      @@brthor1117 Thank you for responding to all my comments. I will be modelling a robot with stepper motors. The motor will receive a command to move and be unresponsive until it stops. It will learn not to make bad moves.
      I think I'm going to implement a MOMPO algorithm next. I've seen videos of PPO and MOMPO, and MOMPO looks much smoother.
      I haven't found a PyTorch implementation of MOMPO, but a few of MPO. Can I ask, is the difference a penalty if the prediction goes outside -1 and 1?

    • @brthor1117 · 2 years ago

      ​@@Throwingness "is the difference a penalty if the prediction goes outside -1 and 1?" I'm not totally sure what you are asking there. If the rewards are scaled too high it can affect the scale of the advantages in algorithms like PPO and cause difficulty in training. Commonly some type of reward normalization is used to offset this. RL algorithms can require millions of observations to reach decent performance and that can be difficult to collect using a robot in real time. There has been success training RL algorithms on robotic simulations and then transferring that into the real world.
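
A toy illustration of the reward-normalization idea mentioned in the reply (a common trick in policy-gradient training, not something shown in this video): keep a running estimate of the reward standard deviation via Welford's update and scale large rewards down by it:

    class RewardNormalizer:
        """Scales rewards down by a running estimate of their standard deviation."""

        def __init__(self):
            self.count = 0
            self.mean = 0.0
            self.m2 = 0.0

        def __call__(self, reward):
            # Welford's online update of the running mean and variance.
            self.count += 1
            delta = reward - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (reward - self.mean)
            std = (self.m2 / self.count) ** 0.5
            # Only ever scale down, so small or constant rewards pass through unchanged.
            return reward / max(std, 1.0)

    normalize = RewardNormalizer()
    scaled = [normalize(r) for r in [1.0, 250.0, -40.0, 3.0]]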

  • @Mesenqe · 2 years ago

    Thank you for the step by step explanation. Is it possible to do a tutorial of DQN on image classification, like MNIST? 😍

    • @brthor1117 · 2 years ago

      👉th-cam.com/video/td0TjL5tznc/w-d-xo.html

  • @1UniverseGames · 2 years ago

    Great video.

  • @mogaolimpiu7190 · 7 months ago

    I have checked multiple times, but I can't seem to find any differences (except for the "expected 4" part, because of the new return value):
    obss = np.asarray([t[0] for t in transitions])
    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (32,) + inhomogeneous part.
    I can't for the life of me get why this happens.

    • @amitnayak7917 · 6 months ago

      Same here...

    • @Adinath993 · 4 months ago

      env.reset() returns a tuple.

    • @VitorMartins-l1h · 1 month ago

      You have to use env.reset()[0] to get the observation.
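
Putting the replies together: the inhomogeneous-shape error comes from storing the (observation, info) tuple that newer gym/gymnasium versions return from reset(), so np.asarray later builds an object array. A minimal sketch of the adjusted calls (step() likewise now returns five values):

    import gymnasium as gym
    import numpy as np

    env = gym.make("CartPole-v1")

    # Newer API: reset() returns (observation, info); storing the raw tuple in the
    # replay buffer is what breaks np.asarray later.
    obs, info = env.reset()               # equivalently: obs = env.reset()[0]

    action = env.action_space.sample()
    new_obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated        # the old single `done` flag was split in two

    transition = (obs, action, reward, done, new_obs)
    obss = np.asarray([transition[0]])    # now a clean float array of shape (1, 4)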

  • @Adinath993 · 4 months ago

    I don't know why, but my model is not learning.

  • @ΑντώνηςΚαρβελάς-ι1χ · 2 years ago

    Shouldn't you detach the targets in the loss function?

  • @adrianbrandheini319 · 1 year ago

    Hi, I tried following your tutorial but I got an error on the line new_obs, rew, done, _ = env.step(action): "too many values to unpack (expected 4)". Any idea what the problem could be? Thanks!

    • @brthor1117 · 1 year ago

      Maybe a change in the API of the env.step() function from gym. You'll have to put it under a debugger or print the return value from that function and take a look at the tuple it returns.

  • @davidlourenco7786 · 1 year ago

    I'm getting the error "can't convert np.ndarray of type numpy.object_" from torch.as_tensor!

    • @brthor1117 · 1 year ago

      I have never hit that error, maybe try torch.from_numpy pytorch.org/docs/stable/generated/torch.from_numpy.html

  • @evgenymusicantov7119 · 2 years ago

    Thanks. Can I download the code from somewhere?

  • @franky0226 · 3 years ago

    Hey, can you do a video on finding the optimal set of input parameters for the cases when we know the output? It's like a reverse engineering problem. Is it possible to use RL?

    • @brthor1117 · 2 years ago

      Optimal set of input parameters? Like function optimization? You can probably go for SGD directly if it's a mathematical function.

  • @mawkuri5496 · 2 years ago

    How much memory does your laptop need for this to not get a 'CUDA out of memory' error?
    Note that a laptop has memory shared between the GPU and CPU.

    • @brthor1117 · 2 years ago · +1

      I don't know about that kind of setup. IIRC this tutorial runs the algorithm on the cpu only.

  • @RabeeQasem · 3 years ago

    Not related to DQN: how did you write the same line on 4 lines at once at 22:04?

    • @brthor1117 · 3 years ago

      Some editors allow you to place multiple cursors and edit at each cursor simultaneously. In this case pycharm lets you alt-click to place another cursor. VS Code and Sublime Text are two others that let you do this.

    • @RabeeQasem · 3 years ago

      @@brthor1117 thank you

  • @mrtoast244 · 1 year ago · +1

    29:30 wysi

  • @davidkoleckar4337 · 2 years ago

    Nice

  • @mastermp8366 · 3 years ago

    Good paper!

    • @brthor1117 · 2 years ago

      The paper was pretty good once I finally understood it.

  • @nichuffnoob6929 · 2 years ago

    Hi, I need help with my project, can you help me?

  • @superz5510 · 3 years ago

    This is not the Rainbow version of DQN.

    • @brthor1117 · 3 years ago · +1

      The videos are organized as a successive set of tutorials on DQN variants that lead up to a Rainbow implementation. This video is part 1 and covers the vanilla DQN theory and implementation.

  • @uonliaquat7957 · 3 years ago

    Would you mind providing the link to Github?

    • @brthor1117 · 3 years ago · +1

      This is a follow along video and has no provided code. There are many sample DQN implementations available if all you need is some code.

    • @uonliaquat7957 · 3 years ago

      @@brthor1117 Well, my average reward isn't getting above 80, even though I've used the same parameters as yours. How much time on average does CartPole-v0 DQN take to train?

    • @brthor1117 · 2 years ago

      @@uonliaquat7957 IIRC about 2 minutes. Likely you have a bug.

  • @AshishSingh-753 · 3 years ago

    Cool

    • @brthor1117 · 3 years ago · +1

      👍

    • @AshishSingh-753 · 3 years ago

      How much Python is needed for reinforcement learning?

    • @brthor1117 · 3 years ago

      @@AshishSingh-753 There are other frameworks & languages you can use as well. For example, caffe in c++ or torch in lua. Most practitioners use python though and a lot of the examples you will find will be in python.

  • @varghesedaison3435 · 2 years ago

    Anyone who coded this and found it working, could you please share the code? I coded it and got an error. @brthor, could you help with my error?

  • @MariusCheng · 2 years ago

    I have a problem, how do I fix this?
    target_net.load_state_dict(online_net.state.dict())
    'Network' object has no attribute 'state'

    • @brthor1117 · 2 years ago

      .state.dict() -> .state_dict()
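
In context, the corrected target-network sync looks like this (nn.Linear is only a stand-in for the tutorial's Network class):

    import torch.nn as nn

    online_net = nn.Linear(4, 2)          # stand-in for Network()
    target_net = nn.Linear(4, 2)

    # state_dict() is a single method call, not an attribute chain (.state.dict()):
    target_net.load_state_dict(online_net.state_dict())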