Q-Learning with a Neural Network in TensorFlow

  • Published on Sep 21, 2024

Comments • 13

  • @MultsElMesco · 3 years ago

    I'm just starting with machine learning and this channel has helped me a lot with understanding the basics. Thanks a lot!

  • @rahulsutradhar5759 · 2 years ago

    Man, I love your tutorials so far! You are a life-saver. I had to pause the video every 30 seconds to make out what's happening (the explanation is very fast for me), but this is the video I've actually learned the most about Q-learning from.
    I thank you from the bottom of my heart.

  • @pranjalthakur8115 · 5 years ago +6

    Good tutorial, but the pace is on the higher side.

  • @nubscripters3756 · 4 years ago +4

    Hey, why does your AI always train a lot better than mine? In the first 100 episodes we get similar results, but in the next 100 I get either 3/4 (or 0) and you get something like 50!

  • @gregchance5090 · 4 years ago +1

    Hi, great tutorial!
    I get: Cannot feed value of shape () for Tensor 'Placeholder_2:0', which has shape '(1,)'
    from self.sess.run(self.optimizer, feed_dict=feed).
    tf.reduce_sum(tf.square(agent.target_in - agent.q_action)) returns a tensor with shape=() — is this the issue?
    If I change self.target_in = tf.compat.v1.placeholder(tf.float32, shape=[1]) to shape=[], it runs.
    However, rewards over 200 episodes never go over 1 or 2?
    Thanks!
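[Editor's note: that error usually means a bare Python scalar was fed to a placeholder declared with shape=[1]. A minimal NumPy sketch of the mismatch, using hypothetical values (the names target_in and q_action are the commenter's; the numbers are made up):]

```python
import numpy as np

# A bare Python float has shape (), which cannot feed a shape-(1,) placeholder.
target = 0.5
assert np.asarray(target).shape == ()

# Wrapping it in a list gives the length-1 vector the placeholder expects,
# so shape=[1] can be kept instead of being changed to shape=[].
wrapped = [target]
assert np.asarray(wrapped).shape == (1,)

# Either way, the squared-error loss still reduces to a scalar:
q_action = np.asarray([0.3])
loss = np.sum(np.square(np.asarray(wrapped) - q_action))
assert loss.shape == ()
```

So the loss being shape=() is expected; the fix that keeps shape=[1] is to feed `[target]` rather than `target` in the feed_dict.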

  • @hitinjami1143 · 4 years ago +1

    Great video, bro.
    I just want to know how to work with an image as my observation space.
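[Editor's note: a common approach for image observations, as in the original DQN work, is to grayscale, downsample, and stack recent frames before feeding a convolutional Q-network. A hedged NumPy sketch with hypothetical frame dimensions (not from the video's code):]

```python
import numpy as np

def preprocess(frame):
    """Grayscale and 2x-downsample an RGB frame (hypothetical 210x160x3 input)."""
    gray = frame.mean(axis=2)            # collapse the RGB channels
    small = gray[::2, ::2]               # naive stride-2 downsample
    return small.astype(np.float32) / 255.0

frame = np.random.randint(0, 256, size=(210, 160, 3))
obs = preprocess(frame)
assert obs.shape == (105, 80)

# Stacking the last 4 preprocessed frames gives the network motion cues;
# the stack replaces the flat state vector as the Q-network's input.
stack = np.stack([obs] * 4, axis=-1)
assert stack.shape == (105, 80, 4)
```

The Q-network's first layers then become convolutions over this stack instead of a dense layer over a small state vector.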

  • @yashmandilwar8904 · 5 years ago +1

    Hi Shawn. Thanks for this tutorial. I have a question about lines 30 and 38, after you've coded the DQN. Shouldn't you be calling self.q_action instead of self.q_state and then taking the max of the values?
    Thanks!

  • @sergeypigida4834 · 4 years ago +1

    Hi Shawn, thank you for such a detailed tutorial. Could you please give some tips on how to rewrite your code for TensorFlow 2.0?
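[Editor's note: in broad strokes, a TF 2.x port replaces placeholders and sessions with eager execution, a tf.keras model, and tf.GradientTape for the update. The piece that stays identical across versions is the Q-learning target itself; a NumPy sketch with hypothetical numbers (not the video's code):]

```python
import numpy as np

# Hypothetical values standing in for one transition (s, a, r, s').
gamma = 0.97
reward = 1.0
q_next = np.array([0.2, 0.8, 0.5])   # Q(s', .) as predicted by the network
done = False

# Bellman target: r + gamma * max_a' Q(s', a'), zeroed at terminal states.
target = reward + gamma * np.max(q_next) * (not done)
assert abs(target - 1.776) < 1e-9

# In TF2, the network would then be nudged toward this target with
# tf.GradientTape + optimizer.apply_gradients instead of sess.run
# on a placeholder-fed optimizer op.
```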

  • @AmitYadav-zk8zm · 3 years ago

    Great video. However, I don't think your statement about the NN updating multiple weights per iteration is true. In each iteration only a single state and action are active, so only the weights attached to them have non-zero gradients and get updated: that would be only one weight.
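[Editor's note: the commenter's point holds exactly in the degenerate case of a one-hot state input with no hidden layer, where the "network" reduces to a table. A toy NumPy sketch of that case (all values hypothetical); with a hidden layer, every weight on the path from active hidden units to the chosen action's output gets a non-zero gradient, so multiple weights can move per step:]

```python
import numpy as np

# Tabular case dressed up as a network: Q(s, a) = W[s, a] with a one-hot
# state input and no hidden layer.
n_states, n_actions = 4, 2
W = np.zeros((n_states, n_actions))
s, a = 2, 1
target, lr = 1.0, 0.1

# Loss = (target - Q(s, a))^2; its gradient w.r.t. W is non-zero only
# at the single entry (s, a), so only one weight moves.
grad = np.zeros_like(W)
grad[s, a] = -2.0 * (target - W[s, a])
W -= lr * grad

assert np.count_nonzero(W) == 1   # exactly one weight was updated
```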

  • @alexsmith3974 · 4 years ago

    What version of TensorFlow are you using here, and how would you implement this in 2.0? I'm using 2.0, and when I copy and paste the code you provide (which should work), I get an error: "module 'tensorflow' has no attribute 'placeholder'". Any ideas?

    • @azizrais1526 · 4 years ago +2

      import tensorflow.compat.v1 as tf
      tf.disable_v2_behavior()

  • @ryanbeasley1079 · 4 years ago

    Anyone else having trouble with the reward never going over 1?

    • @nothingtodo3097 · 3 years ago

      yep, mine is not updating randomly anymore
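[Editor's note: one common culprit when rewards plateau near 1, though not necessarily the issue in the video's code, is an epsilon-greedy schedule that never decays, so the agent keeps acting almost entirely at random. A hedged sketch with a typical (hypothetical) decay schedule:]

```python
import numpy as np

rng = np.random.default_rng(0)
eps, eps_min, eps_decay = 1.0, 0.01, 0.99   # hypothetical schedule

def choose_action(q_values, eps):
    """Epsilon-greedy: explore with probability eps, else exploit."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))   # explore
    return int(np.argmax(q_values))               # exploit

# Decaying epsilon once per episode lets exploitation take over:
for episode in range(500):
    eps = max(eps_min, eps * eps_decay)

assert eps == eps_min   # after 500 episodes, epsilon has bottomed out
```

If epsilon stays at 1.0 (or the loss target has the shape bug some commenters hit), the policy never improves past random-play rewards.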