@@richchizzl5020 hmmm, did you try changing that parameter? TBH, I'm now using stable baselines over keras-rl as it actually seems, well, a lot more stable. You can set the target model update frequency for a DQN pretty easily using the target_network_update_freq parameter: stable-baselines.readthedocs.io/en/master/modules/dqn.html I did a bit of a crash course on setting up experiments with it here: th-cam.com/video/nRHjymV2PX8/w-d-xo.html You could swap out the algorithm used there for a DQN and set the parameter there.
Nicholas, the !pip list output from Google Colab is too long to place in comments. Are there any modules I should zero in on? I reset back to stock TF because the downgrade to 2.3.1 threw the len error. Thanks again
Heya @Frank, qq, what was the error you were receiving again. Lost the original chain. Also, just a heads up it'll be a bit of a pain to visualise the environment in Colab.
I have a question! How can we see every action and state in each episode? This only shows the final score of each episode, but we can't see what's happening during the episode.
Super video, but I have a question. First I tried it myself and got an error, then I copied and pasted your code, but I still got an error. Error: FailedPreconditionError: Could not find variable dense_5/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Resource localhost/dense_5/kernel/N10tensorflow3VarE does not exist.
I got this error when compiling the model: AttributeError: 'Sequential' object has no attribute '_compile_time_distribution_strategy'. Please help me! Thank you
I got a problem at Step 3 (Build Agent with Keras-RL). The error is: "TypeError: len is not well defined for symbolic Tensors. (dense_2/BiasAdd:0) Please call `x.shape` rather than `len(x)` for shape information." Does anyone have a solution?
I have a little problem running this. I tried it in Jupyter same as you, but when I clicked run an error appeared saying "GL NOT FOUND", and it happened on the first step of this video. Help plz :c
@@NicholasRenotte The traceback (running in a hosted notebook) ends with: ImportError: Library "GL" not found, raised from pyglet.gl when env.render() is called, followed by gym's hint: Error occurred while running `from pyglet.gl import *` HINT: make sure you have OpenGL installed. On Ubuntu, you can run 'apt-get install python-opengl'. If you're running on a server, you may need a virtual frame buffer; something like this should work: 'xvfb-run -s "-screen 0 1400x900x24" python '
Hello, would you be able to make a video on this but using the mountain car scenario? I am trying to follow this using the mountain car but it does not work :(
Hi @Nicholas Renotte, I am new to this... The code works fine, however I don't see the graphics of the CartPole as shown in this video. How do I get that as an output?
I do not know what to say. I think the closest words that can describe my feelings now are "wow, that was amazing and so very simple that even I understood what is going on there". Going to play with the code and try to solve more problems. I wish I had found your channel earlier. 👍🏻
Thanks so much @Alex, you've found it now 😊! I've got way more reinforcement learning and game AI coming in the coming weeks!
nice pace and simple work through, love it man.
Thanks so much 🙏! Got another run of RL tutorials coming up soon!
I usually don't write comments on TH-cam videos but wow! I've watched some of your videos and they are extremely helpful.
The number of views on this video and subscribes on your channel are so underrated thx for the great content and hope u keep making good videos like this one!
Thanks so much for your kind words @M K! Truly appreciate it!
For your content, 6.5k subs are too little. I have been scouring the internet for reinforcement learning courses ever since AlphaGo beat the world champion, and today I found your video. And I'm glad I did.
Yooo, thanks so much! I've got a bunch more RL stuff coming soon!
@@NicholasRenotte Cool!
Seriously, I'm recommending this channel to my data science class
@@freydunthanos3155 yesss, thanks so much!
A Go fan. Didn't expect to see another person of culture here.
Video Summary (Made with HARPA AI):-
00:30 🧠 Core concept: "Area 51" summarizes Action, Reward, Environment, and Agent in reinforcement learning.
01:00 🐍 Python setup: Use OpenAI Gym, TensorFlow, Keras to create and train reinforcement learning models.
04:52 🏞 Gym environment setup: Import dependencies, set up the environment, and extract states and actions.
08:27 🧠 Build deep learning model: Construct a model with TensorFlow and Keras.
13:48 🤖 Train the model: Compile and train with KerasRL, monitor progress.
15:06 🎯 Test the model: Evaluate performance in the Gym environment.
17:30 💾 Save and reload weights: Save and reload model weights.
19:49 🔃 Reuse the model: Rebuild, load weights for further testing or deployment.
The best tutorial on how to start with reinforcement learning that I have ever seen!
tl;dr if you're watching this in 2022, make sure you pip install gym==0.17.1.
I'm sure this is due to the age of this video/updated code being released, but I had the following errors in case anyone else comes across this.
First was - ValueError: too many values to unpack (expected 4) - for the line n_state, reward, done, info = env.step(action). For some reason adding a 5th variable on the left so it looked like this - n_state, reward, done, info, test = env.step(action) - made it pass.
Next was ValueError: Error when checking input: expected flatten_input to have shape (1, 4) but got array with shape (1, 2) on the line dqn.fit(env, nb_steps=50000, visualize=False, verbose=1).
I was able to fix this by downgrading to Python 3.8, downgrading protobuf to 3.9.2, and explicitly installing the package versions shown in the pip install output of the Jupyter notebook. When I changed the gym version to the one used in the video, env.step(action) went back to returning 4 values instead of the 5 I had to unpack to make it pass, and the code ran.
After all that I went back to python 3.10, explicitly installed gym 0.17.0, then installed keras, keras-rl2, and tensorflow, and it worked again.
Thanks for the video, the issues obviously aren't your fault, just wanted to pass this info off. I learned a ton about pip, library versions, and all kinds of other stuff in this process.
This worked for me as well.
I downgraded protobuf, which downgraded tensorflow as well.
Then I upgraded tensorflow to the correct version and everything worked.
I think the origin of the problem is not having the correct version of TensorFlow in the first place.
TYSM YOU'RE SUCH A G
If you can, try upgrading to gymnasium, a drop-in replacement for gym. gym is no longer maintained.
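For anyone adapting the notebook to the newer API, here is a minimal random-action loop as a sketch. It assumes gym >= 0.26 (or gymnasium), where reset() returns (observation, info) and step() returns five values; older gym returns a bare observation and four values.
import random
import gym  # or: import gymnasium as gym

env = gym.make('CartPole-v1')
obs, info = env.reset()                 # new API: reset returns (observation, info)
done = False
score = 0
while not done:
    action = random.choice([0, 1])      # pick left/right at random
    obs, reward, terminated, truncated, info = env.step(action)  # new API: 5 values
    done = terminated or truncated      # either flag ends the episode
    score += reward
print('Score:', score)
env.close()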
Commenting for the algorithm. Started looking into deep learning recently and eventually got here, great intro and explanations. Looking forward to the other videos
Thanks so much!! Whatcha working on?
As far as I can tell this tutorial sadly is already outdated, since some of the API has changed and some functions may require different arguments. An updated version of this tutorial would be great!
Thanks for explaining the code, I saw this example online already but with the step by step explanation of this scenario it was much better for learning while running the code alongside the video :)
Heya @Bogdan, thanks so much! I'm building up to more sophisticated examples of RL. I'll be doing a lot more with different environments in the coming months!
Just discovered your course, amazing! Thank you very much. It is still very relevant. Some of the gym environments have newer versions but all still works. Thanks again!
I am so glad I stumbled across your channel. Best tutorial ever! THANK YOU!!!
Thanks sooo much @Raihan!
I watch your videos and feel like you taught us a very important topic like no one did. I do believe this is how no one shouldn't. Better to follow written documentations!!!
Thank you Nicholas.. Your video is very informative, nice pace and entertaining. I am now hooked with RL.
Thank you so much @Tom 🙏
I am from Brazil and your video was very useful for me!!! I hope you continue to make more videos like this. Great video!!
Glad you liked it @Bruno! Definitely, got a special one on Reinforcement Learning coming up!
Hi Nicholas, you did a great job there, thanks for sharing your knowledge! I would like to mention that in my case I had a problem running the code, because I got a value error on the line "n_state, reward, done, info = env.step(action)". Adding a fifth value "observation" on the left side (so that it looks like "n_state, reward, done, info, observation = env.step(action)") got the code up and running :-)
Nevertheless your videos are really helpful and please keep going! You're doing an amazing job!
dude I can't import the agents and policies, basically keras doesn't have rl.policy or even rl.agents, what should I do?
# new version with terminated and truncated
episodes = 10
for episode in range(1, episodes+1):
    state = env.reset()  # initial state for each episode
    terminated = False
    score = 0
    while not terminated:
        env.render()  # render the CartPole
        action = random.choice([0, 1])  # 0 or 1, left or right
        observation, reward, terminated, truncated, info = env.step(action)
        score += reward  # based on our step we get a reward till it's done
    print('Episode:{} Score:{}'.format(episode, score))
Docs
observation (object) - this will be an element of the environment’s observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.
reward (float) - The amount of reward returned as a result of taking the action.
terminated (bool) - whether a terminal state (as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results.
truncated (bool) - whether a truncation condition outside the scope of the MDP is satisfied. Typically a timelimit, but could also be used to indicate agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached.
info (dictionary) - info contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward. It also can contain information that distinguishes truncation and termination, however this is deprecated in favour of returning two booleans, and will be removed in a future version.
@@FrancescoPalazzo26 I had the same problem. In my case I could solve it by importing the modules from a different path, which is 'tensorflow.python', so the import commands look like 'from tensorflow.python.keras.models import Sequential'.
Hope this solves your problem!
episodes = 10
for episode in range(1, episodes+1):
    state = env.reset()
    terminated = False
    score = 0
    while not terminated:
        env.render()
        action = random.choice([0, 1])
        n_state, reward, terminated, truncated, info = env.step(action)
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))
Run these for newer versions.
Thank you Nic! Very helpful of you to make such informative videos for all! Wish you lots of success and joy!
Thank you so much @Mohammed!!
Bro's got a taste for classic music. Beethoven and Dvorak in the beginning. Nice!!
Sweet.. love the explanation.. That was a lot to take in but what a clean explanation...
Thanks for the video.
paul.
Superclear! Keep doing your stuff, man!
Thank you! This would be my first motivation to explore RL!
Thanks man! Nice pace and objectiveness.
Thanks so much @Bruno!
Awesome video! I just started my master in AI and seeing your videos helps a lot to remember a couple “key” things before the start of the semester!
I also just started a YT channel, if you’re down we could maybe see how we could create something together, might be fun!
Have a good day 👋🏼
Hey thanks so much @Maxime, glad you enjoyed the video!
Wow! Such a wonderful lesson with practical examples. I loved it. I want to learn more about self-control action mechanisms for multivariate industrial control using RL. Kindly shed some light on it.
Nice, got more RL stuff coming in the weeks coming @Suvankar!
Really nice and simple explanation. Cheers!
YESS! Thanks @Omar, glad you enjoyed it!
Hi Nick. Thanks for your tutorial, it really helps me a lot. However, I am getting an error saying: "ValueError: Error when checking input: expected flatten_input to have shape (1, 4) but got array with shape (1, 2)". So I am wondering why this error didn't happen in your case.
I'm having the same issue
Did you ever find a solution to this issue? I'm having the same problem.
Install the package 'rl-agents==0.1.1'. It works for me.
@@GeraLdario It worked!
Thank you so much ❤
searched a lot for that kind of video and finally found a good one 👏
Thanks sooo much! There's some more reinforcement learning stuff coming this week, hopefully a video on Atari and (assuming my GPU doesn't catch fire) one on CARLA!
@@NicholasRenotte looking forward to seeing that
@@islam6916 awesome stuff!!
Hey! Thanks for the video. I would love to see how I can solve a problem with my own environment. Or how to build a specific environment and an agent with specific actions. I am at the moment not familiar with OpenAI but I think it would be interesting to see something more custom. :)
Heya @WhataDay, check this out: th-cam.com/video/bD6V3rcr_54/w-d-xo.html
Great video. Thanks for explaining everything with the step by step. Excellent Job!
Anytime! A heap more rl videos coming, it's going to be a big focus this year!
Hi Nick,
Thanks for your tutorial, it really helped me kick off the field of RL.
There is an issue with the keras-rl2 package you used, specifically the NAFAgent, which fails all the time even with the example given in the official repo. Could you please spare some time and take a look at it? Many thanks, and I wish your channel keeps getting better and better!
Best, Tony
When I try to run the "dqn.fit(env, nb_steps...)" command I am getting ValueError: Error when checking input: expected flatten_2_input to have shape (1, 4) but got array with shape (1, 2).
Can you please help me out??
Thank you!!! You rock! Such a well made video! Short and fully informative.
AttributeError: 'Sequential' object has no attribute '_compile_time_distribution_strategy'
when I try dqn.compile, any idea?
I tried copying the code itself but the error continues.
Heya @nn aa, definitely can help out!! Quick one, what version of tensorflow are you using? and how are you importing keras/tf.keras?
@@NicholasRenotte First of all, thanks a lot for your great tutorial videos. i have got the exact same error as @nn aa. I am importing as below. my TensorFlow version is 2.3.1. Could you please take a look into it? Thanks.
Would be great if you could create more RL tutorials with custom environments rather than using OpenAI Gym?
@@hninpannphyu8567 anytime! Can you try dropping the tensorflow. from your imports like so:
OLD CODE:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
TEST CODE:
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam
@@hninpannphyu8567 I'm actually planning some more RL stuff soon. Anything in particular you'd like to see?
Hey there!
I am having the same issue with *'Sequential' object has no attribute '_compile_time_distribution_strategy'* but in my case *del model* doesn't help at all. If i want to delete it before *model = build_model(states, actions) I receive the error that I want to refer to a var before declaring it (which makes total sense to me xD).
Any ideas how to fix this? :)
btw. this video is amazing! Keep the good work up :)
Heya @Norio! Try deleting it then running the cell that creates the model again. I show it here: th-cam.com/video/hCeJeq8U0lo/w-d-xo.html
@@NicholasRenotte didn't work for me. But thanks for your help :/
I now use Tensorforce and don't have any problems :D
@@Officialnorio awesome work! What did you think of Tensorforce, I checked it out earlier on but switched to stable baselines a little later on!
@@NicholasRenotte Sometimes my code threw some weird output but changing the agent-type fixed it.
Tensorforce is pretty easy to use and does its job (so far) pretty well. I am using Tensorforce for my bachelor thesis about MTSP solutions :D
@@Officialnorio awesome, will need to give it a second chance!
great job man
love from India !
You're a great teacher thanks for making these!!
Thanks so much @Ed, glad you're enjoying them!
short and clear! thanks a lot!
Thanks so much @Tom!
great tutorial! keep making more
Thanks so much @Andy, definitely plenty more coming!
Nice introduction. It seems the DQN method is value-based even though you are using BoltzmannQPolicy. BoltzmannQPolicy is like epsilon-greedy, a method to balance exploitation and exploration. Methods like DPG, PPO, A2C, and DDPG can be considered policy-based methods.
Thanks for tuning in @Laha, good to note!
great job, ypu saved me a lot of time. Support from Argentina!!
Hi Nicholas, why did you use "linear" as the activation function in your last layer instead of "softmax"? How would it differ if I chose "softmax" as the activation function instead of linear in this case? Would it be possible to mention this, please? Or maybe make a video on it? (When to choose the linear vs softmax activation function for what type of target cases)
softmax is great for classification, but the experiment shown in the video is more of a regression problem. In this case, it makes more sense to use linear. That doesn't mean you can't use softmax, but your DQN will most likely not work as you would expect.
@@xnyu254 here you have two actions as the output (either go left or go right). It _is_ a classification problem, and not a regression one at all.
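For what it's worth, the head of a DQN predicts one Q-value per action (an unbounded real number, i.e. a regression target), which is why a linear output is the usual choice even though the actions themselves are discrete. A rough sketch of that output layer, assuming the two CartPole actions and two hidden layers (the layer sizes here are just illustrative, not necessarily the ones from the video):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

states, actions = 4, 2                            # CartPole: 4 observation values, 2 actions
model = Sequential()
model.add(Flatten(input_shape=(1, states)))
model.add(Dense(24, activation='relu'))
model.add(Dense(24, activation='relu'))
model.add(Dense(actions, activation='linear'))    # Q-value estimates, not probabilities
A softmax would squash those estimates into a probability distribution, which is not what the Q-learning update expects.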
Hello, Thanks for the great tutorial step by step video. Quick question. When I run dqn.fit(env, nb_steps = 50000, visualize = False, verbose = 1), I get this error: "'Sequential' object has no attribute '_compile_time_distribution_strategy'". How do I overcome this? and why did this happen?
Thanks again
I checked your other video. Deleting the model and reloading the kernel works. This comment is for anyone with same issues
Awesome work @Sriram, yep that's the solution the majority of the time!
@@randomizer272 thank you for your answer, i have the same problem. But my knowledge is still quite limited so i don't know how i delete my model and reload the kernel. Would be nice if you could explain it a little bit more. Thanks in advance!
@@ts717 You can just do a new line
del model
and create the model again. It worked for me fine. I will attach the video in which he explained about this error.
th-cam.com/video/hCeJeq8U0lo/w-d-xo.html
in the section "def build_model" you have to change the line :
model = Sequential()
into : model = tensorflow.keras.models.Sequential()
i checked and it seems that's because python can misinterpret it with keras, and not tensorflow's keras (but i have no clue why)
this worked for me
If you get "ValueError: Error when checking input: expected flatten_input to have shape (1, 4) but got array with shape (1, 2)", Install the package 'rl-agents==0.1.1'. It works for me.
yoooo thanks mate
Great video! Got it up and running in no time. One question tough: What exactly does the value of 4 out of env.observation_space.shape[0] represent? Isn't the state supposed to be a pixel vector? Or is this some kind of abstraction openAI makes?
Heya @McRookworst, for CartPole we don't use a pixel vector (that's more common in the Atari envs). In CartPole the four values in the observation space are: [position of cart, velocity of cart, angle of pole, rotation rate of pole].
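If you want to inspect those values yourself, here is a quick sketch (assuming the classic CartPole env and the older gym API where reset() returns just the observation):
import gym

env = gym.make('CartPole-v0')
print(env.observation_space.shape)  # (4,)
print(env.observation_space.high)   # upper bounds of the four values
print(env.reset())                  # one sample: [cart position, cart velocity, pole angle, pole rotation rate]
env.close()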
Thank you for such simple and easy to follow video. 🙏
Thanks so much @Chop, glad you enjoyed it!
In 2022, the code might not work well
Instead of :
from tensorflow._api.v1.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
you need to use :
from tensorflow.python.keras import Sequential
from tensorflow.python.keras.layers import Dense, Flatten
from tensorflow.python.keras.optimizers import adam_v2
Thank you!
Great tutorial Nic.. I was trying to implement this and encountered an error when I ran the line dqn.compile(Adam(lr=1e-3), metrics=['mae']).
Error: 'Sequential' object has no attribute '_compile_time_distribution_strategy'.
Can someone help me resolve this?
ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow
this error is showing while installing
ValueError: too many values to unpack (expected 4). Can you please help?
I have the same exact error, can someone please help! Thanks
You can unpack only the values you need from the returned tuple and ignore the rest with a star, e.g.: n_state, reward, done, *_ = env.step(action) (here done picks up the terminated flag, and the star absorbs truncated and info).
@@hocgh i already got it thanks a lot!
Hey thanks for the vid! it's great, I get the error AttributeError: 'Sequential' object has no attribute '_compile_time_distribution_strategy' when trying dqn.compile(...) I have the same version of tensorflow as you
Heya @Bartlomiej, delete your model (i.e. del model) and then rerun the code and it should clear it up!
@@NicholasRenotte I also have this same problem. But when you say to delete the model, do you mean adding the code "env.close()"? Because that didn't work.
Hello! I still don't know why, but I solved this issue by writing `del model` in a cell below `def build_agent(model, actions):` and then `model = build_model(states, actions)`. Regards!!
@@vagnermartin4356 nope, use 'del model', here's an example where I do it: th-cam.com/video/hCeJeq8U0lo/w-d-xo.html I think there is a conflict between Keras and tf.keras versions perhaps, but this seems to resolve the error.
Thanks @@dralbertus!
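For anyone hitting the '_compile_time_distribution_strategy' error, the workaround described above boils down to something like this (a sketch of the notebook cells; it assumes the build_model/build_agent functions and the states/actions variables from the video, and that you re-run these cells in order):
from tensorflow.keras.optimizers import Adam

del model                              # throw away the stale Sequential instance
model = build_model(states, actions)   # rebuild the network from scratch
dqn = build_agent(model, actions)      # recreate the DQN agent around the new model
dqn.compile(Adam(lr=1e-3), metrics=['mae'])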
I keep running into errors because I don't have the right things downloaded. I've been trying to fix it for about an hour now and I can't figure it out! If anyone has done this more recently than 2020 and would be willing to help me, I would greatly appreciate it. Thanks so much!
Hi, what an amazing video! You are a great teacher and you make learning RL fun! However, I have some questions, and they might be rookie-type questions because I am not that experienced with Python. You said that we can reload the trained model, but how can I do that in VS Code? Create a new Python file and import the one we created? And also, when I run "_ = dqn.test(env, nb_episodes=15, visualize=True)" and want to change the number of episodes (just for testing), it has to go through the whole process all over again, but in your case it just used the rewards already generated and printed them right away. These questions might be so easy that maybe someone in the comments can provide an answer. Thanks :)
You should be able to reload the weights by running this when you open it up again: dqn.load_weights('dqn_weights.h5f')
Then to change the number of episodes just change the number set to nb_episodes.
e.g.
For 30 episodes run this: _ = dqn.test(env, nb_episodes=30, visualize=True)
For 40 episodes run this: _ = dqn.test(env, nb_episodes=40, visualize=True)
If anyone had the same issue as me, with keras-rl saying that model has no attribute __len__, I just modified the model code to:
def build_model(states, actions):
    model = Sequential()
    model.add(Flatten(input_shape=(1, states)))
    model.add(Dense(23, activation='relu'))
    model.add(Dense(23, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    model.__len__ = actions
    return model
and it worked (notice the additional line model.__len__ = actions).
Probably not the best practice, but it worked without having to downgrade tensorflow.
Thanks so much for helping out the fam @Tom!
It brings tears to my eyes😂😂
Awesome
Yesss! Thanks for checking it out @Varun!
Your video is amazing! You make learning RL fun! However, I have some questions, maybe rookie-type questions, about the best strategy found by reinforcement learning. Can I extract just one part of it, such as a vehicle turning right at an intersection and then turning left being its best path? Can I extract only that one path among many paths? Or is it possible to convert the results of RL into text? Does the RL training log include the actions selected during these training runs? Please take the time to take a look, thank you very much!
Can you please guide me to solve the problem I am getting while working on this example?
TypeError:
_set_agent() missing 1 required positional argument: 'agent'
Very helpful but could not understand how to visualize the Cart Pole animation. Please let me know how to visualize it
In new versions of gym envs, just add , render_mode="human" to the make function
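A minimal sketch of that (assuming gym >= 0.26, where render_mode is passed to make() and the human window updates on every step without calling render()):
import gym

env = gym.make('CartPole-v1', render_mode='human')  # pops up the CartPole window
obs, info = env.reset()
for _ in range(200):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
env.close()
Note that the pygame package needs to be installed for the human render mode, as mentioned further down in the comments.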
Hi, thank you for the video.
My question is :
is there any specific reason behind you have installed Tensorflow 2.3.0?
Can version 2.9.0 work without error?
Did ya try it? Try things.
@Nicholas Renotte This is such a well-explained video! Thanks for making it, I was looking for something exactly like this. I wanted to know whether you can make a video on custom environments using different types of observation_spaces and action_spaces (Discrete, Box, Dict, MultiDiscrete). I am trying this for a problem and I'm struggling a bit to understand how to use Dict and MultiDiscrete, most examples use Box and Discrete.
Thanks @Pranav, definitely will do! Got it on the list!
13:43 Which line of code tells the agent to maximize the reward instead of minimize?
Nice content. We're waiting for ML trader series... thank you
Hi Nicholas, Thank you so much for the great content! I'm running into an error "AttributeError: 'Sequential' object has no attribute '_compile_time_distribution_strategy'" I couldn't really find anything online to help me solve it, do you have any idea where this is from? thank you!
Heya @Olivier, try running del model, then rerunning the cell that creates your model.
@@NicholasRenotte it worked for me, but why does it act in such a strange way?
@@adrianchervinchuk5632 I think there is a conflict between tensorflow and keras. Seems to happen pretty frequently.
Would you have this notebook working with more recent packages? I tried with tensorflow==2.3.0 but there are issues with numpy and rl imports
Sir, can you please help me with MountainCar-v0 and FrozenLake as well, because they do not have the same properties as CartPole.
Amazing video, thanks! 🥳
I am having an error in the training part
"""AttributeError: 'Sequential' object has no attribute '_compile_time_distribution_strategy'"""
Heya, try deleting the model and then rerunning the cell e.g. del model then rerun the model creation cell.
@@NicholasRenotte it got solved after some time, but thanks for the reply. 🤗🤗
Thank you so much for this video!
Anytime!
Thank you for the video, would you please make a video about DDPG?
FYI the pygame package needs to be installed for env.render() to work. Took me a little while to figure that one out.
thank you!!!
I am stuck at !pip install keras-rl2
Once I run this line the whole thing gets stuck, and shows "Kernel busy" status. Shows no errors. And I can't run any code after that.
Heya @Jajanzeb, can you try stopping the kernel, then running the pip install at a command line? It might just be hanging.
I tried del model but I am getting an error in step 3: Keras symbolic inputs/outputs do not implement '__len__'.
I researched on Stack Overflow and the answer was to downgrade to TF 1.14, which I don't want to do. Any help greatly appreciated, thanks
Heya can you try 2.3.1, that's the version I'm using in the video!
Hello sir, your video is really helpful, but when I try to run the code, at the time of pip install tensorflow it shows the error: ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow
Please help
Try using an older version of Python, it worked for me
I have an issue about "AttributeError: 'Sequential' object has no attribute '_compile_time_distribution_strategy'" when I am running dgn.compile. Any ideas?
Heya @Chi Hang Lam, try deleting the model using the code: del model then continue running the rest of the code!
ubuntu 18 - there's no tensorflow 2.3.0 in pip. But keras-rl2 relies on 2.1.0 so I installed that instead.
Hows it going with 2.1.0 Lachlan?
Wooow this is an awesome tutorial
Thanks so much @Ahmed!! 🙏
Question: what are 1e-2 and 1e-3 in the part where you build the dqn: target_model_update=1e-2???
So sad i got an error "is not defined" and i see that you don't have any answer ...
When I run the random environment test and the code finishes running, the window showing the pole gets stuck and says "not responding". What should I do about it?
Try restarting your notebook, migth be locked up.
@@NicholasRenotte Thank you. I tried that, but finally adding
"env.close()" at the end of that code made it work properly.
I think I should check for a better library or an update, am I correct?
Great video! Quick question: how can I ask the model for the actions he took during the tests? Is there a way of getting a list or an array of all the left/right choices he makes?
I don't think it's available through Keras-RL but if you work with the environment directly you can get it based on the generated action. Want a video on it @Igor?
@@NicholasRenotte I was able to improvise by making him print the action in the env.render(), so I use visualize = True when testing and it returns the action taken in each step. I guess it kinda works. Thanks!
@@igorperessinotto5774 awesome, glad you got a workaround!
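Another option, if you want the actions as a Python list rather than printed during rendering, is to skip dqn.test and drive the environment yourself. This is only a sketch: it assumes the env and trained dqn from the notebook, keras-rl's forward() method (which maps an observation to an action), and the older gym step API used in the video.
actions_taken = []
obs = env.reset()
done = False
while not done:
    action = dqn.forward(obs)       # ask the trained agent for its next action
    actions_taken.append(action)    # 0 = push left, 1 = push right for CartPole
    obs, reward, done, info = env.step(action)
print(actions_taken)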
Thank you for the amazing content. I would like to know how to define multiple agents? As an example, if we have an environment with more than one agent taking action, how we can define the other agents and return their value?
Heya @Abrikarim, working on something in this space now. Will share it once it's ready :)
@@NicholasRenotte Thank you. Can't wait for it, because am doing my PhD on multi-agent deep reinforcement learning, so am excited for this.
@@abdikarimibrahim7078 woah, awesome space to be working in!
@@NicholasRenotte yeah, thanks alot
great video! thanks for sharing.
Hi,
I am stuck at the line dqn.compile(Adam(lr=1e-3), metrics= ['mae'])
I am getting an error 'Sequential' object has no attribute '_compile_time_distribution_strategy'
Any help would be appreciated, thanks!
Heya @Sarah, can you try deleting the model and reinitialising the variable? It normally clears up the error.
@@NicholasRenotte okay, thanks!
My friend suggested me to make another file and just copy the required methods for the final function, that worked too
@@Sarah-lr6vp awesome work!
@@NicholasRenotte thank you!!
I downgraded the Colab session to 2.3.1; now the error is "len is not well defined for symbolic Tensors (dense_2/BiasAdd:0). Please call 'x.shape' rather than 'len(x)' for shape information." I see others in the comments received the same error; was anyone able to resolve the issue? Thanks
Heya @Franklg21, what packages are you getting if you run !pip list?
I'm trying to run your notebook in a docker on a Windows host. Any idea how to get it to render the plots in the notebook? For instance, step 1 seems to work, but how do I see the visualization? In the video, you go to another window to display. How does that work?
Thanks for the tutorial!
Heya @John, it looks like it can be done with a virtual display but I haven't tested this out myself. Check this out: towardsdatascience.com/rendering-openai-gym-envs-on-binder-and-google-colab-536f99391cc7
I'm running it directly on a bare metal installation, the visualisation component automatically pops up in a new screen when the code is run. I'm going to look into rendering on a cloud or non-local environment this week as I'm keen to run it non-locally for a few clients.
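One approach I've seen for notebooks in containers or on servers (untested here, and it assumes xvfb plus the pyvirtualdisplay package are installed inside the image) is to start a virtual framebuffer and pull frames back as arrays instead of opening a window:

# apt-get install -y xvfb python-opengl && pip install pyvirtualdisplay
from pyvirtualdisplay import Display
import matplotlib.pyplot as plt
import gym

display = Display(visible=0, size=(1400, 900))   # virtual screen, nothing pops up on the host
display.start()

env = gym.make('CartPole-v0')
env.reset()
frame = env.render(mode='rgb_array')             # get the frame as a numpy array
plt.imshow(frame)
plt.axis('off')
plt.show()
env.close()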
Why is the number of params in each layer more than (units of layer n-1 multiplied by units of layer n)? Doesn't each unit just have one weight per input?
As in, why are there more parameters than weight connections? Each neuron has one weight per input feature plus a bias value.
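A quick worked example, assuming the same architecture as in the video (Flatten over the 4 CartPole observations, two Dense(24) hidden layers, Dense(2) output):

# params of a Dense layer = (inputs * units) + units   <- one bias per unit, not just the weights
first_hidden  = 4 * 24 + 24    # 120, rather than just 4 * 24 = 96
second_hidden = 24 * 24 + 24   # 600
output_layer  = 24 * 2 + 2     # 50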
Hi Nicholas, thanks for all your great videos.
I have a problem with this line of code:
model = build_model(states, actions)
It raises "only integer scalar arrays can be converted to a scalar index"
and
"Error converting shape to a TensorShape: only integer scalar arrays can be converted to a scalar index."
Do you have an idea of what the issue could be?
If you sample your states what does it look like? Are they non-integer values?
Hi 👋
Can you make a Q learning agent with just Keras and Tensorflow ?
Creating the agent seems more interesting ⚡
Definitely, I've got the code 80% of the way there! Should be out in the coming weeks!
Hi, great video! I'm wondering whether it's possible to create a custom environment?
Sure is! Check this out @Fatemeh: th-cam.com/video/bD6V3rcr_54/w-d-xo.html
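For a rough idea of what's involved (a minimal sketch only, not the example from the linked video - the class name and reward logic here are made up), a custom environment just subclasses gym.Env and implements reset and step:

import gym
from gym import spaces
import numpy as np

class MyCustomEnv(gym.Env):
    # toy task: keep a value inside a target band for as long as possible
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(3)                     # 0=down, 1=hold, 2=up
        self.observation_space = spaces.Box(low=0, high=100, shape=(1,), dtype=np.float32)
        self.state = 50.0
        self.steps_left = 60

    def reset(self):
        self.state = 50.0
        self.steps_left = 60
        return np.array([self.state], dtype=np.float32)

    def step(self, action):
        self.state += action - 1                                   # map {0,1,2} to {-1,0,+1}
        self.steps_left -= 1
        reward = 1.0 if 45 <= self.state <= 55 else -1.0
        done = self.steps_left <= 0
        return np.array([self.state], dtype=np.float32), reward, done, {}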
Wow, this was very interesting. Great video. I've been interested in trying to use deep RL on Android games. Do you know how one could go about this? I was thinking of using screenshots as inputs to the DQN. I'd have to create a custom environment, right? Is this something you are familiar with? Thanks again for the video.
Yo @Gustavo Lorentz! Ya, you'd need a custom environment. You could also try out some of the pre-built games from OpenAI as a kick start - LMK if you want me to make a video on doing it with games! Also, it looks like there's a ton of third-party envs you could use as a baseline: github.com/openai/gym/blob/master/docs/environments.md#third-party-environments
@@NicholasRenotte that sounds like a good video, I'd watch it! I'll try building my own custom environment. Thanks for answering :)
@@galorentz anytime, I've got it on the list for some future videos now!
Great video! However, I am getting an error saying "TypeError: only integer scalar arrays can be converted to a scalar index" on the line "model.add(Flatten(input_shape=(1, states)))". How do I solve this?
The variable states is probably not a plain integer. Are you testing the same environment?
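A common cause (just a guess without seeing the code) is passing the whole observation-space shape tuple into the model builder instead of a plain integer:

states = env.observation_space.shape      # a tuple like (4,) -> triggers the scalar index error in Flatten
states = env.observation_space.shape[0]   # the integer 4 -> what Flatten(input_shape=(1, states)) expects
actions = env.action_space.n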
Great video!
Can you explain the target model update + the 10,000 displayed by Keras in the verbose output? It seems like the target model still updates every 10,000 steps, even though target_model_update was set to a soft rate of 0.001. What am I missing? :D
Heya @Richchizzl, this is due to a DQN being an off-policy reinforcement learning algorithm. When you train, there are actually two models being trained in tandem: one is consistently being trained on the latest values (think of this one as a fast learner), the other is being used to generate the next set of states (think of this one as a slow learner). Every 10,000 steps the model weights from the fast learner are copied over to the slow learner, which helps ensure you get more reliable training.
Thanks a lot for your reply! Do you know if it's possible to change the 10,000-step mark within the Keras framework? I'm trying to implement the ConnectX Kaggle challenge, and it seems like a slowly increasing reward gets completely shaken up after the slow learner updates...
@@richchizzl5020 hmmm, did you try changing that parameter? TBH, I'm now using Stable Baselines over keras-rl as it actually seems, well, a lot more stable. You can set the target model update frequency for a DQN pretty easily using the target_network_update_freq parameter: stable-baselines.readthedocs.io/en/master/modules/dqn.html
I did a bit of a crash course on setting up experiments with it here: th-cam.com/video/nRHjymV2PX8/w-d-xo.html - you could swap out the algorithm used there for a DQN and set the parameter there.
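For what it's worth, my understanding of Keras-RL (treat the exact threshold as an assumption) is that target_model_update >= 1 means a hard copy of the weights every that many steps, while a value below 1 means a soft/Polyak update applied every step, so it can be controlled when building the agent:

from rl.agents import DQNAgent

# assumes model, memory, policy and actions are already defined as earlier in the notebook
dqn_hard = DQNAgent(model=model, memory=memory, policy=policy, nb_actions=actions,
                    nb_steps_warmup=10, target_model_update=10000)  # hard copy every 10,000 steps
dqn_soft = DQNAgent(model=model, memory=memory, policy=policy, nb_actions=actions,
                    nb_steps_warmup=10, target_model_update=1e-2)   # soft update: 1% blended in per step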
Nicholas, the !pip list output from Google Colab is too long to place in the comments - are there any modules I should zero in on? I reset back to TF; the downgrade to 2.3.1 threw the len error. Thanks again.
Heya @Frank, quick question: what was the error you were receiving again? I lost the original chain. Also, just a heads up, it'll be a bit of a pain to visualise the environment in Colab.
I have a question!
How can we see every action and state in each episode?
This only shows the final score of each episode, but we can't see what's happening during the episode.
Super video, but I have a question. First I tried it myself and got an error; then I copied and pasted your code, but I still got an error.
Error : FailedPreconditionError: Could not find variable dense_5/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Resource localhost/dense_5/kernel/N10tensorflow3VarE does not exist.
I solved it. Use tensorflow.keras instead of keras
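In case it helps someone else, the change is roughly this (same class names, just imported from tensorflow.keras so the layers and keras-rl2 share the same backend):

# before (standalone keras) - can clash with keras-rl2, which is built on tf.keras
# from keras.models import Sequential
# from keras.layers import Dense, Flatten
# from keras.optimizers import Adam

# after
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam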
I got this error when compiling the model:
AttributeError: 'Sequential' object has no attribute '_compile_time_distribution_strategy'
Please help me!
Thank You
Heya @Sanju, try deleting the model and then rerunning the cell e.g. del model then rerun the model creation cell.
Thank you, that worked!
I got a problem at Step 3 (3. Build Agent with Keras-RL).
The error is: "TypeError: len is not well defined for symbolic Tensors. (dense_2/BiasAdd:0) Please call `x.shape` rather than `len(x)` for shape information."
Does anyone have a solution?
Heya @pascalin, can you share your code? Based on the error, replace the line that says len(x) with x.shape?
Same problem for me - happens when you run dqn = build_agent(model, actions)
Heya @@KaranSingh-pr1wu can you share your code, I can test it out on my pc!
I have a little problem running this. I tried it in Jupyter, same as you, but when I click Run an error appears saying "GL NOT FOUND", and it happened while running the first step of this video. Help please :c
Heya @Jorge, was there a larger error you can share?
@@NicholasRenotte Here's the full traceback:

ImportError Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.6/site-packages/gym/envs/classic_control/rendering.py
---> from pyglet.gl import *
/srv/conda/envs/notebook/lib/python3.6/site-packages/pyglet/gl/__init__.py
---> from pyglet.gl.lib import GLException
/srv/conda/envs/notebook/lib/python3.6/site-packages/pyglet/gl/lib.py
---> from pyglet.gl.lib_glx import link_GL, link_GLU, link_GLX
/srv/conda/envs/notebook/lib/python3.6/site-packages/pyglet/gl/lib_glx.py
---> gl_lib = pyglet.lib.load_library('GL')
/srv/conda/envs/notebook/lib/python3.6/site-packages/pyglet/lib.py in load_library(self, *names, **kwargs)
---> raise ImportError('Library "%s" not found.' % names[0])
ImportError: Library "GL" not found.

During handling of the above exception, another exception occurred:

ImportError Traceback (most recent call last)
in
     while not done:
--->     env.render()
         action = random.choice([0,1,2,3,4,5])
         n_state, reward, done, info = env.step(action)
/srv/conda/envs/notebook/lib/python3.6/site-packages/gym/core.py in render(self, mode, **kwargs)
---> return self.env.render(mode, **kwargs)
/srv/conda/envs/notebook/lib/python3.6/site-packages/gym/envs/atari/atari_env.py in render(self, mode)
---> from gym.envs.classic_control import rendering
/srv/conda/envs/notebook/lib/python3.6/site-packages/gym/envs/classic_control/rendering.py

ImportError:
Error occurred while running `from pyglet.gl import *`
HINT: make sure you have OpenGL install. On Ubuntu, you can run 'apt-get install python-opengl'.
If you're running on a server, you may need a virtual frame buffer; something like this should work:
'xvfb-run -s "-screen 0 1400x900x24" python '
Hello, would you be able to make a video on this but using the mountain car scenario? I am trying to follow this using the mountain car but it does not work :(
I like you, your videos, and your teaching. Keep it up!
Can you please guide me in solving the problem I am getting while working on this example?
TypeError:
_set_agent() missing 1 required positional argument: 'agent'
Hi @Nicholas Renotte, I am new to this... The code works fine; however, I don't see the graphics of the CartPole as shown in this video. How do I get that as an output?
What about other gym environments? Does it work with Mountain Car, Lunar Lander, etc. ?
Yup! I'm actually using a different library now called stable baselines. This is with Lunar Lander: th-cam.com/video/nRHjymV2PX8/w-d-xo.html
Thanks :) I will take a look
@@martinatoshevska1761 awesome! Let me know how you go!
@@NicholasRenotte I was able to implement DQN agent for Lunar Lander with both Keras-RL and Stable Baselines. Thanks again :)
@@martinatoshevska1761 yesss, awesome work!
When hitting the line:
env.render()
it says: Cannot connect to "None"
Running in Colab? Might need to try on desktop.
@@NicholasRenotte Yes, in Colab. Thanks, I will try!