Great tutorial. Simple and to the point, especially for someone who is familiar with RL concepts and just wants to get the nuts and bolts of an OpenAI gym env.
Really informative video! As a high schooler self-learning RL, tutorials such as these are really helpful for showing applicability in RL.
Ohhh man, going to have something awesome for you in a few days time then!
Man you can't stop giving us this gold of a tutorial!
Definitely! Two a week that's the goal man!
Thank you for your tutorial, I hope to see how you can visualize the environment in the upcoming tutorial!
Me too! Keen to do a ton more stuff with RL and possibly PyGame!
@Eliseo Raylan awesome! Let me know how you go with it!
Love these. Building custom environments is one of the biggest areas missing with the OpenAI stuff imo.
Would be cool to see one bringing in external data. Like predicting the direction of the next step of a Sine Wave or something simple like that.
Definitely, got way more stuff on RL planned once the Python course is out!
Your tutorials were awesome, and I just finished your 3-hour RL tutorial, and I would like to see a Pygame implementation as soon as possible :)
If possible, try to create a different set of advanced videos where you will explain the math and intuition behind RL, along with code implementations (to cater a different audience).
Something I like about you is that you respond to each and every comment, a characteristic I don't often see from others. Kudos to you!
Thanks again mate! Stay safe!
Thanks @Techy Isle, I'm definitely going to be going into more detail. Been studying some hardcore DL stuff like crazy while producing the Python basics course!
This is way more useful than the last one. The more you can modify OpenAI's envs, it seems, the more that you can get out of the reinforcement learning schema.
I can't stop myself from commenting on this exceptionally good tutorial.
Sir, really amazing job. I must say you should continue this good work; the way you explain each and every line is something very rare in the material available so far.
Much love from a Pakistani student currently in South Korea 😍
Ohhhh thanks so much @Muaz! Soo glad you enjoyed it.
just recommended this video to one of my coursemate. your videos are worth sharing.
Thanks soo much!
As a beginner in RL, all your videos really help me a lot, so thank u!!! And I just wonder if there is any chance to see a tutorial on how to build an env with a multi-dimensional action space?
Hi, great video! I was just wondering what happens if, say, the temperature is at 100 and the model tries to add 1 to it (so it's now outside the limits). Does it then resample automatically, or would you have to implement this in the code yourself?
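Gym itself won't resample or clip anything: the Box bounds are purely declarative, so the state will drift outside the limits unless you clamp it in step() yourself. A minimal sketch of one way to do that, assuming the same 0-100 temperature bounds and reward rule as the tutorial's ShowerEnv:

import numpy as np

def step(self, action):
    self.state += action - 1                        # actions 0/1/2 map to -1/0/+1
    self.state = int(np.clip(self.state, 0, 100))   # keep the state inside the Box limits
    self.shower_length -= 1
    reward = 1 if 37 <= self.state <= 39 else -1    # same reward idea as the video
    done = self.shower_length <= 0
    return self.state, reward, done, {}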
Hello Nick! I love your tutorial and it's actually helping so much at university, especially considering the lack of documentation for OpenAI. I was doing a custom environment for tic-tac-toe to practice, but for some reason when I run dqn.fit() like you did, with everything else the same for the keras-rl training part, I get this:
"ValueError: Error when checking input: expected dense_16_input to have 2 dimensions, but got array with shape (1, 1, 3, 3)"
I don't quite understand why it got that shape, because my tic-tac-toe game's observation space is a np.array([Discrete(3)]*9) to represent the nine tiles and the three possibilities of what could be in them.
Again, thank you for the helpful tutorials!
Yep, I have the same error. Did you manage to solve the issue?
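A likely cause: keras-rl feeds observations with an extra window_length dimension, so a (3, 3) board arrives as (1, 1, 3, 3), and np.array([Discrete(3)]*9) is not a valid Gym space in the first place. A rough sketch of one fix, assuming a MultiDiscrete board encoding and a Flatten input layer (names and sizes are illustrative, not the exact code from this thread):

from gym.spaces import MultiDiscrete
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

observation_space = MultiDiscrete([3] * 9)          # nine tiles, each 0/1/2

def build_model(actions):
    model = Sequential()
    model.add(Flatten(input_shape=(1, 9)))          # 1 = keras-rl window_length
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model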
Thank you Nicholas,
this is a very good example to give it a kick start.
Sir that was exceptionally good!!! 🔥
I would really love to see the render function in play using pygame.
Waiting eagerly for it!!!!
Definitely, can't wait to finally do something with Pygame!
@@NicholasRenotte Yeah Please do it as soon as possible...
Yes! there is a need for Environment viz.
Nicholas, as I mentioned some time ago, your YT channel is outstanding and your effort impressive. RL is my favourite branch of ML, so I especially enjoyed watching your performance. Exceptionally, you also built a customised environment. The idea can easily be adapted and applied to other specific tasks. It is a great pleasure to watch your channel and I will recommend everyone to be here (to subscribe). Have a nice day!
Thank you so much @Markus! Glad you enjoyed the RL videos, I think it's a super interesting field with a ton of interesting applications. I'm hoping later on this year we might be able to apply some of it into hardware applications with Raspberry Pi or ROS!
@@NicholasRenotte Thank you for the wonderful feedback! Yes, ROS/ROS2 is a great robotics framework. Now I am more inspired by the Nvidia Jetson Xavier since it is "slightly" more powerful. Good luck!!!
@@markusbuchholz3518 oooh yeah, I took a look at that yesterday. Looks awesome! The OAK camera looks promising as well!
I guess what irks me the most about all the universe/retro/baselines gym examples is that it's not straightforward to get your bright, shiny, newly trained model to run in other environments. These gym examples have so many interdependencies and one does not really know what is going on inside the box. This is why I am glad you are doing the video on getting other environments to work with RL algos. Unreal is my choice since Unity already has ML examples.
100%. I took a look into the Unity environment over the Christmas break and was gobsmacked. Well documented, and logging and training were clear. I love OpenAI Gym, but seriously, Unity ML-Agents appears to be so much easier to deal with.
@@NicholasRenotte I really wish Unreal were on par with Unity on the ML technology. I am using UnrealPythonPlugin to send images to a remote Python client running OpenCV DNN. The video doing this on my YouTube is a few years old. Your custom gym environment linked to Unreal is doable. Thanks for your videos!!!!
Your content is the best!! 🔥🔥
Thanks so much!!! 🙏 🙏
Really useful tutorial, Nick! Keep it up, mate!
Thanks @Kushang!
Very simple and very nice. Good work.
Thanks so much @Dr. Abdul-Mannan Khan!
Again Best ever explanation Sir appreciate your work keep it up for us
Thanks so much @Ameer!
Great video, I like how you explain each line of code. My one complaint is not your fault... Getting the right environment and versions of the packages. I got right to the end... And couldn't get it working. A bit frustrating lol.
Thanks a lot for the clarity of explanation.
Thank you for the video ⚡⚡⚡
I hope you can make a Custom Agent Next time ✅
Looking forward to see that ✨
Heya! Definitely, code is 80% of the way there, I should have it up in the coming weeks!
@@NicholasRenotte Great !!!
On my mac the kernel keeps dying when I run the basic cartpole example. Don't know how to troubleshoot. Pls help.
Try doing it in a notebook
Hi Nicholas, great great work! It would be interesting to see a parallel with ML agents from Unity, to see the differences with OpenAI Gym. Thanks!
YESS! I've been waiting for someone to ask for it, I've started testing it out already, should have a tutorial on it kinda soonish!
Hello Nicholas! Great tutorial on building a customized environment with Gym. Could you please share any pointers on how to load our own dataset while building an environment? I want to load and train an RL agent with natural language sentence embeddings and create a proof tree.
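One way to do this (a sketch under the assumption that the dataset fits in memory as a NumPy array, e.g. precomputed sentence embeddings) is to load the data in __init__ and step through it row by row:

import numpy as np
from gym import Env
from gym.spaces import Discrete, Box

class DatasetEnv(Env):
    def __init__(self, data):
        self.data = data                                        # (N, d) array of embeddings
        self.action_space = Discrete(2)
        self.observation_space = Box(low=-np.inf, high=np.inf,
                                     shape=(data.shape[1],), dtype=np.float32)
        self.idx = 0

    def reset(self):
        self.idx = 0
        return self.data[self.idx].astype(np.float32)

    def step(self, action):
        reward = 0.0                                            # placeholder: score the action here
        self.idx += 1
        done = self.idx >= len(self.data) - 1
        return self.data[self.idx].astype(np.float32), reward, done, {}

env = DatasetEnv(np.random.randn(100, 8))                       # stand-in for real embeddings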
Great job, thanks!
I have a question: why, in the model building, did you set the last layer's activation function to 'linear'? I think we should make it softmax because I think it is a classification problem?
Hmmm, could definitely change the activation function there!
@@NicholasRenotte
Thank you very much, I just wanted to make sure there was no specific reason for choosing the linear activation function. God bless you for this great effort.
Greetings from Poland. Excellent tutorial; it would be great if you could show how to combine Pygame with reinforcement learning.
Woah Poland, what's happening! Definitely, I'll get cracking on it. Much love from Sydney!
@@NicholasRenotte Thank u master ;)
@@RafalSwiatkowski anytime!! 🙏
To fix the Sequential error: just rearrange the order of the library imports.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

states = env.observation_space.shape
actions = env.action_space.n
print(actions)

def build_model(states, actions):
    model = tf.keras.models.Sequential()
    model.add(Dense(24, activation='relu', input_shape=states))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model

model = build_model(states, actions)
print(model.summary())

def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                   nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
    return dqn

dqn = build_agent(model, actions)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)
thanksssssss!! it helped a lot
Very useful tutorial, although I would need some help with mine. I'm working on training a model for optimal path routing. I'm struggling with defining the observation_space: does it have to be the whole road graph (as a spaces.Graph), or a Box with the parameters (like current_coords, dest_coords, edges_max_car_speed), or maybe both? How should I approach this?
Thanks for the informative series on reinforcement learning. Are you running this on CPU or GPU ? At [23:23]. I have noticed that in your PC, it is like 47-55 sec per 10000 steps. I am getting 118-120 sec with my GPU and 59-63 sec with my CPU only. It seems like, this small model works better with CPU only, may be due to the extensive copying time to GPU :D
Yeah, I noticed that as well, with RL oftentimes the model won't benefit as much from GPU acceleration.
Hello Nicholas, firstly this video helped me a lot to get my basics cleared up regarding RL.
Currently I am working with my own custom env and building a SAC model over it. I wanted to plot the actor and critic losses, and from your video I get that it should be done within the render function. It would be great if you could post some video summarizing the plots in render function.
Cheers !
Thank you for the guide, really gave me a good idea how to implement my own models!
Great video! I am a little confused on how to solve a multi-states problem. Can you please give some pointers on that?
Thank you, once again, for the very educational video.
Great work!
Hello Nick, wonderful video. I am having the same error message you pointed out in the video and tried resolving it as shown but it is giving me a different error message stating the name model is not defined. Please help
Great tutorial. A question though. What would be the benefit of transferring your reinforcement learning from the keras implementation to the openai gym environment implementation?
This is Gym, it's more the rl agents that I've started migrating (better stability, control and exporting).
Thank you for the tutorial. Like sklearn has hyperparameter tuning, please let us know how we can tune hyperparameters in the case of DQN. Any package, library, or other kind of reference would be helpful. Thanks!
Thank you Nicholas !
Can you please advise me on how to use the step function if I have a multidiscrete action space?
Heya @Omar, what does your output look like if you run env.action_space.sample()
@@NicholasRenotte Hi Nicholas, thank you for your reply.
It is a big action space: a 3D action space, (3, 10, 10).
array([[[2, 0, 2, 2, 1, 1, 0, 0, 1, 1],
[1, 1, 2, 2, 2, 0, 1, 0, 2, 0],
[1, 2, 0, 1, 0, 2, 0, 1, 1, 1],
[1, 0, 1, 0, 0, 1, 2, 0, 1, 1],
[1, 2, 0, 2, 2, 0, 1, 0, 0, 2],
[2, 2, 0, 2, 0, 1, 1, 0, 2, 2],
[2, 0, 1, 1, 0, 0, 1, 1, 1, 1],
[0, 2, 2, 2, 2, 1, 0, 0, 0, 2],
[1, 2, 2, 0, 1, 1, 1, 2, 2, 2],
[2, 0, 0, 1, 1, 2, 1, 1, 0, 2]],
[[1, 2, 1, 0, 1, 1, 1, 2, 0, 1],
[0, 1, 0, 0, 1, 1, 2, 2, 1, 2],
[0, 2, 1, 0, 2, 1, 2, 2, 2, 1],
[1, 2, 2, 0, 0, 2, 0, 2, 2, 0],
[0, 2, 0, 0, 0, 0, 1, 2, 1, 2],
[1, 2, 1, 1, 1, 2, 0, 1, 2, 1],
[1, 1, 1, 2, 2, 1, 2, 0, 0, 2],
[2, 1, 0, 1, 1, 2, 0, 0, 0, 2],
[0, 0, 1, 1, 1, 0, 1, 2, 2, 1],
[2, 0, 2, 1, 1, 0, 0, 2, 1, 0]],
[[2, 1, 1, 2, 1, 1, 2, 1, 0, 2],
[0, 1, 2, 1, 0, 0, 1, 1, 0, 0],
[0, 0, 0, 1, 1, 2, 1, 2, 0, 1],
[2, 1, 0, 0, 0, 1, 2, 0, 1, 2],
[2, 0, 2, 1, 0, 0, 2, 0, 2, 1],
[0, 1, 0, 1, 1, 0, 2, 0, 0, 2],
[1, 2, 0, 1, 0, 2, 2, 2, 2, 0],
[0, 0, 0, 1, 1, 2, 2, 2, 0, 0],
[1, 2, 2, 2, 1, 0, 2, 0, 1, 1],
[2, 1, 0, 0, 1, 0, 2, 1, 2, 1]]])
@@OmarAlolayan oh wow, can you try using this with stable-baselines instead? It might be easier to model as the algorithm will pick up the observation space without the need to define the neural network.
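One workable pattern for a (3, 10, 10) grid of discrete choices (a sketch, not a tested solution for this exact setup) is to declare a flat MultiDiscrete space and reshape it back inside step(); stable-baselines3's PPO accepts MultiDiscrete action spaces directly:

import numpy as np
from gym import Env
from gym.spaces import MultiDiscrete, Box

class GridActionEnv(Env):
    def __init__(self):
        self.action_space = MultiDiscrete([3] * 300)            # 3 x 10 x 10, flattened
        self.observation_space = Box(0, 1, shape=(10,), dtype=np.float32)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.observation_space.sample()

    def step(self, action):
        grid = np.asarray(action).reshape(3, 10, 10)            # recover the 3D layout
        reward = float(grid.sum())                              # placeholder reward
        self.steps += 1
        return self.observation_space.sample(), reward, self.steps >= 50, {}

# from stable_baselines3 import PPO
# PPO('MlpPolicy', GridActionEnv(), verbose=1).learn(10000)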
Thank you very much, very clear and clean.
Thank you so much @Mostafa!
I am doing a project: RL for a smart car (prototype) using DQN or another RL algorithm.
I am thinking of feeding in images as the state (from the camera mounted on the car), and my car is able to take 3 actions (forward, right and left). I am keeping it quite simple, i.e. keeping the car in front of our goal; when the car sees the goal I want to reward it and take the next action, and if it takes such a random action that the goal is no longer in the camera's view, it gets a penalty (state, action, reward/penalty, next state and so on). The episode time is limited to 2 minutes. My aim is that the car moves towards its goal (and the more it moves towards the goal, the larger that feature becomes, so it gets another reward because it is moving towards its goal). The goal would be an image (a triangle) at the end of the room in front of the car's initial position. Now, before implementing my DQN in the real-life prototype, I need to train it in an OpenAI Gym (3D) simulation. I have no idea how I can build such an environment where I can train my DQN by simulation. Any help and suggestions are appreciated.
Take a look at how some of the video game driving environments are built! Should be a good start for how to kick it off!
Hi Nick. Thank you very much, I learned a lot. But I have a question: how can I see which action the shower agent should take? I mean, how can I understand which action the agent takes based on the reward?
Any chance you could do a tutorial on how to use this for physical computing? For example, how would you implement two LEDs using GPIO on a Raspberry Pi when the temp goes up or down, with an input sensor for temperature? Or say an accelerometer and a motor for balancing.
As usual , excellent content. Thank you so much :)
Thanks so much @Iheb!
Thanks! This is really helping me a lot.
With TensorFlow 2, 'from stable_baselines import A2C' does not work for me.
How do I solve this?
what does the Env argument inside the ShowerEnv() class do?
Should be the parent class, I may have forgotten to run super().__init__() inside of the __init__ function.
Thanks Nicholas for the great explanation. I have tested this custom environment with PPO and MlpPolicy and got very low rewards around -40 (even with 200000 time steps for model.learn). Any idea why I get poor results? thanks
Same env as this one or custom one? Might need a little HPO or possibly an alternate algorithm, I think I did it with a slightly different model in the full RL course with better results!
Great videos. Any idea how to handle 2 or 3 states/observations in the code,
say temperature and pressure or humidity?
Hi Nicholas, thanks a lot for your video! I wonder how can I know about the objective function of the agent? Is there a way that I can change the objective function myself? Thanks a lot!
I am getting a NotImplementedError in the for-loop cell. I looked it up; it has to do with inheritance, but I cannot find how.
Heya @BananaBatsy, whereabouts is the error being triggered?
hey @Nicholas, awesome as usual !!
Any reason why you chose to build your agent with Keras-RL and not with the ones provided by Stable Baselines?
Hope you keep making videos about custom environments. I think that's what's most useful.
YouTube is already crowded with videos about the common environments for games and stuff like that.
Was a little early on when I did this, I've since transitioned most of my rl projects to sb! Got plenty planned on custom environments, stay tuned!
Hey , thank you so much for this video. It really helped me. I have a question: can you define your observation space using CSV files and then iterate over it, so the agent needs to deal with differing environments?
Hi Nicholas, I'm following your tutorial, but I don't know why, at the end of the tutorial when I run dqn.fit(env, nb_steps=50000, visualize=False, verbose=1), I get the following error: ValueError: Error when checking input: expected dense_76_input to have 2 dimensions, but got array with shape (1, 1, 1). I don't really know how to solve it and I'm following your tutorial exactly... do you have an idea?
In case, thank you for your time!
Fantastic tutorial!
I have a question though regarding the DQNAgent test.
I noticed that at the test function, the only two actions being sent to step are the low and high values. Why is that?
How would I go about this because I need action to be equal to 0, 1 or 2 for my application.
Thanks a lot for this resource!
What an amazing tutorial! Thanks
Thanks so much @Fidel!
Hi Nicholas, I found this tutorial of great help! Thank you. However, I'd like to ask you if what I have in mind is correct or not: within the step function, can I update the observation state?
Great Video. I was able to use this as a basis to create an environment for my specific needs.
I have one question though
Once you've trained your model and saved your weights, how do you use it? I mean actually pass values to the model to get an action as a response
You can pass the new state to the model and actions are returned as the output. Can then go and plug it into the real model/iot suite etc.
@@NicholasRenotte Thanks so much for the wonderful content, and thanks even more for these replies. You're awesome.
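Roughly, with the keras-rl agent from this video, that looks like the sketch below (filename assumed): load the weights, push a state through the underlying network, and take the argmax over the Q-values. The exact reshape depends on how your input layer is defined.

import numpy as np

dqn.load_weights('dqn_weights.h5f')               # or reuse the agent you just trained
state = np.array(env.reset()).reshape(1, 1, -1)   # (batch, window_length, obs_dim)
q_values = dqn.model.predict(state)               # one Q-value per action
action = int(np.argmax(q_values[0]))              # greedy action for this state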
Hi, I don't understand why during training the reward is around -0.5, while during testing the reward is around -60. Is it because that the number of steps used for training and testing are different? For training it is over 10000 steps, for testing only 60 steps.
Different starting points without enough steps to get to the final result. Increasing testing steps would allow the agent to iterate closer.
Excellent tutorial, thanks! Is it possible for a RL model to output a pair of ints or floats ( like [1.5, 2.8] ) instead of a discrete value? What would the output layer look like?
I believe so, you would need your final layers to have a linear activation function and need a box state space! What's the use case if you don't mind me asking @Vincent?
@@NicholasRenotte thanks for answering. I'd be interested to see how a model could output the best geometrical coordinates for a given state. It could be a 2D game where the player would have to avoid bombs that pseudo-randomly hit a finite surface for example.
@@vincentroye oh got it! Might I suggest you approach it slightly differently. You would ideally store the state of the objects coordinates and just output the actions for your agent in response to those coordinates. It would be akin to your agent walking around using something like sonar.
@@NicholasRenotte could the actions in that case be to move x (left or right) and y (up or down) at the same time? That would be the reason for having 2 outputs. I'd be interested to see how a dqn agent would train the model in that case. In your video it takes a discrete value as nb_actions, how would that be done with 2 continuous outputs? that's where I'm a bit confused, that would give place to a huge amount of possible actions.
Fantastic video, thank you so much for it....i have one doubt regarding DQN in gym....can you please share some details for how to proceed DQN with multi-dimension state space(4 D) which was 1 D in your case (temp)
Hi, very nice video ! May I ask one question, what if I need a continuous model for the training task, the discrete action will not feasible , how can I do ?
Great video, but what's the point of the observation space? It looks like your agent is not using it.
Thank you for your amazing work. I have a question regarding the defined environment. I defined self.state as a vector in the __init__ function (self.state = np.zeros(shape=(5,), dtype=np.int64)), but when I want to recall self.state in the step function, it is an integer. How can I have the vector state in the step function as well?
I just took a look at my code and I think it's not perfect tbh. Try setting initial state inside of the reset method.
@@NicholasRenotte Thank you. I did
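For reference, a minimal sketch of the suggestion above: keep the state as a NumPy vector by (re)creating it in reset(), and only ever update it element-wise in step(), so it never collapses to an int.

import numpy as np
from gym import Env
from gym.spaces import Discrete, Box

class VectorStateEnv(Env):
    def __init__(self):
        self.action_space = Discrete(3)
        self.observation_space = Box(low=0, high=100, shape=(5,), dtype=np.int64)

    def reset(self):
        self.state = np.zeros(5, dtype=np.int64)    # stays an array, not an int
        self.steps_left = 60
        return self.state

    def step(self, action):
        self.state = self.state + (action - 1)      # element-wise update keeps the vector
        self.steps_left -= 1
        reward = 0.0                                 # placeholder reward
        return self.state, reward, self.steps_left <= 0, {}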
@@NicholasRenotte I have another question and I would appreciate it if you could help me. I defined the environment for a multi-component example with the action space and observation space as vectors, and now I want to recall them for RL in Keras (you used states = env.observation_space.shape and actions = env.action_space.n as input parameters of the build_model function). How can I recall them for my multi-component example? Do you have any example of a multi-component setup for RL in Keras? Thank you.
Two questions: does the action that comes out of the network go to the next step of the env? And how can you be sure the output of the network is 0, 1, or 2 instead of some other random number?
Is there a deep reinforcement learning algorithm we can experiment with on regression-based tabular data, sensor data, etc.? If so, it would be much appreciated if you could make a video on it. Thanks!
Thanks Nicholas for the nice tutorial! I have two questions. 1. I'm trying to implement this on PyCharm with Python 3.10, on a MacOS Monterey, Core i3, but with built-in Python 2 something. I can't install and import tensorflow. It says it can't find a satisfying version. Any idea where the problem comes from? different Python versions? any solutions? 2. I'm starting to build my own environment, which is not like any of the ones available. It's a 2D path an agent should try to stay close to, by going left and right, with some gravity. Any suggestions where to start coding it? or any environments you know similar to this? THANK YOU!
Woah fair few there, take a look at some of the existing Gym envs, I think there might be some path focused ones I might have seen a while ago.
Thank you for the tutorial. This was super helpful!
Anytime @RaihanKhan!
Would it be possible to see the evolution (a plot) of the temperature of the water when the agent is run on the scenario? For each episode we would see, for each step, the evolution of the water temperature
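One simple way (a sketch, assuming env is the ShowerEnv from the video) is to log the state at every step while running an episode and plot it afterwards with matplotlib, rather than trying to draw inside render():

import matplotlib.pyplot as plt

temps = []
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()      # or your trained agent's action
    obs, reward, done, info = env.step(action)
    temps.append(obs)                        # water temperature at this step

plt.plot(temps)
plt.xlabel('step')
plt.ylabel('temperature')
plt.show()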
Great video! Thank you very much, sir!
What about if my problem only allow me to determine the reward on the next step, after we took some action? Do you have any video talking about such problems?
Thanks again, your content is helping a lot.
Hi Nicholas, amazing video! Quick question, how can I access the current action that the AI is taking? Thanks!
Heya @Sebastian, I don't believe it's easily accessible through keras-rl. If you're using StableBaselines, you can access it through DQN.predict(obs) e.g. github.com/nicknochnack/StableBaselinesRL/blob/main/Stable%20Baselines%20Tutorial.ipynb shown towards the end.
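For reference, a rough sketch of that Stable Baselines pattern (shown here with the newer stable-baselines3 package; the linked notebook uses the original stable-baselines, but the predict call is the same idea):

from stable_baselines3 import DQN

model = DQN('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)

obs = env.reset()
action, _states = model.predict(obs)         # the action the agent would take for this observation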
Hi, it is a great video tutorial for customizing an environment. However, when I copy your code and run it in a Jupyter notebook, I get stuck on a problem where rl.agents cannot find the right version of Keras. I have tried many ways to solve this but still cannot. Please help me.
Have the libraries for keras-rl or keras-rl2 been updated recently? I have been building a custom environment and training a NAF agent to solve it. Last week it was working, but when I came back this week, the code throws a KeyError: 0 when my NAF.fit line is run. Any suggestions or help would be greatly appreciated.
FYI I am using google colab which requires me to reinstall all libraries every session. I know its not ideal, but unfortunately I am on Windows.
Heya @David, not too sure, I haven't been using keras-rl2 lately; I've been working with Stable Baselines in its place. Did you have errors that you can share?
@@NicholasRenotte Turned out to be a weird bug with google colab. Restarted my computer and is working fine now. Thanks for the reply though!
@@davidromens9541 anytime, you're welcome. Weird though. Building anything interesting in the RL space?
Where does the 24 come from in defining the neural network layers?
Completely subjective, could change it to a larger or smaller value depending on the complexity of the problem you're trying to solve Philipp!
Hello. Did you get to doing the visualization?
Any good links to actually overriding the render function for showing our own custom visualization?
Hi, are the observation space and the state the same thing? The observation space isn't used during model training here, right?
Amazing video! I had a question about the activation layer and performing the final action. Your activation layer is a linear function. How does that link to picking the action?
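In a DQN the network's outputs are Q-value estimates (one unbounded number per action), not class probabilities, which is why the last layer is linear rather than softmax; the agent then picks the action with the highest Q-value (or samples via the Boltzmann policy during training). A tiny illustration with made-up numbers:

import numpy as np

q_values = np.array([[0.2, 1.7, -0.4]])      # example network output: one Q-value per action
greedy_action = int(np.argmax(q_values[0]))  # -> 1; BoltzmannQPolicy samples in proportion
                                             #    to exp(Q) instead of always taking the max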
Hi Nick,
Thank you very much for your nice explanation with outstanding implementation. To this end, I have a question,
How can I check the model parameter update, for example, the weight of each layer? When each training episode is done? Is there any way to check those parameters?
I think you can export the final keras model, this should allow you to see the model weights etc
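A rough sketch of reading them directly from the underlying Keras model at any point during or after training (layer names and shapes depend on your architecture):

for layer in model.layers:
    for w in layer.get_weights():            # kernel and bias arrays, if the layer has them
        print(layer.name, w.shape)

model.save('dqn_model.h5')                   # or export the whole model for later inspection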
How can I access the Q-Values? I see that in the training the "mean_q" variable is displayed. How can I access these values (mean q values)?
Could you extend and complexify this tutorial using a dictionary of boxes as observation space please? I can't find any tutorial for the creation of more advanced environments and training with Keras.
Yup, I'm going to do a deeper dive into building environments. Got it planned @Vincent, ideally it'll be a 3-hour free short course on YouTube.
@@NicholasRenotte thanks a lot. The part that I don't find easy is the design of the RL model when the observation space is complicated. I tried a few things and I often get problems with input/output shapes and non-allowed operations between lists and dicts. My current alternative is to look at the OpenAI solved environments, but I haven't found any working code or environment for an observation space built with a dict.
I just found gist.github.com/bklebel/e3bd43ce228a53d27de119c639ac61ee but it doesn't work for me.
@@vincentroye agreed, there isn't a lot out there on building environments! I'm wrapping up my Object Detection series then will start on a deep dive into RL including environment building!
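For what it's worth, declaring a Dict observation space is straightforward; the awkward part is that keras-rl expects a flat array, so one common workaround (a sketch, assuming a reasonably recent gym with gym.spaces.utils) is to flatten the dict sample before it reaches the network:

import numpy as np
from gym.spaces import Dict, Box, Discrete
from gym.spaces.utils import flatten, flatten_space

obs_space = Dict({
    'temperature': Box(low=0, high=100, shape=(1,), dtype=np.float32),
    'valve': Discrete(3),
})
flat_space = flatten_space(obs_space)        # a single Box the network can consume
sample = obs_space.sample()
flat_obs = flatten(obs_space, sample)        # 1-D array suitable for a Dense input layer
print(flat_space.shape, flat_obs.shape)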
First, your channel is amazing.
Second, I tried to adapt your custom env to a trading env, but when I use deep learning / Keras-RL2 it doesn't look good: my reward is always the same (and the maximum).
I think the problem is the NN architecture and/or the RL components (BoltzmannQPolicy/DQNAgent), because the loop before the deep learning part looks OK. Do you have any tips?
Check this out: th-cam.com/video/D9sU1hLT0QY/w-d-xo.html
Hi Nicholas, thx for your great video! Is there an easy way to create customized multi-agent environments with OpenAI Gym? I want to create an AI that competes against another agent in a multiplayer game.
AFAIK it's not super straightforward with pre-built RL packages that are out there atm. Will probably get to it in future vids!
Hi Nicholas!
Very informative video! I would like to know if we can implement DDPG in the context of routing in a simulated networking environment to assess its performance in terms of network delay. Thank you
Yeah probably! I would think you would have different routes or paths for network load then reward based on latency or something of the like!
Nice tutorial, could you please make one tutorial on how to use RL in image classification.
How do I customize my environment to make it visual? Using Pygame, or how?
Thank you
Yep, can definitely do it with PyGame!
Hello Nicholas, your video is very helpful. I have some questions to ask. I wonder if it is possible to customize the action space for each state, with the reward only given at the terminal state. For example, state 1 with 3 actions, state 2 with 5 actions, state 3 with 10 actions, and the reward can be calculated based on that action sequence, whether it is a win or a loss. Thank you.
Hi Nicholas,
I watched all of your reinforcement learning videos and the ones on the internet (and I have been trying for 10 hours).
What if my state is [10, 20, 30, 40] and my action_space is Discrete(4)?
I am getting "DQN expects a model that has one dimension for each action, in this case 4".
My shape is (None, 1, 4) and I can't fix it.
I should probably change DQN and ADAM
Discrete(4) will return actions 0,1,2,3 as integer values. That sounds like your observation space is incorrect, looks like that would be (None, 4) if you've got [10,20,30,40]
@@NicholasRenotte I edited the Sequential layers by looking at your videos; my observation_space was the same as your observation_space in one video. I watched all your videos, thanks for them. I built a basic snake game, but I'm aiming to design a 2D space-travel game with gravitation and orbits. My problem was creating the wrong layers; I understand it better now. Thanks again for your excellent videos.
Awesome!! I was wondering how it would work with pathfinding around some obstacles.
Normally the RL agent takes a while to find its way but eventually works around it. I'm actually working on RL for Super Mario atm, it's definitely taking a while though @Samiul!
@@NicholasRenotte wow! Waiting for rendering work of the environment
@@samiul2009 awesome, yup got pygame in the pipeline!
Yep this is exactly what I was looking for. Could you make an example with a Dict space? Please thx!
You got it, added to the list @Ed!
@@NicholasRenotte yessssssss!
@@edzme yeahhhhyaaaa!!
Ooo pygame would be such a cool thing to see. I wonder if Retro Gym environments work too!
IKR, Pygame is definitely on the list! I've tested with some of the Atari envs and they seem to work; they take a while to train, but they work!
Thank you so much for the content. Thumbs up.
Thanks so much @Farzam!
Man, your videos are so great! Congrats!!!!! I have one question: i can't install a keras-r12. Maybe you can help me.
Try keras-rl2, it should be an l instead of 1
@@NicholasRenotte Thank you!!!
When I'm fitting the model I get this error:
TypeError: Keras symbolic inputs/outputs do not implement `__len__`. You may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model. This error will also get raised if you try asserting a symbolic input/output directly.
Can someone please help me fix this?
Code:
from gym import Env
from gym.spaces import Discrete, Box
import numpy as np
import random

class ShowerEnv(Env):
    def __init__(self):
        self.action_space = Discrete(3)
        self.observation_space = Box(low=np.array([0]), high=np.array([100]))
        self.state = 38 + random.randint(-3, 3)
        self.shower_length = 60

    def step(self, action):
        self.state += action - 1
        self.shower_length -= 1
        if self.state>37 and self.state
Heya @Nathaniel, what versions of the packages are you using? Looks like it might be a package version issue: github.com/keras-rl/keras-rl/issues/348
@@NicholasRenotte
Here are the versions:
keras: 2.4.3
tensorflow: 2.4.0
keras-rl: 0.4.2
gym: 0.18.0
@@nathanielmontanez1652 can you try downgrading the versions from the GH link?
@@NicholasRenotte
Thank you, it worked!
@@nathanielmontanez1652 YESSS! Awesome work!
Hey Nicholas! Thanks for your tutorial! I'm trying to get to grips with RL.
As you mentioned, there is an error in the code: AttributeError: 'Sequential' object has no attribute '_compile_time_distribution_strategy'
I placed "del model":
def build_model(states, actions):
    model = Sequential()
    model.add(Dense(24, activation='relu', input_shape=states))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model

del model
model = build_model(states, actions)
model.summary()
but I only get:
NameError: name 'model' is not defined. If I place the "del model" after "model = build_model(states, actions)", the model is deleted but I get:
"dqn = build_agent(model, actions)
NameError: name 'model' is not defined"
Do you have any advice?
Oh got it, after you've deleted it, recreate the model! So the flow should be:
1. Create it
2. Delete it if there are errors
3. Create it again
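In code, that ordering looks roughly like this (the del only matters when re-running the cell after hitting the '_compile_time_distribution_strategy' error):

model = build_model(states, actions)   # 1. create it
del model                              # 2. delete the broken instance if the error appears
model = build_model(states, actions)   # 3. create it again, then build the agent on top
dqn = build_agent(model, actions)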
@@NicholasRenotte Thanks a lot, I got it! Is there an option to print multiple actual states (temperatures) and the actions during testing in one episode? That would be helpful to know what the agent is doing, in addition to the reward.
@@whataday3910 definitely, you could print out states and actions within the environment class!
@@NicholasRenotte I feel dumb that I didn't think about that and wasted two hours :D I will try to get familiar with classes. I got it working. Thanks man!
@@whataday3910 yessss! Nah it's all a learning process, no stress!
Hey great tutorial! However, when I run your ipynb I get a TypeError when building the agent. TypeError: Keras symbolic inputs/outputs do not implement '__len__'. This error occurs when executing build_agent(model,actions). I'm wondering if anyone else had this issue when trying to run the notebook?
Hey man, Thanks for the wonderful video. I made a custom environment, and Keras-rl2 is taking a lot of time, not utilizing the GPUs. How can I optimize the training of this or similar codes using TensorFlow 2 on remote GPU with Ubuntu 20.0?
20.04.4 LTS (GNU/Linux 5.13.0-52-generic x86_64)
NVIDIA-SMI 515.48.07 Driver Version: 515.48.07
with four NVIDIA GeForce RTX 3080 GPUs 10 GB each
If my observation space is discrete, what should I set as the input_shape when building the model? Can you help me?
Is it just a single discrete value?
Yes, I set it as spaces.Discrete(3)
@@廖紘毅-w6i change the following:
1. Change the states space from this:
states = env.observation_space.shape
To this:
states = env.observation_space.n
2. Change the input layer from this:
model.add(Dense(24, activation='relu', input_shape=states))
To this:
model.add(Dense(24, activation='relu', input_dim=states))
@@NicholasRenotte Thank you for your advice, but when I changed, I got this Error when checking input: expected dense_input to have shape (3,) but got array with shape (1,), when running dqn.fit(env, nb_steps=50000, visualize=False, verbose=1).
@@NicholasRenotte Thank you very much for your reply.
I changed my code to
model.add(keras.Input(1))
model.add(Dense(24, activation='relu'))
model.add(Dense(24, activation='relu'))
model.add(Dense(24, activation='relu'))
model.add(Dense(actions, activation='linear'))
and the DQN can be trained now, but when the code runs the line scores = dqn.test(env, nb_episodes=100, visualize=False, verbose=1), the terminal only shows "Testing for 100 episodes ..." and does nothing.
I'm getting an error while setting up the agent (Keras symbolic inputs/outputs do not implement `__len__`. You may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model. This error will also get raised if you try asserting a symbolic input/output directly.) Please help!
Getting this one as well, any news on it?
Thanx!
Could you do the same only on tf_agents?
Definitely! Stay tuned. Got something on stable baselines as well if you're interested!