I hope the series continues, thanks for the knowledge and effort :)
This tutorial saved me from unsuccessful training. The key was the way of movement: using XZ movement while rotating the player based on the direction doesn't work well with the ray perception sensor, because random input makes the agent's body spin around rapidly and the sensors can't detect objects.
I had quite a few problems getting the movement/rotation working myself; I'm glad I was able to help!
Love the channel! Hope it grows.
Thank you so much!
@@_Jason_Builds This weekend, I am going to load up your examples and provide feedback. :)
@@stevenpike7857 Awesome! I look forward to hearing from you and seeing what you come up with!
Thank you
Hi, I'm making a project that's a bit similar and I've run into the same issue where the agent doesn't really improve and bounces between positive and negative rewards.
Was just wondering if you could please explain some possible fixes for this. My system is a bit more complex as there are roads, so I'm not sure if taking out the target position from the observations would work, considering that it can't find the goal just by spinning around
One thing I've found out through these projects is that negative rewards aren't really necessary and can actually lead to undesirable behavior (See positive and negative reinforcement in psychology). I tested this in my most recent project, where I used various positive and/or negative rewards in different training scenarios, so perhaps try removing the negative rewards; missing out on positive rewards should be enough motivation to improve.
If your environment is very complex, try using a simpler environment and gradually making it more complex as the training goes on (I explore this a little bit in my most recent series as well and it did help with training).
You could leave the target position there if you want; if you do, the agent will essentially just learn to move to those specific coordinates (Mine learned to spin to basically find the "coordinates" since I didn't give them from the start). I am assuming your agent is driving/walking on the roads to find the objective; using vision/rays/coordinates should all work given enough time and gradual training.
You could try increasing the agent's curiosity, which will make it more willing to try new things (If I'm recalling correctly) and may be what you need to get it out of that training slump if the above suggestions don't work.
I hope this helped :D
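For anyone reading along, here is a minimal sketch of the "positive rewards only" idea above. The class and tag names, the reward value, and the trigger-based check are placeholders for illustration, not the exact code from the video:

    using Unity.MLAgents;
    using UnityEngine;

    public class PositiveOnlyAgent : Agent
    {
        // Reward reaching the target; on hitting a wall just end the episode
        // with no penalty, so "missing out" on the +1 is the only punishment.
        void OnTriggerEnter(Collider other)
        {
            if (other.CompareTag("Target"))
            {
                AddReward(1f);
                EndEpisode();
            }
            else if (other.CompareTag("Wall"))
            {
                EndEpisode(); // note: no AddReward(-1f) here
            }
        }
    }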
@@_Jason_Builds thank you, this is really helpful and I'll try this. One thing I need to ask though is what you mean by increasing its curiosity
@Speo_ curiosity is a hyperparameter for training; I haven't used it myself but I believe it should make your agent more (Or less) willing to try new things.
Check out this forum where someone asks about the curiosity hyperparameter and is shown how to implement it: github.com/Unity-Technologies/ml-agents/issues/5004
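For context, curiosity is enabled in the trainer configuration YAML rather than in the C# agent code. A rough sketch is below; the behavior name and the numbers are placeholders, not values from the video:

    behaviors:
      MoveToTarget:          # placeholder behavior name
        trainer_type: ppo
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          curiosity:         # intrinsic reward that encourages exploration
            gamma: 0.99
            strength: 0.02   # raise this to weight exploration more heavily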
Thanks for the video, it really helps me get into ml-agents. The reason the one model without the rays didn't work is that you need to give the current rotation as input. Secondly, I am wondering how the model got the observations from the rays, as you have not specified them in the CollectObservations function. And lastly, I am getting a warning for having too many observations, and I can fix it by setting the size of the vector space in the Agent Controller script to the number of observations I have. You do not get this warning, so I am wondering why not and how it still works?
Hi, I'm glad the video helped you get started with MLAgents! The agent is able to use the rays to see because I have selected in the inspector panel to use child objects (MLAgents takes care of the rest behind the scenes). As for the warning, it sounds like somewhere there are more observations being made (As you had already determined by the sound of it), so as for why I don't get it, there must be something different between our two agents/scripts, or it's a bug in MLAgents
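To illustrate the relationship being described (the class name and observation choices here are assumptions, not the exact script from the video): the Vector Observation Space Size in Behavior Parameters has to match the number of floats added in CollectObservations, while a RayPerceptionSensorComponent3D on a child object adds its own observations automatically and is not counted toward that number.

    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class MoveToTargetAgent : Agent
    {
        public override void CollectObservations(VectorSensor sensor)
        {
            // 3 floats + 1 float = Space Size of 4 in this sketch.
            sensor.AddObservation(transform.localPosition);                      // 3 floats
            sensor.AddObservation(transform.localRotation.eulerAngles.y / 360f); // 1 float
            // The ray perception sensor on the child object is NOT added here;
            // it feeds the model separately through the sensor system.
        }
    }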
Hello, I want to use, for example, two different ray perception sensors for a car: one that detects obstacles and the other checkpoints.
Since you are using only one, you don't have to do something like:
public RayPerceptionSensorComponent3D raySensorCheckpoints;
public RayPerceptionSensorComponent3D raySensorObstacle;
The thing is, in the observation function I don't know how to implement them.
I believe I have to do that because the car must follow checkpoints and avoid obstacles.
With MLAgents the agent learns what to avoid and what to pursue via trial and error. You can use the same raycast(s) to look at multiple things (Like I do here). I don't see a reason to have separate raycasts for different objectives but I suppose that would work, just make them see different tags
@@_Jason_Builds The idea is that there might be obstacles beyond checkpoints that one raycast cannot see, so a second should be useful to see through the checkpoints.
@@dahamitaoufik6741 You can definitely do multiple raycasts, I think it should be the same setup as using one
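A minimal sketch of what a two-sensor setup could look like, using the field names from the question above. The tag names are placeholders, and this assumes the DetectableTags property can be set from code; the same values can simply be configured in the Inspector instead:

    using System.Collections.Generic;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class CarSensorSetup : MonoBehaviour
    {
        // Two RayPerceptionSensorComponent3D components, e.g. on two child
        // objects of the car. Each needs a unique Sensor Name in the Inspector.
        public RayPerceptionSensorComponent3D raySensorCheckpoints;
        public RayPerceptionSensorComponent3D raySensorObstacle;

        void Awake()
        {
            raySensorCheckpoints.DetectableTags = new List<string> { "Checkpoint" };
            raySensorObstacle.DetectableTags = new List<string> { "Obstacle", "Wall" };
            // Nothing needs to be added in CollectObservations for either sensor;
            // each component feeds its own observations to the model.
        }
    }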
Hi again! I have made a hide and seek environment with a single seeker agent and a single hider agent. What I would like to do is use the ray perception sensor 3D to give rewards depending on whether the hider is in the seeker's view or not. What should I use for this, as the resources on the subject are rather scarce? Also, are the tags used in any way? For example, if the seeker sensor hits the hider sensor (tag 2), then the seeker gains rewards and the hider respectively loses rewards
With a ray cast, you can check to see what object is being returned (I tend to use a Tag for this purpose, but there are other methods such as having a certain script or name) and from there take appropriate actions (Such as giving rewards). In the video I just recently put out, my agent has a gun of sorts and it uses a raycast for hitscan shooting; I check what the ray hit and, depending on what it hit, I have different outcomes. On the agent itself I have a method that checks if I am hitting the correct target and assigns points accordingly.
I'm not sure if that would help with a hide and seek scenario, but it may be a starting point for the hunter agent (If I were to do a hide and seek project like you are, I'd probably refer back to my code from the above project as a starting point, which is why I am suggesting it 😅). If you were to do hide and seek tag, the hunter and prey agents in the series this comment is on may be a good starting point instead.
I haven't tried supplying rewards based on the perception sensor itself so I am not sure if there is an easy way to do so, but you could probably use a ray cast which would act as a sort of "Aha, I found you", with the seeker agent "pointing" a "finger" at the hiding agent. Think of two children playing hide and seek: when the one seeking finds the one hiding, that child will probably point their finger (A ray cast) at them and announce that they found them.
I would refrain from giving negative rewards to the hiding agent for being found (But feel free to experiment!), as I have found through testing with my current project that losing out on a reward is enough motivation to improve whereas getting a negative reward isn't (Essentially how positive reinforcement and negative reinforcement work in real life)
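As a rough illustration of the "pointing a finger" idea (the tag name, reward value, and view distance are placeholders, and this is a sketch rather than the code from the video):

    using Unity.MLAgents;
    using UnityEngine;

    public class SeekerAgent : Agent
    {
        [SerializeField] float viewDistance = 20f;

        // Call this each step (e.g. from OnActionReceived) to check whether the
        // seeker is looking straight at the hider, and reward it if so.
        void CheckForHider()
        {
            if (Physics.Raycast(transform.position, transform.forward, out RaycastHit hit, viewDistance)
                && hit.collider.CompareTag("Hider"))
            {
                AddReward(0.01f); // small reward while the hider is in view
            }
        }
    }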
Hi! Why does the scene kind of freeze for a second every now and then? I assume it's because of the high computational power being used, but I'm unsure, since I also get these freezes in my own project and thought I might be doing something wrong
The freezing is normal, so no need to worry! As far as I am aware, the freezing occurs when all of the information that has been gathered through training is processed, which causes Unity to halt during that time. I believe altering the buffer size can influence how often the freezing occurs, but I have not messed around enough with that feature to be certain of that
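For reference, the buffer size mentioned here lives in the trainer configuration YAML. A rough sketch, with a placeholder behavior name and numbers:

    behaviors:
      MoveToTarget:            # placeholder behavior name
        trainer_type: ppo
        hyperparameters:
          batch_size: 1024
          buffer_size: 10240   # experience collected before each policy update;
                               # larger values mean fewer, but longer, pauses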
Thanks!!
Hi... I'm doing a similar project, but the difference is that I want the Agent to push the Target until it hits the walls. My problem is how I can call EndEpisode when the Target hits the walls; what function should I use?
I think MLAgents has an example program that does what you're working on (I'm not sure though), so that might be a good place to start. Without looking to see if there is one though, my first thought would be to have a check for whether the pushable object is touching the wall. If so, the episode ends; if not, it keeps going
@@_Jason_Builds oh I found Push Block, thank you so much, I think that will help me ❤
Update: I didn't find any source code and I'm still stuck on only one thing... how can I call EndEpisode/AddReward if two objects collide, knowing that the Agent isn't one of them?
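One possible way to handle that last question, sketched out: put a small script on the Target itself and have it notify the agent when it touches a wall. The "Wall" tag, the serialized agent reference, and the reward value are assumptions for illustration:

    using Unity.MLAgents;
    using UnityEngine;

    public class TargetCollisionRelay : MonoBehaviour
    {
        [SerializeField] Agent agent; // assign the pushing agent in the Inspector

        void OnCollisionEnter(Collision collision)
        {
            // The Target (not the Agent) detects the wall collision and relays it.
            if (collision.gameObject.CompareTag("Wall"))
            {
                agent.AddReward(1f);
                agent.EndEpisode();
            }
        }
    }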
Hi! So for the Space Size: when I put 3, it gives a warning message that it should be 6; when I put the space size as 6, I get this error message: Observation at index=0 for agent with id=51 didn't match the ObservationSpec. Expected shape (22,) but got (44,).
I'm not entirely sure, but the first thing I would suggest doing (Based on the quick research into the error I did) is making sure you don't have a brain attached to the agent. There may already be a brain attached to the agent that was created with an observation size of 3, so when you try to increase to 6 you are trying to train a bigger brain in a smaller brain (For lack of a better explanation 😅). If you do have a brain attached, try removing it and trying again
@@_Jason_Builds Hi Jason, thanks for the feedback, I decided to switch to the Zombie tutorial as my Target was moving in my project, thanks for the help and looking forward to more MLAgents videos!
Wait, so why did we comment out the observation about the target? Because the agent will observe it with rays? But it already knew its position just through code; I'm kind of confused why it was better with rays
It's not necessarily better with rays, this was just the direction I wanted to go with the agent. I didn't want the agent to be aware of where the target was in the world; instead I wanted it to learn to identify what the target was and then figure out how to get there. Both ways of going about determining a target's position are completely valid and both have their use cases; in this scenario, I wanted to experiment with having the agent use rays instead of being given coordinates
@@_Jason_Builds Thanks for the answer! So it's just another way. For example, with rays, if the target were behind walls, the agent would learn to walk around the walls while seeking the target, but without rays it would know where the target is and go straight to it
@sol1dBl3ck Without rays, and with knowing the coordinate position of the target, the agent would indeed know where the target is, however it would need to know where the wall is in order to know to go around the wall to get to the target.
If you didn't want to tell the agent where the wall was, and you didn't want to give any information outside of where the target is, the agent would eventually figure out how to get to the target so long as the target was always in the same spot. It would be able to do this because the agent would continuously take random paths until eventually one path would lead to the target. The issue with this though, is that you would be overfitting the agent, meaning if you ever move the target or give the agent a different starting position, it would have to start all over on figuring out how to get to the target.
To prevent overfitting the agent, you can either use raycasts to allow the agent to see its environment (Such as walls and the target) or you can give the agent the position of the walls and the target.
As an example of each: imagine ray casts as being able to see with your eyes, but you can only identify what you're looking at if you know what that thing is. Anything that you don't know of looks the exact same as anything else you don't know of, but if you know of walls and targets then you know when you're looking at a wall or a target.
For using coordinates (position) only, you can think of it as navigating with a map. You know where you are on the map and you know where everything else is on the map, but you can only see the map, meaning you can't see anything that isn't on the map. If the target is directly in front of you, but there is a wall in the way and there isn't a wall on the map, then you don't know why you can't go forward to the target. Eventually you learn that you can't take a straight path to the target, but you never find out why.
Hopefully that all makes sense. I wrote this out on my phone so I can't easily double-check everything. If you have any other questions or need/want any further clarification feel free to let me know 😁
Edit: I should have mentioned that adding wall positions or ray casts does not prevent overfitting on its own; they are tools that help. To actually help prevent overfitting the agent, you need to give it different scenarios (Such as moving the agent and target to random starting positions).
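A small sketch of what randomising those scenarios each episode could look like (the position ranges and field names are placeholders, not the exact values from the series):

    using Unity.MLAgents;
    using UnityEngine;

    public class RandomisedStartAgent : Agent
    {
        [SerializeField] Transform target;

        public override void OnEpisodeBegin()
        {
            // New, random agent and target positions every episode so the policy
            // can't just memorise one fixed path.
            transform.localPosition = new Vector3(Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
            target.localPosition = new Vector3(Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
        }
    }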
@@_Jason_Builds Wow, thanks a lot for this answer! It definitely makes sense and I understood more of those concepts
@@sol1dBl3ck Awesome! I'm glad I was able to help 😁
I do not get it. How does the model know what the output from the "Ray" is? Why is this information not passed to "CollectObservations"?
The Ray Perception Sensor returns Tags that it comes in contact with; I do not know the specifics on how exactly it is written, but I believe the MLAgents library is utilizing features that the Unity game engine provides on its own. These links might help though:
forum.unity.com/threads/need-help-understanding-the-ray-perception-sensors.894673/
discussions.unity.com/t/need-help-understanding-ray-perception-sensors/235447?clickref=1011lyauf24q&
docs.unity3d.com/Packages/com.unity.ml-agents@2.0/api/Unity.MLAgents.Sensors.RayPerceptionSensor.html