the fact that this is available online for free is truly remarkable. stanford rocks.
oh yea? whatdya learn bro?
Yeah, agree. A lot of knowledge shared 🙃
Thank you for making this course available. Thank you Stanford
This is really such an incredible lecture series. So much informative yet really well explained.
Future is independent of past given present🙃 I am taking this class at the southwestern corner of Northeastern University
Very clear, thank you for sharing!
Thank you for sharing this video
11:41 I wonder why the game of Go doesn't need exploration. In fact, even a human player would compute several steps ahead to see some possibilities.
It seems counter intuitive
I am commenting to come back in case of a good reply
Board games require *search*. This is defined differently from exploration: search involves following the known rules of the game, while exploration involves trying to discover what the rules are.
According to slide 1, page 15 of this course (winter 2023): "AI planning assumes having a model of how decisions impact environment". It seems the model is given and does not need to be learned by the exploration process explained at 9:09.
IMO it does need exploration. Even if the rules of the game are given, the agent needs to explore new strategies: "what if I play it differently this time?"
DeepMind framed AlphaGo as an RL problem and also showed that, through exploration, their agent was able to discover new strategies previously unknown to humans, so solving Go *is not* only planning.
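If it helps to see the distinction concretely, here is a rough Python sketch of what "planning with a known model" means (the tiny 3-state chain and its rewards are invented for illustration, not taken from the lecture):

```python
# Known model: MODEL[state][action] -> (next_state, reward).
# With this in hand, the best action can be found by pure lookahead (search),
# without ever acting in the real environment.
MODEL = {
    0: {"left": (0, 0.0), "right": (1, 0.0)},
    1: {"left": (0, 0.0), "right": (2, 1.0)},
    2: {"left": (1, 0.0), "right": (2, 1.0)},
}

def plan(state: int, depth: int) -> float:
    """Pure planning: look ahead `depth` steps using the known model."""
    if depth == 0:
        return 0.0
    return max(r + plan(s2, depth - 1) for s2, r in MODEL[state].values())

print(plan(0, 3))  # best 3-step return from state 0, computed entirely "in the head"
```

Exploration only enters once the model (or the space of good strategies) is not fully known and the agent has to try things to find out, which is roughly the disagreement in this thread.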
Lecture proper 22:49
8:00 Reward for a sequence of decisions
Keep posting more stuff, not all of us can pay Andrew Ng for this stuff
Are coding assignments available for the public?
Don't they combine RL theory with code when teaching the basics?
As a graduate student you are kind of expected to figure out the coding aspect on your own and seek help at the TAs' office hours :)
My experience is that graduate courses are usually focused on the science side of computing, with the expectation that students already figured out most of the engineering side during undergrad.
Good morning! I was thinking about the relation between violence against women and women's prospects for a career, good jobs and a salary. However, access to a better standard of living, such as good food and housing, is the basis for getting free of oppression.
This is what I feel is a better example for RL:
Agent: Represents an individual, such as a human being, who is making decisions and taking actions in the environment.
Environment: Represents the world or the context in which the agent operates. It includes all the factors that influence the outcomes of the agent's actions, such as societal norms, laws, and consequences.
State or Events: These are the situations or circumstances the agent encounters in the environment. In this analogy, these could be viewed as the events or experiences that the individual faces in their life.
Actions (Good or Bad Deeds): Actions taken by the agent in response to the events or states encountered in the environment. These actions can be categorized as good deeds (positive actions) or bad deeds (negative actions) based on their consequences.
Rewards (Paradise or Hell): The consequences of the agent's actions. Good deeds may result in positive rewards (such as paradise), while bad deeds may result in negative rewards (such as hell).
Here's how the analogy relates to RL:
The agent (individual) learns from its experiences in the environment, just as RL agents learn from interacting with their environments.
Good deeds (positive actions) lead to positive rewards (paradise), reinforcing the behavior.
Bad deeds (negative actions) lead to negative rewards (hell), discouraging the behavior.
In RL terms, the agent aims to maximize its cumulative reward over time by learning which actions lead to desirable outcomes (paradise) and which actions lead to undesirable outcomes (hell). Through trial and error, the agent learns to choose actions that maximize its long-term reward, similar to how humans learn from their experiences to make better decisions in life.
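If it helps to see that loop mechanically, here is a minimal sketch of the analogy (the action names, reward numbers, and the epsilon-greedy choice are all made up for illustration):

```python
import random

ACTIONS = ["good_deed", "bad_deed"]

def environment(action: str) -> float:
    """Return a noisy reward: good deeds pay off on average, bad deeds do not."""
    if action == "good_deed":
        return 1.0 + random.gauss(0, 0.1)   # "paradise"-like outcome
    return -1.0 + random.gauss(0, 0.1)      # "hell"-like outcome

def run(episodes: int = 500, epsilon: float = 0.1) -> dict:
    values = {a: 0.0 for a in ACTIONS}   # estimated long-run reward per action
    counts = {a: 0 for a in ACTIONS}
    for _ in range(episodes):
        # Explore occasionally, otherwise exploit the best estimate so far.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(values, key=values.get)
        reward = environment(action)
        counts[action] += 1
        # Incremental average: values[a] converges to the mean reward of action a.
        values[action] += (reward - values[action]) / counts[action]
    return values

print(run())  # the estimate for "good_deed" ends up clearly higher
```

Over many trials the agent's estimates steer it toward the action with the higher long-run reward, which is the "learning from consequences" part of the analogy.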
Agree ?
Thank you very much
The lecturer is wrong about Markov's assumption @36:02. Blood pressure, exercise, etc. are different features of a state. Given the values of all features at the current state, blood pressure is independent of the values of any features in all past states. Hence the system is Markov.
Blood pressure is dependent on things like exercise, what you eat and how much, genetics, etc.
@@TacoMaster07 To put it simply, Markov's Assumption is "that the *future* actions are influenced only by the *present*, not the *past* states". In other words, Markov's Assumption emphasizes only the independencies across the time domain, *NOT* the independencies across different features.
Hence, explaining the concept of Markov's Assumption by saying one feature, such as "blood pressure", is dependent on other features, such as "exercise", is missing the point on time domain dependency. The professor's other example, such as "hot-or-not-outside", is simply wrong. Blood pressure can be dependent on the temperature outside, but still satisfies Markov's assumption, as long as knowing *current* temperature provides sufficient information about blood pressure, regardless of the temperature in the past.
In fact, Markov's Assumption is so widely applicable exactly because most natural processes are continuous in time, i.e. knowing the current state is often sufficient to ignore the past states. The common examples of processes that violate Markov's Assumption are human discretion and randomness.
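To put the distinction in symbols, a small sketch (the feature names are just the ones from the lecture example):

```latex
% Markov assumption: independence across *time*, given the present state
P(s_{t+1} \mid s_t, s_{t-1}, \dots, s_0) = P(s_{t+1} \mid s_t)

% The state may bundle several mutually dependent features within one time step, e.g.
% s_t = (\mathrm{BP}_t, \ \mathrm{temperature}_t, \ \mathrm{exercise}_t);
% dependence *within* s_t does not violate the property above.
```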
But your current blood pressure is dependent on actions you took in the past, and these actions will still influence it. So taking medication may have different outcomes (future states) for a given blood pressure value (state).
@@vfestuga If by "action in the past" you mean, for example, "taking blood-pressure-regulating medication", which affects current blood pressure, while current blood pressure affects the current action of taking more blood-pressure-regulating medicine, then agreed, this process is NOT Markov. However, the lecturer cited "other features in the past, such as exercise or just-ate-a-meal, etc." as the reason, thereby failing to point out whether there is a path from past blood pressure to current blood pressure, which is the key to the Markov Assumption. This could be misleading.
The Markov Assumption is valuable because it allows us to determine the current state without knowing the very initial state of the system, which simplifies computation. Hence, if my blood pressure at t can be fully determined by knowing my action at t-1, it is Markov. It becomes a problem if determining my blood pressure at t somehow requires knowing my blood pressure at t-1, because my blood pressure at t-1 will then be dependent on my blood pressure at t-2, and so on and so forth, all the way back to time 0, which will not be Markov.
So, the Markov Assumption is not so much about whether the current state is dependent on any states in the past. If there is such dependency, such as just ate a meal, etc, as provided by the lecturer as an example, it can be resolved by tricks such as realigning the timestamp if necessary. The key takeaway should be whether there is a path from the same feature in the past states affecting the feature in discussion in the current state.
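To make the "realigning the timestamp" trick concrete, a small sketch using the meal example from this thread (the conditionals are illustrative, not data from the lecture):

```latex
% If blood pressure at t+1 also depends on whether a meal was just eaten,
% blood pressure alone is not a sufficient state:
P(\mathrm{BP}_{t+1} \mid \mathrm{BP}_t) \neq P(\mathrm{BP}_{t+1} \mid \mathrm{BP}_t, \mathrm{meal}_t)

% Augmenting the state with that feature restores the Markov property:
s_t = (\mathrm{BP}_t, \mathrm{meal}_t), \qquad
P(s_{t+1} \mid s_t, s_{t-1}, \dots) = P(s_{t+1} \mid s_t)
```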
When I see doctors, they prescribe medicine based mostly on my current symptoms, most of the time anyway. I don’t know about you, but it certainly sounds reasonable to me.
More thoughts?
@@Bao_Lei Well I think the caveat is that the action is taking medicine. What she's arguing is that just because the state of your blood pressure is high, it doesn't mean that you should take medicine. It depends on whether your blood pressure is high at rest or your blood pressure is high from exercise. I know that the system is still Markov, since the next state only depends on the previous state, but I guess she's pushing that you shouldn't do a specific action based solely on the previous state, and have to take into account other things.
Is this or the Deep Reinforcment Learning Course from UC Berkeley better ?
I'm confused by how she explained the Markov Assumption. "We're gonna assume that the state used by the agent is a sufficient statistic of the history and that in order to predict the future, you only need to know the current state of the environment. The future is independent of the past given the present".
What does that actually mean? Can someone simplify it? Thanks.
In Markov random processes this is a basic assumption: the state at t+1 only depends on the state at t. Or, we can say that given s_t, s_{t+1} is independent of s_{t-1}, s_{t-2}, etc. Google "Markov chains".
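A tiny concrete example might help (the states and probabilities below are made up for illustration):

```python
import random

# Two-state Markov chain: the next state is sampled from probabilities that
# depend ONLY on the current state -- "the future is independent of the past
# given the present".
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state: str) -> str:
    """Sample the next state; no history beyond `state` is ever consulted."""
    next_states, probs = zip(*TRANSITIONS[state].items())
    return random.choices(next_states, weights=probs)[0]

state = "sunny"
for _ in range(10):
    state = step(state)
    print(state)
```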
Hi, can anybody please let me know what book can be a basis for this course?
@@taslas thanks
what is it ?
@@puskarwagle2392 Reinforcement Learning: An Introduction, by Richard Sutton and Andrew Barto
Any new !
The link to the course page doesn't work. Can anyone tell me where else I can get the slides?
take down notes
As a non-native speaker, I'm not used to such a variable tone; sometimes the pitch is too high for me to make out.
47:30:00
6:00
The more I hear, the more confused I get... 😮
More than 50 comments? Wow!
The intended rewards are still questionable.
7:18
This slide could have used a little bit more explanation.
NPTEL RL lectures are 💀 compared to this.
Does anyone else have the feeling that the lecturer explains the notation she uses poorly?
me also
@@prosecurity8789 Joe MAma has poor clarifications
prerequisites.
Can you provide some examples? I think it was very clear.
Yep.
Wow, the Stanford online? Is SBF a member of your club? Maybe you know where he is? Oh wait he went to MIT. I guess it's the sultry wood nymph that went to Stanford that I was hoping to hear about today in your lecture. I can hardly wait to hear what you have to say given the newfound credibility of Stanford grads in the crypto space. I'm super excited for another fancy box scheme. Or better yet maybe you can tell me some more about how I can get my blood tested by some super slick new machines that another Stanford grad came up with. WAY EXCITED!!
Envy is a hell of a drug.
For now it's deadly boring, but I will see the next ones.
How come?
Were the other ones better than the first lecture?
@@dekroplay5373 I think I just listened to about half of this lecture
@@wiktorm9858 Didn't expect an answer.
Second half was a little more interesting. imo