I've looked at many people trying to explain this, even Berkeley lectures. This guy learned on his own and explained it soooo much better, thank you so much!!
I'm studying at UC Berkeley and I agree 😭
This is the first time I have a feeling for what Bellman's equation is for. Awesome video.
I just finished the 5th lesson in this series. It is awesome! I am also reading Artificial Intelligence Engines: A Tutorial Introduction to the Mathematics of Deep Learning. The math just comes to life after seeing Colin's Python programs. Incredible series! Thanks
Where can we find the code?
Great advice to thoroughly understand the vocabulary of a subject of interest first. It's similar to gathering information from users and figuring out the requirements of what a computer program should do before coding it.
Wow, amazingly done! I searched the entire internet to finally find this video and understand the idea behind this equation.
🎈
🥰
I don't mean to be so off-topic, but does someone know a way to get back into an Instagram account??
I somehow forgot my account password. I would appreciate any tips you can give me.
A very positive start to the video, thank you. Keep going and we'll keep supporting you.
This guy took this complicated formula and made it easy
I am working on machine learning and this was new to me. Thanks for bringing it to my attention.
The title should be "Bellman Reinforcement Learning for Dummies". Great job explaining it step by step.
The way you explain is great. It is very clear to me, despite the fact that my English is not so good.
Awesome explanation.
Thanks Skowster, you're a real one!
Thank you so much for all the videos you've done on this topic!!! I really appreciate the details, the code walk-throughs, and the tips on how to study. Really helpful!
I'm here from Coursera. I'm having a bit of trouble understanding the lectures there, thanks for the video!
me also
Can you tell me which course?
@@prateek6502-y4p The Practical Reinforcement Learning course from the National Research University Higher School of Economics.
Your approach is amazing. Thank you, good sir.
Brilliantly explained 👍🏼👍🏼
You are a great teacher man! Mega Thanks...
10:45. I think you might have some error here.
The equation is V(s) = max(R(s,a) + gamma*V(s')), i.e. V(s) is the max of the reward of the CURRENT cell (which is s) plus the discounted V-value of its best NEIGHBOUR (which is s').
So the V-value of the cell left of the princess should be V = 0 + 0.9 * 1 = 0.9, not 1, right?
I thought so too at first, but then realized that R(s,a) is a function of s and a, the current state and action pair, not s', which is the next state. So the reward is given for taking the action, not for being in the next state. It's a detail anyway, but I'll leave it here for future people.
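A minimal sketch of the two conventions being debated in this thread (not the video's code; gamma = 0.9 and a terminal princess state with V = 0 are assumptions):

```python
# Minimal sketch (not the video's code) of the two reward conventions debated
# above, assuming gamma = 0.9 and that the terminal (princess) state keeps V = 0.
gamma = 0.9

# Convention 1 (as the reply describes): R(s, a) is paid for the ACTION that
# enters the princess cell, so the neighbouring cell already collects the +1.
v_neighbour_reward_on_action = 1 + gamma * 0      # -> 1.0

# Convention 2 (as the original comment assumes): the +1 lives "in" the princess
# cell, i.e. V(goal) = 1 and R(s, a) = 0, so the neighbour only gets the
# discounted goal value.
v_neighbour_reward_in_state = 0 + gamma * 1       # -> 0.9

print(v_neighbour_reward_on_action, v_neighbour_reward_in_state)  # 1.0 0.9
```

Both conventions give the same optimal policy here; they just shift where the +1 shows up in the value table.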
Great presentation- thank you
thank you very much sir 😍
Awsm explanation sir
Thanks for the "Quick Study Tips"!
Thanks that saved me from the day
So clear, thank you!
BEST EXPLANATION
Very vivid. So the optimal step (action) to take is the direction that increases value the most (like a gradient direction).
isn't V==1 in the princess square (reward == 1 and no next state) and the one to the left == 0.9 (reward == 0 + gamma * 1) ?
Yup.
The value of the terminal state is always 0, not 1.
The value is calculated for rewards resulting from an action being taken. We are looking at the value of good decisions, not good initial placement.
it should be 0.9. R(s,a) and V(s) should be evaluated at the same s.
Thank you for this concise video. The texts provided by my lecturer weren't this easy to understand.
I have some difficulties understanding something. I would be glad if anyone could help.
The Bellman equation basically states that to calculate the value of a state, we need to check [R(s,a) + gamma*V(s')] for all possible actions and select the maximum one. In the Mario example we've directly placed value = 1 in the box near the princess, because as humans we know the best possible action in this state is to take a step towards the princess. But the only way for a computer to decide the best possible action is to try all possible actions and compare the results. Here is where the problem starts: there will be infinite loops (let's say in one infinite-loop case, Mario could go back and forth forever), and the computer will never get the chance to compare results because it won't have all the results to make a comparison.
What am I missing here? Thanks in advance.
In this example, we're starting from the base case, that is the final case when we are about to reach the princess. Since we are starting from the base case, we always have some value assigned to the cell, and we can just work our way back to the initial state. You're right in the sense that if we were to start from the initial state and explore from there, we could run into an infinite loop, but in the given example, we are starting from the base case.
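To make the "work backwards from the base case" idea concrete, here is a minimal value-iteration sketch. It is not the video's code; the 3x4 grid, the goal/pit positions, the rewards and gamma = 0.9 are all assumptions for illustration.

```python
# Minimal value-iteration sketch (not the video's code). The 3x4 grid, the
# goal/pit positions, the rewards and gamma = 0.9 are assumptions.
GOAL, PIT = (0, 3), (1, 3)          # assumed princess and lava cells
gamma, rows, cols = 0.9, 3, 4

V = {(r, c): 0.0 for r in range(rows) for c in range(cols)}

def neighbours(r, c):
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)

def reward(s_next):
    # Reward for the action that moves the agent into s_next.
    return 1.0 if s_next == GOAL else (-1.0 if s_next == PIT else 0.0)

# Repeated sweeps of the Bellman backup V(s) = max_a [R(s,a) + gamma * V(s')].
# There is no infinite loop: we never simulate trajectories, we just keep
# re-applying the backup until the values stop changing.
for _ in range(50):
    for s in V:
        if s in (GOAL, PIT):
            continue                 # terminal states keep V = 0 here
        V[s] = max(reward(n) + gamma * V[n] for n in neighbours(s))

print(V[(0, 2)], V[(0, 1)], V[(2, 3)])   # 1.0, 0.9, 0.729 with these assumptions
```

With these assumed numbers, the corner cell next to the pit comes out as 0.729, i.e. roughly the 0.73 several comments ask about, because the max picks the move away from the lava.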
Simple and clear❤️
Thank you for this
This is great! And fun!!! Thank you!!
I appreciate your work
Thanks You Skowster
Great video! Could you explain why you move to the left when calculating the values and not, for example, go down where V = 0.9? How would you calculate the value V = 0.9 to the left of the cell with reward -1?
As far as I have understood, he used the value in the 1st row, 3rd column; he did not use the -1 value.
thumbs up for the quake reference :)
Thank you so much, Sir.
The link to the "free Move 37 Reinforcement Learning course" mentioned in the description appears to be dead.
3:44 It's fascinating that in 1954 digital computers were still rare. I wonder if they were using analog computers.
I just started machine learning a couple of days ago, and it looks like the Bellman equation is part of Q-learning. I would love it if someone did the manual math of how to calculate Q-learning as the agent moves through the grid.
s - the state, i.e. the square the agent is in
s' - the next square the agent will move to
V - the value of a state
R(s,a) - the reward for taking action a in state s
gamma - the discount factor (between 0 and 1) applied to the value of the next state
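Not from the video: a minimal tabular Q-learning sketch using the symbols above. The grid layout, start cell, rewards, learning rate alpha, exploration rate epsilon, and episode count are all assumptions for illustration.

```python
# Minimal tabular Q-learning sketch (not the video's code). The grid layout,
# rewards, alpha, epsilon and episode count are assumptions for illustration.
import random

rows, cols = 3, 4
gamma, alpha, eps = 0.9, 0.5, 0.2
GOAL, PIT = (0, 3), (1, 3)                       # assumed princess and lava cells
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

Q = {((r, c), a): 0.0 for r in range(rows) for c in range(cols) for a in ACTIONS}

def step(s, a):
    dr, dc = ACTIONS[a]
    s2 = (min(max(s[0] + dr, 0), rows - 1), min(max(s[1] + dc, 0), cols - 1))
    r = 1.0 if s2 == GOAL else (-1.0 if s2 == PIT else 0.0)   # R(s, a)
    return s2, r

for _ in range(2000):                            # episodes
    s = (2, 0)                                   # assumed start: bottom-left corner
    while s not in (GOAL, PIT):
        # epsilon-greedy choice of action a in state s
        if random.random() < eps:
            a = random.choice(list(ACTIONS))
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)                       # observe s' and R(s, a)
        # Q-learning update: Q(s,a) += alpha * (R(s,a) + gamma * max_a' Q(s',a') - Q(s,a))
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, a2)] for a2 in ACTIONS) - Q[(s, a)])
        s = s2

best = max(Q[((0, 2), a)] for a in ACTIONS)
print(round(best, 2))                            # approaches 1.0 next to the goal
```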
kenn just started today, is there any Discord server for beginners?
Awesome video!
This series is amazing.
Is the Move 37 course currently available? i am having a hard time finding it
It really helps me.
really enjoyed
That means each action of the agent will give him a reward, and if he finds the path or gets to the goal then he will get the maximum reward. But does an agent try to find different ways to reach the goal?
Also, I subscribed.
brilliant
thanks a lot mannnnnnn
I have a question: how can I decide which actions to take before going backwards to estimate the values?
The example looks very similar to the one in Georgia Tech's Reinforcement Learning course.
I have a question about the step before going into the fire. R(s, a) for the fire is -1. For the step before it, shouldn't the V value be -1 + 0.9 = -0.1?
Exactly. I was wondering too why V(s) in the bottom-right corner cell is 0.73??? I also think the way you calculated it is more correct, so:
V(s) = -1 + 0.9 × (1 + 0.9 × 0) = -1 + 0.9 = -0.1
Because from that state (that cell) the only way to move further is into the lava pit; even though it definitely is not the optimal option, you have no other choice, right? Can the YouTuber help clarify here?
@@MaiMaxTeamIELTSTA Because we take the maximum value over the possible actions from that state.
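A tiny sketch of that point (assumed values, not the video's code): the Bellman backup takes the max over actions, so the corner cell is never forced into the pit.

```python
# Tiny sketch (assumed values, not the video's code): the cell next to the lava
# pit takes the MAX over its actions, so it is never forced into the pit.
gamma = 0.9
V_pit = 0.0                 # terminal pit state keeps value 0; entering it pays -1
V_safe_neighbour = 0.81     # assumed value of the neighbouring safe cell

q_into_pit = -1.0 + gamma * V_pit              # -1.0
q_away     =  0.0 + gamma * V_safe_neighbour   # 0.729

print(max(q_into_pit, q_away))   # 0.729, i.e. roughly the 0.73 discussed above
```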
Is it that, because we are taking the particular action from that state which leads us to the final reward of +1, we take the reward value R(s,a) as 1?
11:42, you lost me there.
You said the state of the next square is 0, that's why it's 0 + ...,
but then you go ahead and say the value of the square is 1.
Are the state and the value of the square different?
I thought the state was the square?
Helpful thank you
I think for the reward you are talking about the current state.
So how could it be +1 when you are working backward?
How does the 3rd row, 4th column value come out as 0.73? Can anyone please explain?
Because the next possible moves lead to either -1 or 0.81.
So the best move is to go towards 0.81, multiplied by the gamma factor of 0.9.
0.81 × 0.9 = 0.729 ≈ 0.73 :)
super
Great tutorial!
Where do you get that reward function? The R(s, a)?
Think of a chess game: in a given state, when your agent places a piece somewhere and your opponent loses a piece, your agent gets a positive reward. The state is like the current chessboard, the action is where you put the piece, and the reward can be defined by you.
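A toy sketch of that idea (my own illustration, not anything from the video): a reward function that pays +1 whenever the opponent loses a piece.

```python
# Toy sketch (my own illustration, not from the video): one way to define the
# reward the commenter describes: +1 whenever the opponent loses a piece.
def reward(opponent_pieces_before: int, opponent_pieces_after: int) -> float:
    # State: the current board. Action: your move. Reward: defined by you.
    return 1.0 if opponent_pieces_after < opponent_pieces_before else 0.0

print(reward(16, 15))   # your move captured a piece -> 1.0
print(reward(16, 16))   # no capture -> 0.0
```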
The explanation helped me, but the calculations are wrong. The princess cell is V=1 and the square next to the princess should be V=0.9.
you make me god ty.
I mean for the second-to-last cell.
What happens if, instead of Mario being drunk, Mario can move exactly as instructed, but the identities of the two rooms (trap and success) are unknown, and only the probability that each one is the trap or the success room is known? I.e., Mario knows that the success room may be the trap with 50% confidence.
In my calculation s3 should be 0.9, since Q(s3, a) = 0 + 0.9 × max(0, 1, 0). From there s2 = 0.81, s1 = 0.729, ...
Great content, terrible microphone
Las Vegas and AI? What an oxymoron.
Using gaming analogies and a sexist trope is distracting. Also, unnecessary info about how to learn.
you are great at explaining things, but why use games and rescuing a princess as examples? smh, be more open-minded about your audience