Stochastic Games and Multiagent RL - Georgia Tech - Machine Learning
- Published on Oct 29, 2024
- Watch on Udacity: www.udacity.co...
Check out the full Advanced Operating Systems course for free at: www.udacity.co...
Georgia Tech online Master's program: www.udacity.co...
This video does not discuss the multiagent RL side. It only introduces an example.
This video actually misrepresents what a Nash equilibrium is. The pure Nash equilibrium strategy is going through W-N-N. This is similar to the Prisoner's Dilemma, which ends up bad for both prisoners when they follow the Nash equilibrium strategy.
How to get 2/3 (from Mark Nowicki):
50% of the time A gets through. 50% of the time it doesn't. 25% of the time B gets through and A doesn't, which ends the game with nothing for A. In the remaining 25%, neither gets through and you start over.
So what's the probability of both of you not getting through, starting over, and then A getting through? 25% * 50%. After that there is another 25% chance that neither of them gets through. This keeps going forever.
If you think of the first time you got through (the original .5) as .5 * .25^0, you can now model the rewards as
R = sum from t = 0 to inf of (.5 * .25^t)
Pull out the constant .5:
sum from t = 0 to inf of (.25^t) = 1 / (1 - .25) = 1 / .75 = 4/3
Multiply 4/3 by the constant .5 to get 2/3 expected reward.
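For anyone who wants to check the geometric series above numerically, here is a minimal Python sketch. It assumes, as in the comment, that A gets through with probability 1/2 in any round, that a restart happens with probability 1/4, and that the reward is normalized to 1. The function name `expected_reward_partial_sum` is made up for illustration.

```python
from fractions import Fraction

# Assumed probabilities, taken from the comment above.
P_A_THROUGH = Fraction(1, 2)  # A gets through in a given round
P_RESTART = Fraction(1, 4)    # neither gets through, so the round repeats


def expected_reward_partial_sum(rounds: int) -> Fraction:
    """Sum of .5 * .25^t for t = 0 .. rounds-1 (reward normalized to 1)."""
    return sum(
        (P_A_THROUGH * P_RESTART**t for t in range(rounds)),
        Fraction(0),
    )


# Closed form of the infinite series: .5 * 1 / (1 - .25)
closed_form = P_A_THROUGH / (1 - P_RESTART)

if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, float(expected_reward_partial_sum(n)))
    print("closed form:", closed_form)
```

The partial sums climb toward the closed-form value quickly, since each extra round only adds a quarter of the previous term.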
super helpful
I actually don't see it the same way.
Suppose P is the probability that A gets the reward. As they said, we have 4 cases:
both get through, only A gets through, only B gets through, or neither passes the semi-wall. In the last case the game restarts, so the probability that A gets the reward is again P. So for me it's
P = 0.25 + 0.25 + 0.25 * P
0.75 P = 0.5 -> P = 2/3, the expected reward.
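The recursion above can also be checked in Python, both by iterating it to a fixed point and by simulating the game. This is only a sketch under assumptions that are not confirmed by the video: each player passes their semi-wall independently with probability 0.5, A collects the reward whenever A gets through, and both retry when neither gets through. The names `solve_recursion` and `simulate` are made up for illustration.

```python
import random


def solve_recursion(iterations: int = 50) -> float:
    """Iterate P = 0.25 + 0.25 + 0.25 * P starting from P = 0."""
    p = 0.0
    for _ in range(iterations):
        p = 0.25 + 0.25 + 0.25 * p
    return p


def simulate(trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the probability that A gets the reward by playing the game."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        while True:
            a_through = rng.random() < 0.5  # assumed independent coin flips
            b_through = rng.random() < 0.5
            if a_through:
                wins += 1  # covers "both get through" and "only A"
                break
            if b_through:
                break      # only B got through: game ends, nothing for A
            # neither got through: repeat the round
    return wins / trials


if __name__ == "__main__":
    print("fixed point:", solve_recursion())
    print("simulation :", simulate())
```

Both numbers should land close to 2/3, which matches the series derivation and the recursion.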
The guy above is right, and the video gets it 100% wrong. The Nash equilibrium is both players going to the middle, because going to the middle has the same expected payoff as going up if the other player goes to the middle, and the payoff for going to the middle is higher when the other player chooses to go up. This is similar to the prisoner's dilemma, because both players willingly choose a "sub-optimal" solution. This video is complete trash because it's 100% wrong. It only confuses an already difficult subject. It should be taken down immediately before it corrupts more inquisitive minds.
❤