This guy was my lecturer about 10 years ago. He was very down to earth and explained the concepts in a really friendly way. Glad to see he's still doing it.
We might have crossed paths at uni of bham
@@centcode was there 2012-2015
Glad to see Nick here, he definitely provided some of the clearest and most interesting explanations throughout my degree. As well as setting us loose with a lot of Lego robots and watching chaos ensue.
@@Mounta1ngoat lol that sounds great
@@Deathhead68 it sounds great 👍
This channel makes me appreciate the human brain more. We do all that automatically with barely a moment's thought.
It also fails spectacularly from time to time.
For instance the so-called "sunk cost fallacy" might make you stay at the train station for much too long. You've already invested so much time into waiting for the train, you don't want this time to go to waste.
The fallacy is that the time spent waiting is not an investment. It's a pure loss.
which causes ALL the Problems
we create
and we get ever more creative
The key is that we don't always make the best choice. For example, if you're choosing a path to work, as in this example, you may not make the best choice, but it doesn't matter.
@@real_mikkim and with all this computation, it still manages to fall for the most basic fallacies.
I'm very much unimpressed.
OMG as a Robotics student, I'm amazed how well explained that is. Love it
Just took an RL course. The Bellman equation and Markovian assumptions are so familiar. Btw, for those who are interested, the algorithms to solve discrete MDPs (or model-based RL problems in general) are Value Iteration and Policy Iteration, which are both based on the Bellman equation.
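For anyone curious what value iteration actually looks like in code, here's a minimal sketch on a made-up commute MDP. The states, probabilities, and minute costs below are illustrative assumptions, not the video's exact numbers:

```python
# transitions[state][action] = list of (probability, next_state, cost_in_minutes)
transitions = {
    "home": {
        "bike":  [(1.0, "work", 45)],
        "car":   [(0.7, "work", 30), (0.3, "work", 70)],   # traffic lottery
        "train": [(0.9, "work", 35), (0.1, "home", 5)],    # train missed: 5 min wasted
    },
    "work": {},  # absorbing goal state
}

def value_iteration(transitions, tol=1e-9):
    """Repeat the Bellman update until values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:
                continue  # terminal state keeps value 0
            # Bellman update: minimum expected cost over available actions
            best = min(
                sum(p * (cost + values[nxt]) for p, nxt, cost in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values

print(value_iteration(transitions))
```

With these numbers the train route wins: its fixed point is 32/0.9 ≈ 35.6 minutes, beating the car's 42 and the bike's 45.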
Where the formal definitions for concepts like MDPs can get overwhelming, it really helps to have these easy-to-understand explanations.
Nice one, I met Professor Nick at Pembroke College Oxford. It was an honour.
I made these decisions for my real commute. The train was fastest, but occasionally much longer. The car was fast, but the cost of parking equalled 2 hours of work, so was effectively slowest. The latest I could leave and be sure of being on time was walking.
such an intuitive discussion! i particularly love the 'wait for train for three rounds then go back home to get the bike' part. that really clicked with me there. thank you so much for making this!
Please, bring more from this guy
There is a 3% chance that, somewhere along the route, there's a half-duplex roadblock because they're fixing the overhead wires or something. There's a 0.1% chance that a power line or tree fell across the road, forcing you to take an extremely long detour, but half of the time this happens, you could get past it on a bike.
I'd like an autonomous taxi system that would decide it's all too hard to take me to the office, and would just take me back home, or, indeed, just refuse to take me to the office.
"Sorry, I'm working from home today because the car refused to drive itself."
"My robot ate my transportation, boss, there was nothing I could do *except* put my comfy PJs back on."
@@IceMetalPunk Sounds legit, take the rest of the week off.
You can hear the passion in every word he pronounces. Very good explanation.
I've heard a lot about MDPs and policy functions in the context of reinforcement learning, but this is the best explanation I've ever heard.
This is such a fascinating breakdown of Markov decision making. I love the mathematics that underpins Markov, but the creativity and imagination applied to the example and its host of solutions are delicious brain food.
MDPs are the topic of my bachelor's thesis, and the example really helped me understand everything a lot better. I think I'll be using it throughout the thesis to understand the theory I have to write about. It's a lot easier to understand than some abstract states a, b, c and actions 1, 2, 3 :D
I literally had my final year project use a kalman filter to solve this problem. That's awesome!
Edit: spelling
Kalman
This was a fantastic simple explanation, very enlightening.
I rarely put a like on a video, but this one deserves it.
I definitely want to hear more about the algorithms to solve MDP problems.
You shouldn't be afraid to ask the teacher, "Okay, explain that one more time..." That way they get a chance at better, cleaner, more polished bits to put in the video.
I've unconsciously done something similar with my commute to work. I can take the subway or the bus. The subway almost always takes the same amount of time, but there's a longer walk, and rarely there are signaling issues that force me to take the bus anyway. During winter, the bus may get stuck in the snowy hills, and then I'm forced to take a taxi. The bus also has a connection that I sometimes barely miss, so I may need to wait either ~1 minute or ~15 minutes for the next one. But one upside is, if the connecting bus takes too long, or never comes, I'm pretty close to work already, so I could walk the rest of the way in a pinch.
The biggest problem is, I have no idea how to assign the right probabilities to each of those events. There's just not enough data (that I have access to at least). Usually, I just take the bus to work (less walking, and don't have to deal with signaling issues), and the subway home (to avoid the connecting bus). If nothing goes wrong, they are pretty similar in time.
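One common answer to the "not enough data" problem is to just log your own commutes for a few weeks, count outcomes, and apply Laplace smoothing so that rare events keep a small non-zero probability instead of being estimated as impossible. A sketch (the observation log and outcome names are invented):

```python
from collections import Counter

# Hypothetical log of what happened on past bus commutes;
# in practice you'd record this yourself over a few weeks.
observations = ["on_time", "on_time", "missed_connection", "on_time",
                "stuck_in_snow", "on_time", "on_time", "missed_connection"]

def smoothed_probs(observations, outcomes, alpha=1.0):
    """Laplace-smoothed frequency estimates: add a pseudo-count of
    `alpha` to every outcome, so nothing gets probability zero."""
    counts = Counter(observations)
    total = len(observations) + alpha * len(outcomes)
    return {o: (counts[o] + alpha) / total for o in outcomes}

probs = smoothed_probs(observations,
                       ["on_time", "missed_connection", "stuck_in_snow"])
print(probs)  # e.g. on_time = (5+1)/(8+3) = 6/11
```

The estimates are crude with so little data, but they're enough to start comparing strategies, and they improve automatically as the log grows.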
16:17 if you're allowed to remember how many cycles you waited for the train, does this mean you lose the Markov property? Or does the Markov property relate to the environment rather than your decision?
Looking it up on Wikipedia, it seems they define the policy to take only the current state rather than the current state plus reward.
Granted, you can always augment the state space to include each possible wait for the train at a specific amount of time on the clock and make it Markovian, but the example as given does violate the Markov property if the nodes described are the states.
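A tiny sketch of that augmentation: fold the number of waits into the state itself, so a stationary (memoryless) policy can express the "wait a few rounds, then go home for the bike" rule. The state names and the threshold are illustrative, not the video's exact setup:

```python
MAX_WAIT = 3  # illustrative threshold: give up after three failed rounds

def next_state(state, train_arrived):
    """Transition for the augmented state ('station', waits_so_far).
    With the wait count inside the state, the next state depends only
    on the current state, so the Markov property holds again."""
    kind, waits = state
    if kind != "station":
        return state
    if train_arrived:
        return ("on_train", 0)
    if waits + 1 >= MAX_WAIT:
        return ("home", 0)   # give up and go back for the bike
    return ("station", waits + 1)

# A stationary policy over augmented states encodes the history-dependent rule:
policy = {("station", w): "wait" for w in range(MAX_WAIT)}
print(next_state(("station", 2), train_arrived=False))  # -> ("home", 0)
```

The cost is a bigger state space (one station node per wait count), which is exactly the trade-off for keeping policies memoryless.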
the best explanation of this I've ever heard. many thanks.
Brady will you please find someone to interview about chess engines/chess programming/neural nets. That would be super interesting
This interviewer isn't Brady. Says in the description: "This video was filmed and edited by Sean Riley."
image stabilization would be nice
Once the AI works well enough, it puts the bike in the car, and if it notices heavy traffic it takes the bike out and travels the rest of the way by bike.
Next option: use the bike to get to the train station, and if the train isn't coming right away, switch back to the bike.
What sort of paper is being used for the diagrams?
The timing of this video! I am currently trying to work on a project that uses this in my AI class
Fascinating look into decision-making.
@MarkovBaj any thoughts?
So... next video gonna be POMDP?
Very interesting video. What about adding multiple criteria to the model? For instance: time and money in the commuting model. Is there software that can help you create and solve these types of multiple-criteria stochastic decision-making problems? Something like Enterprise Dynamics, a discrete event simulation software platform.
great video! Really well explained and interesting
Another perfect video. Thanks for that! But I'm still asking myself... will this continuous printer paper ever run out??? :D
So is there a way to compute the solutions? Like I assume some matrices show up. One for probabilities and one for the sum of times. Then you can multiply it and get different time distributions for every strategy?
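Yes: for a *fixed* strategy, the expected travel times satisfy a linear system t = c + Qt, where Q holds the transition probabilities among non-goal states and c the expected time accrued per step, so t = (I − Q)⁻¹c. A pure-Python sketch with made-up numbers (a 5-minute walk to the station, then a train that shows up with probability 0.9 each round):

```python
def expected_times(Q, c):
    """Solve (I - Q) t = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    # Build the augmented matrix A = [I - Q | c]
    A = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] + [c[i]]
         for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# States: 0 = home, 1 = station ("work" is the absorbing goal, so it's left out).
Q = [[0.0, 1.0],    # home -> station with certainty (walk there)
     [0.0, 0.1]]    # station -> station if the train doesn't come (p = 0.1)
c = [5.0,                   # 5-minute walk to the station
     0.1 * 5 + 0.9 * 35]    # expected minutes accrued per round at the station

print(expected_times(Q, c))
```

That gives ≈ 35.6 minutes from the station and ≈ 40.6 from home. To compare strategies you'd build one (Q, c) pair per policy and pick the smallest starting value; full distributions (not just means) need more machinery, but the same matrices are the starting point.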
The bike can also take longer than 60 minutes. Flat tires, catastrophic mechanical failure, getting hit by another vehicle, etc.
True but it's much more within your control
We are getting out of the hood with this one.
That paper takes me back!
Am I correct to assume that a first-order Markov system is similar to frequentist statistical models as a methodology?
Are the policies analogous to a reward function in a neural network?
There's no point in an edge going home from the railway station because having been at the railway station does not change the stochastic costs of the other options. Once you've decided the rail has the lowest stochastic cost, you're done. Now if we add a concept of traffic changing with time, then we have a higher-order model and the edge becomes pointful again.
I have a problem called the Facilities Layout Problem which I am trying to solve using Reinforcement Learning. The initial state is a layout that has a cost, and the goal is to change the facilities layout in order to minimize the cost. My question: should this problem be treated episodically or continuously, and what should I do in the case where there is no absorbing state? I would be extremely happy if someone could help.
Whether it's episodic or continuous depends on the beginning state of the system. If each action is an episode, it is possible to hit the optimum by chance; without a break condition, a loop might follow.
How does the algorithm work with imperfect information game like poker? Can you apply it to poker?
Can anyone pls tell me where he got his watch from?
wow probabilistic computing is kinda interesting. can u do a video on physical unclonable functions? I need an explainer like this XD
Do robots queue up?
This is great, except you got the percentages for traffic probability wrong. Light traffic is 10%, medium traffic is 20% and heavy traffic is 70% of the time.
Very nice.
Let's see Paul Allen's Markov chain.
What's the watch model he's wearing?
Fascinating and useful overview.
I've watched a few machine learning lectures, and it intrigues me that the logic, theory, mechanics, etc. are (at this 101 level) identical to the decision theory that any human should - could - would use to live their lives efficiently... But never do! Because we were never taught how.
So I bet even the scientists who program their AI for some corporate exploitative system (probably), ironically waste their life taking dumb decisions every day...
And the example given of commuting to work is the classic First World Problem... Like gamblers we all think we know how to game the system, but by playing it we have ALREADY LOST.
Did I just invent computational philosophy??
Per the reboot movie Tron - .
Excellent!!! Cheers.
This is exactly what AI assistants should allow us to do - apply mathematical analysis to real world problems, in real-time.
More!
It's like: if you're already late, just take the bus, but if you have time according to Google Maps, take the fastest route.
Otherwise take the simplest route you have time for (with the fewest changes and least walking).
It really bothers me that he's waving the pen around without the lid on
make the difference/similarity of strict algorithm and fuzzy probabilistic selection algorithm clear
in the end the bayesian decision is the same as the strict algorithm, but implementation is wildly different and cleanness/interpretation of the algorithm can be clear/fuzzy (same problem, different paths, between step partial results, end result as logged)
fuzzy probabilistic ai vs dijkstra for shortest path
all algorithms give same kinds of answers for same problem but in different logical/math ways
describe dijkstra/A* in infinite memory probabilistic state algorithm
an algorithm might decide on fly while training if it remembers previous states or not
2022. Still using tractor feed printer paper as scrap.
Things to have as a computer scientist, a marker and paper. 😮
Phenomenal
All the respect
These shortest path algorithms convinced lions that whoever designs these algorithms is a lot smarter than a lion, spent an entire career designing just 1 algorithm, & it's pointless to try to remember them all.
You might also have a soft deadline for arriving at work, so for example you're late only 1% of the time.
Extremely nice
Coo coo cachoo the probability depends on you!
I have to go to the bank and trust me I will be there in about the time of the year is starting to stir fry sauce instead of garlic on the way home now anyway I think I have a few things to do in the morning.
There's predictive text models at work. Start with "I " and keep hammering the predicted word and see what comes out. 😁.
This video presents a problem, names a solution, doesn't present the named solution, then just ends. The whole video can be summed up as "in computer science sequential decisions with probable outcomes are made by using some approach, the approach requires some conditions to be determined for a desired outcome". IT NEVER SHOWS A SOLUTION, IT JUST SAYS THERE IS ONE. WHAT'S THE POINT?
*Self driving car.* Bike swerves in front. Action? 1. Brake hard, but can you stop in time? 2. Swerve left, but what about that little kid? 3. Swerve right and hit oncoming traffic, maybe killing many more people?
Humans are very bad when faced with uncertainties like that. Machines would be no better.
I'm doing reinforcement learning now too
I watched the film "The Mist" (2007) last night, it seems like "David" could have used a little "help" with this kind of decision making in the end.
this is great
Remarkable work! This content is fantastic. I found something similar, and it was beyond words. "Game Theory and the Pursuit of Algorithmic Fairness" by Jack Frostwell
Thanks
“How to guide AI to draw 5 fingers instead of forcing it”
or use chopstick to eat noodles
or bake a cake
Lets go skynet! ..... Lets go skynet! Long live the robot overlords.
No one in this video understands how trains work. The infographic makes the train jitter along its route, and no one has ever heard of train schedules.
We should also factor cost. The risk of accident, the health benefit, the capacity to read your email in the train...
The graphic illustrates that the route goes via somewhere else... (Unrealistic route for the timings but inspiration taken from my route from Nottingham to Oxford to meet Nick) HTH -Sean
@@Computerphile Ah sorry! That makes sense. You turned a 150-minute train ride into a 30-minute train ride, and I found the ride quite bumpy. Thanks for taking the time to reply to me.
You're welcome :0)
To all the people in the comments: no, he doesn't look 'weird' or 'wrong'; he has a lazy eye or a similar condition. These conditions are common and normal. Shame on you.
question - can you try putting the pen on your nose and then staring at it. I want to test something
Always take the bike
There are no decisions, there are choices, and all are random if the parameters are obscure. Just like us: we are biological machines, we know the rules, but we choose as we please.
My Sunday ...a chameleon is teaching me about robot decisions ...I'm trippin bro xD
I really love all your videos, but I can't stand the sound of the marker pen against the paper. That kind of hiss irritates me to my core. I might be the only one in the world, but my brain is programmed that way. Could you please remove that sound or use a different ballpoint or other pen? I have to hold my earphones away when you start writing. Please consider this.
What happened to him
The paper's sound is so bad. Please use a whiteboard for the next videos.
So there is a scientific theory behind why i prefer cycling 😂
I was waiting for 17 minutes for him to actually solve the problem using the algorithm, yet he never got to the point, only babbled about the same thing over and over again. Big dislike.
REPORTED NOT COMPUTER RELATED