Markov Decision Processes - Computerphile

  • Published 24 Oct 2022
  • Deterministic route finding isn't enough for the real world - Nick Hawes of the Oxford Robotics Institute takes us through some problems featuring probabilities.
    Nick used an example from Mickael Randour: bit.ly/C_MickaelRandour
    This video was previously called "Robot Decision Making"
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments • 118

  • @Deathhead68 · 1 year ago +326

    This guy was my lecturer about 10 years ago. He was very down to earth and explained the concepts in a really friendly way. Glad to see he's still doing it.

    • @centcode · 1 year ago +2

      We might have crossed paths at uni of bham

    • @Deathhead68 · 1 year ago

      @@centcode was there 2012-2015

    • @Mounta1ngoat · 1 year ago +4

      Glad to see Nick here, he definitely provided some of the clearest and most interesting explanations throughout my degree. As well as setting us loose with a lot of Lego robots and watching chaos ensue.

    • @erazn9077 · 1 year ago

      @@Mounta1ngoat lol that sounds great

    • @symonkanulah3809 · 1 year ago

      @@Deathhead68 it sounds great 👍

  • @CalvinHikes · 1 year ago +163

    This channel makes me appreciate the human brain more. We do all that automatically with barely a moment's thought.

    • @Ceelvain · 1 year ago +29

      It also fails spectacularly from time to time.
      For instance, the so-called "sunk cost fallacy" might make you stay at the train station for much too long: you've already invested so much time into waiting for the train that you don't want that time to go to waste.
      The fallacy is that the time spent waiting is not an investment. It's a pure loss.

    • @raginald7mars408 · 1 year ago +2

      which causes ALL the Problems
      we create
      and we get ever more creative

    • @GizmoMaltese · 1 year ago

      The key is we don't always make the best choice. For example, if you're choosing a path to work as in this example, you may not make the best choice but it doesn't matter.

    • @Ceelvain · 1 year ago

      @@real_mikkim and with all this computation, it still manages to fall for the most basic fallacies.
      I'm very much unimpressed.

  • @tlxyxl8524 · 1 year ago +17

    Just took an RL course, so the Bellman equation and Markovian assumptions are very familiar. Btw, for those who are interested, the algorithms for solving discrete MDPs (or model-based RL problems in general) are Value Iteration and Policy Iteration, both of which are based on the Bellman equation.
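
[Editor's note] The Value Iteration mentioned here can be sketched in a few lines. The states, actions, probabilities, and minute costs below are made-up stand-ins for the commute example in the video, not its actual numbers:

```python
# Value iteration on a tiny made-up commute MDP (illustrative numbers only).
# Each action maps to a list of (probability, next_state, minutes) outcomes.
MDP = {
    "home": {
        "bike":  [(1.0, "work", 45)],
        "drive": [(0.8, "work", 30), (0.2, "work", 70)],  # 20% chance of heavy traffic
        "train": [(0.9, "work", 35), (0.1, "home", 10)],  # 10% chance it's cancelled
    },
    "work": {},  # absorbing goal state: no actions, zero remaining cost
}

def value_iteration(mdp, sweeps=100):
    """Repeatedly apply the Bellman backup until V[s] = expected minutes to goal."""
    V = {s: 0.0 for s in mdp}
    for _ in range(sweeps):
        for s, actions in mdp.items():
            if not actions:
                continue  # goal state stays at 0
            V[s] = min(
                sum(p * (cost + V[s2]) for p, s2, cost in outcomes)
                for outcomes in actions.values()
            )
    # Extract the greedy policy: the action minimizing expected total time.
    policy = {
        s: min(actions, key=lambda a: sum(p * (c + V[s2]) for p, s2, c in actions[a]))
        for s, actions in mdp.items() if actions
    }
    return V, policy

V, policy = value_iteration(MDP)
```

With these invented numbers the train wins (expected ~36.1 minutes, beating the 38-minute expected drive), even though it sometimes bounces you back home.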

  • @mateuszdziezok8631 · 1 year ago +43

    OMG, as a Robotics student I'm amazed at how well explained this is. Love it.

  • @SachinVerma-lx5bx · 1 year ago +8

    Where the formal definitions for concepts like MDPs can get overwhelming, it really helps to have these easy-to-understand explanations.

  • @gasdive · 1 year ago +23

    I made these decisions for my real commute. The train was fastest, but occasionally much longer. The car was fast, but the cost of parking equalled 2 hours of work, so was effectively slowest. The latest I could leave and be sure of being on time was walking.

  • @engineeringmadeasy · 1 year ago +4

    Nice one, I met Professor Nick at Pembroke College Oxford. It was an honour.

  • @blacklabelmansociety · 1 year ago +13

    Please, bring more from this guy

  • @tobiaswegener1234 · 1 year ago +7

    This was a fantastic simple explanation, very enlightening.

  • @Ceelvain · 1 year ago +2

    I've heard a lot about MDPs and policy functions in the context of reinforcement learning, but this is the best explanation I've ever heard.

  • @pierreabbat6157 · 1 year ago +14

    There is a 3% chance that, somewhere along the route, there's a half-duplex roadblock because they're fixing the overhead wires or something. There's a 0.1% chance that a power line or tree fell across the road, forcing you to take an extremely long detour, but half of the time this happens, you could get past it on a bike.

  • @yvesamevoin8720 · 1 year ago +3

    You can read passion in every word he pronounces. Very good explanation.

  • @Ceelvain · 1 year ago +6

    I rarely put a like on a video, but this one deserves it.
    I definitely want to hear more about the algorithms to solve MDP problems.

  • @BobWaist · 8 months ago

    Great video! Really well explained and interesting.

  • @cerealport2726 · 1 year ago +11

    I'd like an autonomous taxi system that would decide it's all too hard to take me to the office and would just take me back home, or, indeed, just refuse to take me to the office.
    "Sorry, I'm working from home today because the car refused to drive itself."

    • @IceMetalPunk · 1 year ago +4

      "My robot ate my transportation, boss, there was nothing I could do *except* put my comfy PJs back on."

    • @cerealport2726 · 1 year ago +3

      @@IceMetalPunk Sounds legit, take the rest of the week off.

  • @phil9447 · 1 year ago +6

    MDPs are the topic of my bachelor's thesis, and this example really helped me understand everything a lot better. I think I'll keep using it throughout the thesis to make sense of the theory I have to write about. It's a lot easier to follow than abstract states a, b, c and actions 1, 2, 3 :D

  • @tristanlouthrobins · months ago

    This is such a fascinating breakdown of Markov decision making. I love the mathematics that underpins Markov, but the creativity and imagination applied to the example and its host of solutions are delicious brain food.

  • @elwood.downey · 1 year ago

    The best explanation of this I've ever heard. Many thanks.

  • @asfandiyar5829 · 1 year ago +14

    I literally had my final year project use a Kalman filter to solve this problem. That's awesome!
    Edit: spelling

  • @lucrainville4372 · 1 year ago

    Fascinating look into decision-making.

  • @Imevul · 1 year ago

    I've unconsciously done something similar with my commute to work. I can take the subway or the bus. The subway almost always takes the same amount of time, but there's a longer walk, and occasionally signalling issues force me to take the bus anyway. During winter, the bus can get stuck on the snowy hills, and then I'm forced to take a taxi. The bus also has a connection I sometimes barely miss, so I may need to wait either ~1 minute or ~15 minutes for the next one. One upside: if the connecting bus takes too long, or never comes, I'm already pretty close to work and can walk the rest of the way in a pinch.
    The biggest problem is that I have no idea how to assign the right probabilities to each of those events. There's just not enough data (that I have access to, at least). Usually I take the bus to work (less walking, no signalling issues to deal with) and the subway home (to avoid the connecting bus). If nothing goes wrong, they're pretty similar in time.

  • @spyboyb321 · 1 year ago +1

    The timing of this video! I am currently trying to work on a project that uses this in my AI class

  • @khaledsrrr · 4 months ago

    Phenomenal
    All the respect

  • @TGUGCL · 1 year ago +1

    Very interesting video. What about adding multiple criteria to the model, for instance both time and money in the commuting example? Is there software that can help you create and solve these types of multi-criteria stochastic decision-making problems? Something like Enterprise Dynamics, a discrete-event simulation platform.

  • @GBlunted · 1 year ago +2

    You shouldn't be afraid to ask the teacher, "Okay, explain that one more time..." so they get a chance to record better, cleaner, more polished bits to put in the video.

  • @WalkerRacing · 1 year ago +4

    Brady will you please find someone to interview about chess engines/chess programming/neural nets. That would be super interesting

    • @ideallyyours · 1 year ago +2

      This interviewer isn't Brady. Says in the description: "This video was filmed and edited by Sean Riley."

  • @SystemSh0cker · 1 year ago +1

    Another perfect video, thanks for that! But I'm still asking myself... will that continuous printer paper ever run out??? :D

  • @Techmagus76 · 1 year ago +3

    Once the AI works well enough, it puts the bike in the car, and if it notices the traffic is heavy it takes the bike out and travels the rest of the way by bike.
    Next option: ride the bike to the train station, and if the train isn't coming soon, switch back to the bike.

  • @Leon-pu3vm · 1 year ago

    Extremely nice

  • @vsandu · 1 year ago

    Excellent!!! Cheers.

  • @LukaszWiklendt · 1 year ago +3

    16:17 if you're allowed to remember how many cycles you waited for the train, does this mean you lose the Markov property? Or does the Markov property relate to the environment rather than your decision?

    • @mgostIH · 1 year ago

      Looking it up on Wikipedia, it seems they define the policy to take only the current state rather than the current state plus reward.
      Granted, you can always augment the state space to include each possible wait for the train at some specific amount of time on the clock and make it Markovian, but the example they gave does violate the Markov property if the nodes described are the states.
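
[Editor's note] The augmentation described in this reply can be made concrete. Everything below (state names, the wait cap, the give-up threshold) is invented for illustration:

```python
# Sketch of the state-augmentation trick: fold "how many cycles have I
# waited?" into the state itself, so a memoryless (stationary) policy can
# still depend on wait time without violating the Markov property.
MAX_WAIT = 3  # made-up cap on how many wait-cycles we distinguish

def transition(state, action):
    """Toy deterministic skeleton of the augmented chain (probabilities omitted)."""
    if state == "home" and action == "go_to_station":
        return ("station", 0)  # arrive at the station having waited 0 cycles
    if isinstance(state, tuple) and state[0] == "station" and action == "wait":
        return ("station", min(state[1] + 1, MAX_WAIT))  # one more cycle waited
    return state

def policy(state):
    """A policy over augmented states: wait twice, then give up and go home.
    It reads only the current state, so it is still Markovian."""
    if state == "home":
        return "go_to_station"
    if isinstance(state, tuple) and state[0] == "station":
        return "wait" if state[1] < 2 else "go_home"
    return None
```

The "remember how long you waited" behaviour now lives entirely inside the state tuple, which is exactly how the non-Markovian example becomes Markovian again.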

  • @rd42537 · 1 year ago

    That paper takes me back!

  • @Veptis · 1 year ago

    So is there a way to compute the solutions? Like I assume some matrices show up. One for probabilities and one for the sum of times. Then you can multiply it and get different time distributions for every strategy?
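
[Editor's note] For a fixed policy, matrices do show up much as this comment guesses: collect the transition probabilities among non-goal states into a matrix P and the expected one-step minutes into a vector c; the expected total times v then satisfy v = c + Pv, i.e. v = (I − P)⁻¹c. A sketch with invented numbers, assuming NumPy is available:

```python
import numpy as np

# Transient states: 0 = home, 1 = station (the goal "work" is absorbed out).
# P[i][j]: probability the fixed policy moves you from i to j WITHOUT reaching work.
P = np.array([
    [0.0, 1.0],  # home -> station (walk there)
    [0.0, 0.2],  # station: 20% chance the train doesn't come and you wait again
])
# c[i]: expected minutes spent on the single step taken from state i.
c = np.array([10.0, 0.8 * 35.0 + 0.2 * 5.0])  # 10 min walk; 35 min ride or 5 min wait

# v = c + P v  =>  (I - P) v = c
v = np.linalg.solve(np.eye(2) - P, c)
```

This is policy *evaluation*; to get the full distribution of times (not just the mean), you would track probabilities per time step instead of expectations.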

  • @patrickbateman455 · 1 year ago +6

    Very nice.

    • @bigprovola · 1 year ago +6

      Let's see Paul Allen's Markov chain.

  • @firsttyrell6484 · 1 year ago +4

    image stabilization would be nice

  • @timng9104 · 1 year ago +1

    wow probabilistic computing is kinda interesting. can u do a video on physical unclonable functions? I need an explainer like this XD

  • @samt2226 · 1 year ago +2

    What sort of paper is being used for the diagrams?

  • @chipsafan1 · 1 year ago

    Am I correct to assume that a first-order Markov system is similar to frequentist statistical models as a methodology?

  • @jasontrunk3353 · 1 year ago

    this is great

  • @opusdei1151 · 1 year ago

    How does the algorithm work with imperfect information game like poker? Can you apply it to poker?

  • @alphgeek · 1 year ago

    Are the policies analogous to a reward function in a neural network?

  • @SozImaScrub · 1 year ago +1

    @MarkovBaj any thoughts?

  • @DanielkaElliott · 1 year ago

    It's like: if you're already late, just take the bus, but if you have time according to Google Maps, take the fastest route.
    Otherwise take the simplest route you have time for (with the fewest changes and least walking).

  • @brettbreet · 1 year ago

    What's the watch model he's wearing?

  • @IanKjos · 1 year ago +1

    There's no point in an edge going home from the railway station because having been at the railway station does not change the stochastic costs of the other options. Once you've decided the rail has the lowest stochastic cost, you're done. Now if we add a concept of traffic changing with time, then we have a higher-order model and the edge becomes pointful again.

  • @avinier325 · 6 months ago

    Can anyone please tell me where he got his watch?

  • @deep.space.12 · 1 year ago +1

    So... next video gonna be POMDP?

  • @jonr6680 · 1 year ago +3

    Fascinating and useful overview.
    I've watched a few machine learning lectures, and it intrigues me that the logic, theory, mechanics etc. are (at this 101 level) identical to the decision theory any human should, could, would use to live their lives efficiently... but never does! Because we were never taught how.
    So I bet even the scientists who program their AI for some corporate exploitative system (probably), ironically waste their life taking dumb decisions every day...
    And the example given of commuting to work is the classic First World Problem... Like gamblers we all think we know how to game the system, but by playing it we have ALREADY LOST.
    Did I just invent computational philosophy??
    Per the reboot movie Tron - .

  • @danielg9275 · 1 year ago

    Coo coo cachoo the probability depends on you!

  • @ohsweetmystery · 1 year ago +1

    The bike can also take longer than 60 minutes. Flat tires, catastrophic mechanical failure, getting hit by another vehicle, etc.

    • @scottcox503 · 1 year ago

      True but it's much more within your control

  • @geniusdavid · 1 year ago

    Things to have as a computer scientist, a marker and paper. 😮

  • @chiboubamine5970 · months ago

    I have a problem called the Facility Layout Problem which I'm trying to solve using Reinforcement Learning. The initial state is a layout that has a cost, and the goal is to change the layout in order to minimize the cost. My question: should this problem be treated episodically or continuously, and what do I do when there is no absorbing state? I would be extremely happy if someone could help.

    • @ChristophTungersleben · months ago

      Whether it's episodic or continuous depends on the starting state of the system; each action is an episode, but it's possible to hit the optimum by chance. Without a break condition, a loop might follow.

  • @bongsurfer · 1 year ago

    Thanks

  • @ENI232 · 2 months ago

    More!

  • @terencewinters2154 · 3 months ago

    Do robots queue up?

  • @marklonergan3898 · 1 year ago +1

    I have to go to the bank and trust me I will be there in about the time of the year is starting to stir fry sauce instead of garlic on the way home now anyway I think I have a few things to do in the morning.
    That's a predictive text model at work. Start with "I " and keep hammering the predicted word and see what comes out. 😁
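
[Editor's note] The "keep hammering the predicted word" game really is a first-order Markov model over words, on topic for this video. A toy sketch (the training text below is made up):

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """First-order Markov model of text: count which word follows which."""
    words = text.split()
    follows = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        follows[a][b] += 1
    return follows

def predict_chain(model, start, length):
    """Greedily take the most frequent next word, like hammering the suggestion."""
    out = [start]
    for _ in range(length):
        nxt = model.get(out[-1])
        if not nxt:
            break  # dead end: this word never appeared with a successor
        out.append(nxt.most_common(1)[0][0])
    return " ".join(out)

model = train_bigram("i have to go to the bank and i have to go to work")
```

Real phone keyboards use richer models, but the Markov flavour (next word depends only on the current context) is the same.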

  • @Bill0102 · 3 months ago

    Remarkable work! This content is fantastic. I found something similar, and it was beyond words. "Game Theory and the Pursuit of Algorithmic Fairness" by Jack Frostwell

  • @odiseezall · 11 months ago

    This is exactly what AI assistants should allow us to do - apply mathematical analysis to real world problems, in real-time.

  • @TheThunderSpirit · 1 year ago

    I'm doing reinforcement learning now too.

  • @RayCase · 1 year ago

    2022. Still using tractor feed printer paper as scrap.

  • @KibbleWhite · 1 year ago +1

    This is great, except you got the percentages for traffic probability wrong. Light traffic is 10%, medium traffic is 20% and heavy traffic is 70% of the time.

  • @2k10clarky · 6 months ago

    You might also have a soft deadline for arriving at work, so for example you're fine as long as you're late only 1% of the time.

  • @6DAMMK9 · 1 year ago

    “How to guide AI to draw 5 fingers instead of forcing it”
    or use chopstick to eat noodles
    or bake a cake

  • @jasonmcfarlane7243 · 1 year ago +2

    To all the people in the comments: no, he doesn't look 'weird' or 'wrong', he has a lazy eye or a similar condition. These conditions are common and normal. Shame on you.

  • @iwir3d · 1 year ago

    Lets go skynet! ..... Lets go skynet! Long live the robot overlords.

  • @Jkauppa · 1 year ago +1

    make the difference/similarity of strict algorithm and fuzzy probabilistic selection algorithm clear

    • @Jkauppa · 1 year ago +1

      in the end the bayesian decision is the same as the strict algorithm, but implementation is wildly different and cleanness/interpretation of the algorithm can be clear/fuzzy (same problem, different paths, between step partial results, end result as logged)

    • @Jkauppa · 1 year ago +1

      fuzzy probabilistic ai vs dijkstra for shortest path

    • @Jkauppa · 1 year ago

      all algorithms give same kinds of answers for same problem but in different logical/math ways

    • @Jkauppa · 1 year ago

      describe dijkstra/A* in infinite memory probabilistic state algorithm

    • @Jkauppa · 1 year ago

      an algorithm might decide on fly while training if it remembers previous states or not

  • @Lion_McLionhead · 1 year ago

    These shortest path algorithms convinced lions that whoever designs these algorithms is a lot smarter than a lion, spent an entire career designing just 1 algorithm, & it's pointless to try to remember them all.

  • @OwenPrescott · 1 year ago

    It really bothers me that he's waving the pen around without the lid on

  • @Eagle3302PL · 1 year ago

    This video presents a problem, names a solution, doesn't present the named solution, then just ends. The whole video can be summed up as "in computer science sequential decisions with probable outcomes are made by using some approach, the approach requires some conditions to be determined for a desired outcome". IT NEVER SHOWS A SOLUTION, IT JUST SAYS THERE IS ONE. WHAT'S THE POINT?

  • @gollolocura · 1 year ago

    Always take the bike

  • @michaelmueller9635 · 1 year ago

    My Sunday ...a chameleon is teaching me about robot decisions ...I'm trippin bro xD

  • @BritishBeachcomber · 1 year ago

    *Self driving car.* Bike swerves in front. Action? 1. Brake hard, but can you stop in time? 2. Swerve left, but what about that little kid? 3. Swerve right and hit oncoming traffic, maybe killing many more people?
    Humans are very bad when faced with uncertainties like that. Machines would be no better.

  • @alexandrumacedon291 · 1 year ago

    There are no decisions, only choices, and all are random if the parameters are obscure. Just like us: we are biological machines, we know the rules, but we choose as we please.

  • @deanmarktaylor · 1 year ago

    I watched the film "The Mist" (2007) last night, it seems like "David" could have used a little "help" with this kind of decision making in the end.

  • @hurktang · 1 year ago

    No one understands how trains work in this video. The infographic makes the train jitter along its route, and no one has ever heard of train schedules.
    We should also factor in cost: the risk of accident, the health benefit, the ability to read your email on the train...

    • @Computerphile · 1 year ago

      The graphic illustrates that the route goes via somewhere else... (Unrealistic route for the timings but inspiration taken from my route from Nottingham to Oxford to meet Nick) HTH -Sean

    • @hurktang · 1 year ago

      @@Computerphile Ah, sorry! That makes sense. You turned a 150-minute train ride into a 30-minute train ride and I found the ride quite bumpy. Thanks for taking the time to reply to me.

    • @Computerphile · 1 year ago

      You're welcome :0)

  • @veeek8 · 1 year ago

    So there is a scientific theory behind why i prefer cycling 😂

  • @buraktekgul2079 · 10 months ago

    The sound of the paper is so bad. Please use a whiteboard for the next videos.

  • @liftingisfun2350 · 1 year ago

    What happened to him

  • @ShadowGameAlchemy · 1 year ago +2

    I really love all your videos, but I can't stand the sound of the marker pen against the paper. That kind of hissing sound irritates me to my core. I might be the only one in the world, but my brain is programmed that way. Can you please remove that sound or use a different ballpoint or other pen? I have to hold my earphones far away when you start writing. Please consider this.

  • @D1ndo · 1 year ago

    I waited 17 minutes for him to actually solve the problem using the algorithm, yet he never got to the point, only talked about the same thing over and over again. Big dislike.

  • @johnsenchak1428 · 1 year ago

    REPORTED NOT COMPUTER RELATED