Markov Decision Processes (MDPs) - Structuring a Reinforcement Learning Problem

  • Published on May 15, 2024
  • 💡Enroll to gain access to the full course:
    deeplizard.com/course/rlcpailzrd
    Welcome back to this series on reinforcement learning! In this video, we'll discuss Markov decision processes, or MDPs. Markov decision processes give us a way to formalize sequential decision making. This formalization is the basis for structuring problems that are solved with reinforcement learning.
    We will detail the components that make up an MDP, including: the environment, the agent, the states of the environment, the actions the agent can take in the environment, and the rewards that may be given to the agent for its actions.
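The components listed above can be sketched as a small interaction loop. This is a minimal illustration under stated assumptions, not code from the video: the two-state environment, the state and action names, the reward values, and the helper names `step`, `run_episode`, and `random_policy` are all hypothetical.

```python
import random

# Hypothetical two-state environment, invented purely for illustration.
STATES = ["hungry", "full"]
ACTIONS = ["eat", "wait"]


def step(state, action, rng):
    """Environment dynamics: map a state-action pair to (next_state, reward)."""
    if action == "eat":
        # Eating is rewarded when hungry and penalized when already full.
        return "full", (1.0 if state == "hungry" else -1.0)
    # Waiting gives no reward; the agent becomes hungry with probability 0.5.
    next_state = "hungry" if rng.random() < 0.5 else state
    return next_state, 0.0


def random_policy(state, rng):
    """Agent: pick an action uniformly at random, ignoring the state."""
    return rng.choice(ACTIONS)


def run_episode(policy, num_steps, seed=0):
    """Run the agent-environment loop and return (trajectory, total_reward)."""
    rng = random.Random(seed)
    state = "hungry"
    total_reward = 0.0
    trajectory = []
    for _ in range(num_steps):
        action = policy(state, rng)                     # agent selects A_t from S_t
        next_state, reward = step(state, action, rng)   # environment returns S_{t+1}, R_{t+1}
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state
    return trajectory, total_reward


trajectory, total_reward = run_episode(random_policy, num_steps=5)
for state, action, reward, next_state in trajectory:
    print(f"{state:>6} --{action}--> {next_state:<6} reward={reward}")
print("cumulative reward:", total_reward)
```

Each tuple in `trajectory` holds one state, the action taken in it, and the reward and next state the environment returned, which is the cycle the video describes.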
    Sources:
    Reinforcement Learning: An Introduction, Second Edition by Richard S. Sutton and Andrew G. Barto
    incompleteideas.net/book/RLboo...
    Playing Atari with Deep Reinforcement Learning by DeepMind Technologies
    www.cs.toronto.edu/~vmnih/doc...
    🕒🦎 VIDEO SECTIONS 🦎🕒
    00:00 Welcome to DEEPLIZARD - Go to deeplizard.com for learning resources
    00:30 Help deeplizard add video timestamps - See example in the description
    06:04 Collective Intelligence and the DEEPLIZARD HIVEMIND
    💥🦎 DEEPLIZARD COMMUNITY RESOURCES 🦎💥
    👋 Hey, we're Chris and Mandy, the creators of deeplizard!
    👉 Check out the website for more learning material:
    🔗 deeplizard.com
    💻 ENROLL TO GET DOWNLOAD ACCESS TO CODE FILES
    🔗 deeplizard.com/resources
    🧠 Support collective intelligence, join the deeplizard hivemind:
    🔗 deeplizard.com/hivemind
    🧠 Use code DEEPLIZARD at checkout to receive 15% off your first Neurohacker order
    👉 Use your receipt from Neurohacker to get a discount on deeplizard courses
    🔗 neurohacker.com/shop?rfsn=648...
    👀 CHECK OUT OUR VLOG:
    🔗 / deeplizardvlog
    ❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind:
    Tammy
    Mano Prime
    Ling Li
    🎓 Deep Learning with deeplizard:
    Deep Learning Dictionary - deeplizard.com/course/ddcpailzrd
    Deep Learning Fundamentals - deeplizard.com/course/dlcpailzrd
    Learn TensorFlow - deeplizard.com/course/tfcpailzrd
    Learn PyTorch - deeplizard.com/course/ptcpailzrd
    Natural Language Processing - deeplizard.com/course/txtcpai...
    Reinforcement Learning - deeplizard.com/course/rlcpailzrd
    Generative Adversarial Networks - deeplizard.com/course/gacpailzrd
    🎓 Other Courses:
    DL Fundamentals Classic - deeplizard.com/learn/video/gZ...
    Deep Learning Deployment - deeplizard.com/learn/video/SI...
    Data Science - deeplizard.com/learn/video/d1...
    Trading - deeplizard.com/learn/video/Zp...
    🛒 Check out products deeplizard recommends on Amazon:
    🔗 amazon.com/shop/deeplizard
    🎵 deeplizard uses music by Kevin MacLeod
    🔗 / @incompetech_kmac
    ❤️ Please use the knowledge gained from deeplizard content for good, not evil.

Comments • 97

  • @deeplizard
    @deeplizard  5 years ago +20

    Check out the corresponding blog and other resources for this video at:
    deeplizard.com/learn/video/my207WNoeyA

  • @beltusnkwawir2908
    @beltusnkwawir2908 2 years ago +18

    Can we take a second and just appreciate the work put in producing such high-quality videos in bites that are easy to understand?

  • @mike13891
    @mike13891 2 years ago +5

    I’m so glad you produced this series of videos. I was intimidated by all the math and algorithm variations covered in the first four lectures of my graduate course. After watching these videos and then revisiting my grad lectures, I now actually understand what my professor was trying to teach. Thank you!

  • @amirhosseinesteghamat7621
    @amirhosseinesteghamat7621 3 years ago +1

    I saw different channels, but no one explained this topic better than you. Thanks a lot.

  • @aparvkishnov4595
    @aparvkishnov4595 3 years ago +4

    Thanks deeplizard for doing the hard work on illustrations to explain it to the feeble-minded. Its like training a donkey, how to solve calculus.

  • @thusharadunumalage709
    @thusharadunumalage709 4 years ago +1

    Great tutorial, understood the concept clearly for the first time, after going through many. Thank you very much.

  • @jscf92
    @jscf92 5 years ago +4

    This series is awesome. Make learning a lot easier. Thank you so much.

  • @sahand5277
    @sahand5277 5 years ago +4

    Keep up the good work, thank you for the time you are putting into making this series :)

  • @alokk7347
    @alokk7347 3 years ago +3

    I was wandering here and there looks like I have landed a perfect place to learn Deep Learning.... Thanks .. I will continue.

  • @haneulkim4902
    @haneulkim4902 3 years ago +1

    Seriously... Amazing tutorial! I really like how you offer a text version as well. Thank you :)

  • @danielzoulla3898
    @danielzoulla3898 3 years ago +1

    amazing explanation of what is RL. I will watch the whole series from now

  • @ilovemusic465
    @ilovemusic465 5 years ago +3

    Very intuitive and easy explanation. Thank you! 🤗😀

  • @patrick.t1978
    @patrick.t1978 4 years ago

    Thanks a lot, your explanation's very clear and detailed.

  • @SandwichMitGurke
    @SandwichMitGurke 5 years ago +34

    this is by far the best tutorial I've seen about this topic. I'm about to watch the whole series :D

    • @deeplizard
      @deeplizard  5 years ago +3

      Whoop! Thank you :)
      More videos will continue to be added to this series as well!

    • @cuteruby7392
      @cuteruby7392 4 years ago

      subscribed!

  • @Galinator9000
    @Galinator9000 2 years ago +1

    Great video with intuitive explanations 👌

  • @MrJoeDone
    @MrJoeDone a year ago

    There really should be more videos in this style. I hope there will be a lot more videos on this channel that are useful to me.

  • @amadlover
    @amadlover 5 years ago +2

    More power to you @Deeplizard

  • @muomgu
    @muomgu 4 years ago +3

    You are awesome.
    This series would help me for my project.
    Thank you so much.
    Best regards...

  • @SharmaScribe
    @SharmaScribe 5 years ago +6

    Thanks for this content good going.

  • @sahanakaweerarathna9398
    @sahanakaweerarathna9398 5 years ago +1

    Best youtube channel to learn ML

  • @asdfasdfuhf
    @asdfasdfuhf 3 years ago +1

    Second video completed, the video was clear as day

  • @alexusnag
    @alexusnag 4 years ago +1

    Really friendly beginning.

  • @christopherherrera5015
    @christopherherrera5015 3 years ago +1

    Thank you so much it is very clear the explanation of MDPs.

  • @rapisode1
    @rapisode1 3 years ago +1

    You guys rock! Thanks so much!

  • @jeffreyredondo
    @jeffreyredondo 2 years ago +1

    well explained and easy to listen.

  • @harshadevapriyankarabandar5456
    @harshadevapriyankarabandar5456 5 years ago +2

    very very very very help full..thnks for making these videos..pls keep it going

  • @avishekhbt
    @avishekhbt 5 years ago +2

    Awesome!! Thanks! :)

  • @ns3lover779
    @ns3lover779 5 years ago +2

    awsome thank you .

  • @nossonweissman
    @nossonweissman 2 years ago +1

    This video can be denoted by n as n approaches perfection.

  • @mateusbalotin7247
    @mateusbalotin7247 2 years ago

    Thank you!

  • @dallasdominguez2224
    @dallasdominguez2224 a year ago +1

    Great video

  • @thatipelli1
    @thatipelli1 4 years ago

    Excellent explanation. It will be great if you could make a video series on all Math concepts behind Machine learning.

    • @deeplizard
      @deeplizard  4 years ago

      Thanks, Anirudh. If you haven't checked out our Deep Learning Fundamentals course, I'd recommend it, as it has some of the major math concepts fully detailed there.

  • @deepakkumarmeena1890
    @deepakkumarmeena1890 5 years ago +5

    Appreciate the cute example

  • @theliterunner
    @theliterunner a month ago +1

    - **Introduction to Markov Decision Processes (MDPs)**:
    - 0:00 - 0:17
    - **Components of MDPs**:
    - 0:23 - 1:43
    - **Mathematical Representation of MDPs**:
    - 1:47 - 3:59
    - **Probability Distributions and Transition Probabilities**:
    - 4:02 - 4:56
    - **Conclusion and Next Steps**:
    - 5:01 - 5:47

  • @nesrienali7120
    @nesrienali7120 a year ago

    This is the best lecture in RL, Thank you..
    Can I get the presentation, please?

  • @atmadeeparya2454
    @atmadeeparya2454 3 years ago +2

    Hi, This is extremely intuitive and super easy to understand. I was wondering if you could tell me what resources you used to learn this material? How do you learn material like this (your best practices), and how much time did it take you to learn the material (for making deeplizard content)? Thanks a lot for making this content; waiting for your reply.

    • @deeplizard
      @deeplizard  3 years ago +4

      As formal resources, I used the book “Reinforcement Learning: An Introduction” Second edition by Richard Sutton and Andrew Barto, along with this DeepMind paper:
      www.cs.toronto.edu/~vmnih/docs/dqn.pdf
      I also used various informal resources, like reading many blog articles, forums, etc.

  • @benvelloor
    @benvelloor 4 years ago +1

    Thank youu.

  • @3maim
    @3maim 5 years ago +2

    Will you cover Q-learning in this series? I really like your tutorials, very well explained!

    • @deeplizard
      @deeplizard  5 years ago

      Hey Marius - Yes, Q-learning will be covered! Check out the syllabus video to see the full details for everything we'll be covering: th-cam.com/video/nyjbcRQ-uQ8/w-d-xo.html

    • @3maim
      @3maim 5 years ago

      Super, thanks!

  • @faqeerhasnain
    @faqeerhasnain 9 months ago

    The agent is not part of the MDP itself but rather interacts with it. The agent's role is to select actions based on the current state and the policy it's following, and it receives feedback in the form of rewards and new state observations from the environment, which is modeled as an MDP.

  • @thinhdao7023
    @thinhdao7023 3 years ago

    I am reading a paper on applying Q-learning to a repeated Cournot oligopoly game in economics, where firms are agents who choose their level of production to gain profit. I understand that in that environment the actions are the possible levels of output that a firm can choose to produce. However, it is unclear to me what the states are in this situation. Could you please provide a further explanation of this case?

  • @elshroomness
    @elshroomness a year ago +1

    OMG its clicking. ITs actually clicking in my head!!!

  • @rooneymara8061
    @rooneymara8061 4 years ago +3

    {
    "question": "What does MDP stand for?",
    "choices": [
    "Markov Delicate Programs",
    "Modern Dealing Processes",
    "Markov Decision Processes",
    "Modern Derivative Parallels"
    ],
    "answer": "Markov Delicate Programs",
    "creator": "RooneyMara",
    "creationDate": "2019-10-20T06:28:56.399Z"
    }

    • @deeplizard
      @deeplizard  4 years ago +1

      Thank you, Rooney! First quiz question for this video :D
      I believe you mistakenly chose the wrong answer, so I corrected it and just pushed it to the site. Take a look :)
      deeplizard.com/learn/video/my207WNoeyA

  • @santoshkumarganji1801
    @santoshkumarganji1801 3 years ago

    Could you please provide any notes/PPT related to MDPs?

  • @qusayhamad7243
    @qusayhamad7243 2 years ago +1

    thanks

  • @animystic5970
    @animystic5970 3 years ago

    Hi! Loved the video and I think I have a solid understanding of the MDP. But I'm having trouble making sense of the equation. Why is the LHS a probability and the RHS a set? And what does Pr stand for?

    • @deeplizard
      @deeplizard  3 years ago

      Thanks! Pr stands for "probability", so the RHS is a probability as well.

    • @animystic5970
      @animystic5970 3 years ago

      @@deeplizard Oh now I see . It's an expansion of the same thing! Thanks for clarifying!
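For readers with the same question, the equation in this exchange is presumably the MDP dynamics function as it appears in Sutton and Barto. Written out, with the assumption that $\mathcal{S}$ and $\mathcal{R}$ denote the sets of states and rewards, both sides are probabilities and the braces on the right enclose an event rather than a set of values:

```latex
p(s', r \mid s, a) = \Pr\{\, S_t = s',\ R_t = r \mid S_{t-1} = s,\ A_{t-1} = a \,\}

\sum_{s' \in \mathcal{S}} \sum_{r \in \mathcal{R}} p(s', r \mid s, a) = 1
\quad \text{for every state } s \text{ and action } a
```

The second line states that, for a fixed state-action pair, the probabilities over all possible next states and rewards sum to one.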

  • @alevilghost
    @alevilghost 2 years ago

    Thank you for the Spanish subtitles. 🤗

  • @chyldstudios
    @chyldstudios 5 years ago

    Will you be using OpenAI Gym to demonstrate reinforcement learning concepts?

    • @deeplizard
      @deeplizard  5 years ago +2

      Hey Chyld - Yes, we'll be utilizing OpenAI Gym once we get into coding! Check out the syllabus video to see the full details for everything we'll be covering: th-cam.com/video/nyjbcRQ-uQ8/w-d-xo.html

  • @ushnishsarkar7000
    @ushnishsarkar7000 4 years ago +1

    {
    "question": "The state and reward at time t depend on:",
    "choices": [
    "State Action pair for time (t-1)",
    "Cumulative reward at time t ",
    "Agent Dynamics",
    "State Action pair for all time instances before t"
    ],
    "answer": "State Action pair for time (t-1)",
    "creator": "Ushnish Sarkar",
    "creationDate": "2020-06-01T16:24:16.894Z"
    }

    • @deeplizard
      @deeplizard  4 years ago

      Thanks, ushnish! Just added your question to deeplizard.com/learn/video/my207WNoeyA :)

  • @louerleseigneur4532
    @louerleseigneur4532 4 years ago

    Thank you

  • @actionchaplin149
    @actionchaplin149 3 years ago +1

    Hey thanks for awesome videos. This is maybe a stupid question, but what's the difference between s and s' ?

    • @deeplizard
      @deeplizard  3 years ago +1

      s' is the symbol we use in this episode to denote the next state that occurs after state s.

  • @adamhendry945
    @adamhendry945 3 years ago +36

    Please give credit to "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto, copyright 2014, 2015. You allow viewers to pay you through Join and this book material is copyrighted, but you do not reference them anywhere on your website. The equations and material are pulled directly from the text and it presents an ethical issue. Though the book is open-sourced, it is copyrighted, and you are using this material for financial gain. This text book has been used in several university courses on reinforcement learning in the past.
    I love these videos, but proper credit and securing approval from the authors must be obtained!

    • @yannisran7312
      @yannisran7312 3 years ago +1

      Can a math equation itself be copyrighted?

    • @rajathhalgi3592
      @rajathhalgi3592 2 years ago

      Totally agree

  • @nossonweissman
    @nossonweissman 2 years ago +1

    {
    "question": "If a math student is the agent, then the _______________ is the environment.",
    "choices": [
    "math quiz",
    "math professor",
    "quiz score",
    "Swiss mathematician Leonhard Euler"
    ],
    "answer": "math quiz",
    "creator": "N Weissman",
    "creationDate": "2022-03-21T22:50:05.763Z"
    }

    • @deeplizard
      @deeplizard  2 years ago +1

      Thanks for the great quiz question!

  • @tingnews7273
    @tingnews7273 5 years ago +3

    What I learned:
    1. An MDP formalizes the decision-making process. (Yeah, everybody teaches MDPs first, but nobody told me why until now. It's a strange world.)
    2. R(t+1) results from At. Before, I always thought Rt was paired with At.
    3. The agent cares about accumulated reward. (For others who don't know.)
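The indexing point in item 2 can be made concrete by writing out the trajectory in the Sutton and Barto convention, where the reward that results from the action taken at time $t$ carries the index $t+1$. The function name $f$ below is only a label for this mapping:

```latex
S_0,\ A_0,\ R_1,\ S_1,\ A_1,\ R_2,\ S_2,\ A_2,\ R_3,\ \ldots

f(S_t, A_t) = R_{t+1}
```

So $R_1$ is the reward for taking $A_0$ in $S_0$, and there is no $R_0$ in the sequence.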

  • @thak456
    @thak456 4 years ago

    When are you restarting ?

  • @christianliz-fonts3524
    @christianliz-fonts3524 4 years ago

    Where is the discord link?

  • @prathampandey9898
    @prathampandey9898 2 years ago

    What is the difference between s and s' (s prime)?

    • @deeplizard
      @deeplizard  2 years ago

      s' denotes the next state that occurs after state s.

  • @ashabrar2435
    @ashabrar2435 3 years ago +1

    {
    "question": "In an MDP, which component's role is to maximize the total reward R?",
    "choices": [
    "Agent",
    "State",
    "Action",
    "Reward"
    ],
    "answer": "Agent",
    "creator": "Hivemind",
    "creationDate": "2020-12-27T00:22:07.005Z"
    }

    • @deeplizard
      @deeplizard  3 years ago

      Thanks, ash! Just added your question to deeplizard.com/learn/video/my207WNoeyA :)

  • @ArpitDhamija
    @ArpitDhamija 3 years ago

    Its more like a podcast, took me 20x more time to write down everything you said from the captions😵

  • @dukedaffy5457
    @dukedaffy5457 3 years ago +2

    {
    "question": "Which is the correct order for the components of MDP?",
    "choices": [
    "Agent--->Environment--->State--->Action--->Reward",
    "Environment--->Agent--->State--->Action--->Reward",
    "State--->Agent--->Environment--->Action--->Reward",
    "Agent--->State--->Environment--->Action--->Reward"
    ],
    "answer": "Agent--->Environment--->State--->Action--->Reward",
    "creator": "Duke Daffin",
    "creationDate": "2021-01-16T12:19:28.304Z"
    }

    • @deeplizard
      @deeplizard  3 years ago +1

      Thanks, Duke! Just added your question to deeplizard.com/learn/video/my207WNoeyA :)

  • @designwithpicmaker2785
    @designwithpicmaker2785 5 years ago

    When are the next videos coming? Is there a schedule?

    • @deeplizard
      @deeplizard  5 years ago +1

      Hey navaneetha - Currently aiming to release a new video in this RL series at least every 3-4 days.

  • @aaronbaron6468
    @aaronbaron6468 3 years ago

    i came here to learn about a topic and left sad that OG.JeRax and OG.ana is'nt on the active roster, hopefully OG.Sumail will carry as well as ana did.

  • @SharmaScribe
    @SharmaScribe 5 years ago

    Could we please get the code files for free, only for students?

    • @deeplizard
      @deeplizard  5 years ago +3

      Hey Mayank - We currently don't have any systems in place to implement or track a setup like that. Just for clarity, note that all of the code will be fully shown in the videos, so the code itself is freely available. Also, the corresponding blogs for each video are freely available at deeplizard.com.
      The convenience of downloading the pre-written organized code files is what is available as a reward for members of the deeplizard hivemind.
      deeplizard.com/hivemind

  • @an_omega_wolf
    @an_omega_wolf 5 years ago +3

    Dota

  • @TheD2D21
    @TheD2D21 5 years ago

    Are you sure this is Markov? I think you're thinking of Pavlov. I'm looking for Markovian on/off states.

    • @deeplizard
      @deeplizard  5 years ago +1

      Yes, this is the topic of Markov Decision Processes.

    • @TheD2D21
      @TheD2D21 5 years ago +1

      @@deeplizard Thanks

  • @papermaker107
    @papermaker107 a year ago

    "we're gonna represent an MDP with mathematical notation, this will make things easier"
    🧢

  • @keshavsairam3615
    @keshavsairam3615 2 years ago +2

    came to learn,but uh oh i saw dota

  • @DavoodWadi
    @DavoodWadi 3 years ago +1

    “S sub t gives us A sub t...”
    Reading off text? Nice text-to-speech tutorial.

    • @carostrickland4146
      @carostrickland4146 3 years ago

      What else was she supposed to say? Learning with text vs spoken word is the same thing, I don't see a better alternative.

  • @ItachiUchiha-fo9zg
    @ItachiUchiha-fo9zg 2 years ago

    Markovs chain: th-cam.com/video/rHdX3ANxofs/w-d-xo.html

  • @Petya224
    @Petya224 a month ago

    Nice explanation, i can implement this now without diving into math a lot, not the best elegant way though but anyway, concept understood

  • @ziaurrehman8247
    @ziaurrehman8247 2 years ago +2

    This series is awesome. Make learning a lot easier. Thank you so much.

  • @MrRynRules
    @MrRynRules 3 years ago +1

    Thank you!

  • @carlosromero-sn9nm
    @carlosromero-sn9nm 5 years ago +1

    Great video