Lecture 17 - MDPs & Value/Policy Iteration | Stanford CS229: Machine Learning Andrew Ng (Autumn 2018)

  • Published on 8 Dec 2024

Comments • 16

  • @myao8930
    @myao8930 2 years ago +17

    This is the best instruction among all videos on reinforcement learning. Thank you!

  • @supersnowva6717
    @supersnowva6717 9 months ago +3

    Such a great lecture on RL! Super clear on these algorithms, thanks so much Professor Ng!

  • @ali57555
    @ali57555 1 year ago +2

    Thank you very much for explaining this in such simple terms! Been looking for some time for something good to understand MDPs

  • @KipIngram
    @KipIngram 2 years ago +2

    1:13:19 - Probably the right model here is the one we used to spread across the planet. Most folks are trying to get by best they can, and are likely to pursue "exploitative" strategies - what they know will bring them what they want. But sometimes we explicitly launch exploration missions, and the "success criterion" of such a mission is very different from that of a "profit oriented initiative." The "payoff" for an exploration mission is *knowledge*. I think keeping the two things cleanly separate is probably the way to go.

  • @Ayanshandseals
    @Ayanshandseals 2 years ago +2

    Indeed, (1-epsilon)-greedy is the correct term and should have been used!

  • @gokdeniztingur7515
    @gokdeniztingur7515 9 months ago

    great video man!

  • @henkjekel4081
    @henkjekel4081 2 years ago +1

    Thank you Andrew, you're the best

  • @genotabby
    @genotabby 8 months ago

    48:42 this should be for stochastic methods, right? If it is deterministic, then the value V(s) should be calculated based on the 100% chance of moving in the direction given by the optimal policy. For stochastic it would be, in this case, a 0.8 chance of moving in the direction of the optimal policy, a 0.1 chance of moving to its left, and a 0.1 chance of moving to its right. Since the left side is already at the border, the agent would return to its original state, hence 0.1*0.71.
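
As a rough illustration of the arithmetic in the comment above, the sketch below computes the expected value of the next state under a deterministic and a stochastic transition model. The 0.8/0.1/0.1 probabilities and the value 0.71 for the current state come from the comment; the other two state values (0.75 and 0.69) and the helper name `expected_next_value` are assumptions chosen only for illustration.

```python
def expected_next_value(outcomes):
    """Return sum over s' of P(s' | s, a) * V(s').

    outcomes: list of (probability, value_of_resulting_state) pairs.
    """
    total_prob = sum(p for p, _ in outcomes)
    assert abs(total_prob - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * v for p, v in outcomes)


V_CURRENT = 0.71   # from the comment: bumping into the border keeps the agent here
V_INTENDED = 0.75  # assumed value of the cell in the intended direction
V_OTHER = 0.69     # assumed value of the cell reached by slipping the other way

# Deterministic model: the intended move always succeeds.
deterministic = expected_next_value([(1.0, V_INTENDED)])

# Stochastic model: 0.8 intended, 0.1 slip into the border (stay put),
# 0.1 slip to the other side.
stochastic = expected_next_value([
    (0.8, V_INTENDED),
    (0.1, V_CURRENT),
    (0.1, V_OTHER),
])

print("deterministic:", round(deterministic, 3))
print("stochastic:", round(stochastic, 3))
```

Under these assumed values, the 0.1*0.71 term is the contribution of the slip into the border, which is the term the comment points out.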

  • @KipIngram
    @KipIngram 2 years ago +2

    1:12:00 - I feel exactly the same mixed feelings that Dr. Ng seems to feel here. On the one hand, this technology is amazing, and there are so many wonderful things we can do with it, such as helping people get better medical care more quickly, and so on. These things could save lives. But there are also so many nasty things we can do with them; this general category of stuff is part of how we're.. sterilizing the world, so to speak. Removing the "humanity" from things and making our culture colder, more clinical, and less empathic and compassionate. I honestly don't know how to walk that tightrope - in cases like this "if we don't do it, someone else will." I suppose the best we can do is just try every day to keep some sort of "human-ness" in our endeavors. Some of us will do a pretty good job of that - some of us won't. 😞

  • @griffinbholt
    @griffinbholt 2 years ago +2

    Is there one student in the class with just a crazy deep voice? Or are they masking students' voices?

    • @griffinbholt
      @griffinbholt 2 years ago +1

      Nvm. I can confirm they are masking students' voices. One time they accidentally masked Dr. Ng's voice.

    • @PhucHoang-ng4vh
      @PhucHoang-ng4vh 1 year ago

      @griffinbholt you can see in another video that they blurred the picture whenever a student appeared on screen

  • @andrewpan1700
    @andrewpan1700 1 year ago +4

    robot at 1:08:10 looking a little stiff

  • @McAwesomeReaper
    @McAwesomeReaper 1 year ago +1

    The real genius on display here is the guy who invented the silicon-based ink of the markers.

  • @SuzanneWolfe-zc9bt
    @SuzanneWolfe-zc9bt 3 months ago

    Thompson Margaret Rodriguez Susan Allen Nancy