Bellman Equation Basics for Reinforcement Learning

  • Published on 28 May 2024
  • An introduction to the Bellman Equations for Reinforcement Learning.
    Part of the free Move 37 Reinforcement Learning course at The School of AI.
    www.theschool.ai/courses/move...

Comments • 86

  • @DCG_42 · 4 years ago · +29

    I've looked at many people trying to explain this, even Berkeley lectures. This dude learned it on his own and explained it soooo much better, thank you so much!!

  • @zishiwu7757 · 3 years ago · +2

    Great advice to thoroughly understand the vocabulary of a subject of interest first. It's similar to gathering information from users and figuring out the requirements of what a computer program should do before coding it.

  • @Nana-wu6fb · 3 years ago · +1

    Thank you so much for all the videos you've done on this topic!!! Really appreciate the details, the code walk-throughs, and the tips on how to study! Really helpful

  • @vizart2045 · 2 years ago

    I am working on machine learning and this was new to me. Thanks for bringing it to my attention.

  • @0i0l0o · 4 years ago · +2

    Your approach is amazing. Thank you, good Sir.

  • @mykyta.petrenko · 4 years ago

    The way you explain is great. It is very clear to me despite the fact that my English is not so good.

  • @craigowsen4501 · 3 years ago · +6

    I just finished the 5th lesson in this series. It is awesome! I am also reading Artificial Intelligence Engines: A Tutorial Introduction to the Mathematics of Deep Learning. The math just comes to life after seeing Colin's Python programs. Incredible series! Thanks

    • @salehabod6740 · 3 years ago

      Where can we find the code?

  • @tianyuzhang7404 · 1 year ago

    This is the first time I have a feeling of what Bellman's equation is for. Awesome video.

  • @iworeushankaonce · 4 years ago · +3

    Wow, amazingly done! I have searched the entire internet to finally find this video and understand the idea behind this equation.

    • @vaishnavi-jk9ve · 3 years ago · +1

      🎈

    • @vaishnavi-jk9ve · 3 years ago · +1

      🥰

    • @estebannicolas78 · 2 years ago · +1

      I don't mean to be so off topic but does someone know a way to get back into an Instagram account??
      I somehow forgot my account password. I would appreciate any tips you can give me.

  • @saikiranvarma6450 · 2 years ago

    A very positive start to the video, thank you. Keep going and we will keep supporting.

  • @user-wd3gm5nv7u · 2 years ago

    You are a great teacher man! Mega Thanks...

  • @mishrag104 · 3 years ago

    The topic should be "Bellman Reinforcement Learning for Dummies". Great job explaining it in steps.

  • @wishIKnewHowToLove · 1 year ago

    This guy took this complicated formula and made it easy

  • @akashpb4044 · 2 years ago · +1

    Brilliantly explained 👍🏼👍🏼

  • @AlessandroOrlandi83 · 4 years ago · +11

    I'm here from Coursera. I'm having a bit of trouble understanding them, thanks for the video!

    • @vamshidawat5654 · 3 years ago · +1

      me also

    • @psinha6502 · 3 years ago

      Can you tell me which course?

    • @vamshidawat5654 · 3 years ago · +1

      @psinha6502 The Practical Reinforcement Learning course from the National Research University Higher School of Economics.

  • @mrknarf4438 · 4 years ago

    This is great! And fun!!! Thank you!!

  • @anilkurkcu3389 · 5 years ago · +6

    Thanks for the "Quick Study Tips"!

  • @somecsmajor · 1 year ago

    Thanks Skowster, you're a real one!

  • @yadugna · 6 months ago

    Great presentation, thank you

  • @maggielin8664 · 4 years ago

    Thank you so much, Sir.

  • @afmjoaa · 9 months ago · +2

    Awesome explanation.

  • @patite3103 · 3 years ago · +2

    Great video! Could you explain why you move to the left when calculating the values and not, for example, go down where V = 0.9? How would you calculate the value V = 0.9 to the left of the cell with the -1 reward?

    • @omkarkulkarni2595 · 2 years ago

      As far as I understood, he used the value in the 1st row, 3rd column; he did not use the -1 value.

  • @newan0000 · 1 year ago

    So clear, thank you!

  • @pggg5001 · 2 years ago · +4

    10:45. I think you might have an error here.
    The equation is V(s) = max(R(s,a) + gamma*V(s')), i.e. V(s) is the max of the sum of the reward of the CURRENT cell (which is s) plus the discounted V-value of its best NEIGHBOUR (which is s').
    So the V-value of the cell left of the princess should be V = 0 + 0.9 * 1 = 0.9, not 1, right?

    • @virgenalosveinte5915 · 8 months ago

      I thought so too at first, but then realized that R(s,a) is a function of s and a, the current state and action pair, not of s', which is the next state. So the reward is given for taking the action, not for being in the next state. It's a detail anyway, but I'll leave it here for future people.

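The thread above turns on whether the +1 is paid by R(s, a) for the action that steps onto the princess, or belongs to the next cell itself. A minimal sketch of the two readings (assumed toy numbers, gamma = 0.9; this is an illustration, not the video's code):

    # Sketch only: one Bellman backup, V(s) = max_a[R(s,a) + gamma * V(s')],
    # evaluated for the cell directly left of the princess under two conventions.
    GAMMA = 0.9

    def backup_reward_on_action(reward_for_action, v_next):
        # Convention defended in the reply: R(s, a) = +1 because the action itself
        # steps onto the princess, and V(s') = 0 since the terminal state adds nothing.
        return reward_for_action + GAMMA * v_next

    def backup_reward_on_next_state(reward_of_current_cell, v_next):
        # Reading in the original comment: the +1 belongs to the princess cell itself,
        # so the neighbouring cell only sees 0 + gamma * 1.
        return reward_of_current_cell + GAMMA * v_next

    print(backup_reward_on_action(reward_for_action=1.0, v_next=0.0))           # 1.0
    print(backup_reward_on_next_state(reward_of_current_cell=0.0, v_next=1.0))  # 0.9
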
  • @irfanwar6986 · 3 years ago

    I appreciate your work

  • @katakeresztesi · 3 years ago · +1

    Is the Move 37 course currently available? I am having a hard time finding it.

  • @thenoteBooktalkshow · 3 years ago

    Awesome explanation, sir

  • @volt897 · 3 years ago

    I have a question: how can I decide which actions to take before going backwards to estimate the values?

  • @sambo7734 · 4 years ago · +9

    Isn't V == 1 in the princess square (reward == 1 and no next state), and the one to the left == 0.9 (reward == 0 + gamma * 1)?

    • @danielketterer9683 · 3 years ago · +1

      Yup.

    • @ManjeetKaur-hm4jt · 3 years ago · +3

      The value of the terminal state is always 0, not 1.

    • @ravynala · 2 years ago

      The value is calculated for rewards resulting from an action being taken. We are looking at the value of good decisions, not good initial placement.

    • @pggg5001 · 2 years ago

      It should be 0.9. R(s,a) and V(s) should be evaluated at the same s.

  • @Anon46d246 · 5 years ago · +1

    It really helps me.

  • @ScottTaylorMCPD · 5 months ago

    The link to the "free Move 37 Reinforcement Learning course" mentioned in the description appears to be dead.

  • @pramodpal3320 · 4 years ago

    Is it that, because we are going from that state with a particular action which will lead us to the final reward of +1, we take the reward value R(s,a) as 1?

  • @kabbasoldji3816 · 1 year ago

    thank you very much sir 😍

  • @AmanSharma-cv3rn · 2 years ago

    Simple and clear❤️

  • @nickquinn9416 · 4 years ago

    Awesome video!

  • @rikiriki43 · 1 year ago

    Thank you for this

  • @jeffreyanderson5333 · 3 years ago

    Thanks, that saved my day

  • @ahmet9446 · 3 years ago

    I have a question about the step before going to the fire. R(s, a) of the fire is -1. Shouldn't the V value of the step before be -1 + 0.9 = -0.1?

  • @user-hn8nm3ei7u · 2 years ago

    Very vivid. So the optimal step (action) to take is the direction that increases the value the most (the gradient direction).

  • @erdoganyildiz617 · 4 years ago · +3

    I have some difficulties understanding something. I would be glad if anyone could help.
    The Bellman equation basically states that to calculate the value of a state, we need to check [R(s,a) + gamma*V(s')] for all possible future actions and select the maximum one. In the Mario example we've directly placed value = 1 in the box near the princess, because as humans we know the best possible action in this state is to take a step towards the princess. But the only way for a computer to decide the best possible action is to try all possible actions and compare the results. Here is where the problem starts: there will be infinite loops (say, in one infinite-loop case, Mario could go back and forth forever) and the computer will never get the chance to compare results, because it won't have all the results to make a comparison.
    What am I missing here? Thanks in advance.

    • @roboticcharizard · 3 years ago · +2

      In this example, we're starting from the base case, that is the final case when we are about to reach the princess. Since we are starting from the base case, we always have some value assigned to the cell, and we can just work our way back to the initial state. You're right in the sense that if we were to start from the initial state and explore from there, we could run into an infinite loop, but in the given example, we are starting from the base case.

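The reply above can also be seen in code: value iteration never enumerates trajectories, it just sweeps the Bellman update over all states until the numbers stop changing, so going back and forth forever can never inflate a value. A minimal sketch on an assumed 4-cell corridor (goal in the last cell, gamma = 0.9, deterministic moves):

    # Sketch only: value iteration on an assumed 4-cell corridor; stepping into
    # the last cell pays +1 and ends the episode.
    GAMMA = 0.9
    N_CELLS = 4
    values = [0.0] * N_CELLS

    for sweep in range(100):
        new_values = values[:]
        for s in range(N_CELLS - 1):                  # terminal cell keeps V = 0
            candidates = []
            for next_s in (s - 1, s + 1):             # actions: left or right
                if 0 <= next_s < N_CELLS:
                    reward = 1.0 if next_s == N_CELLS - 1 else 0.0
                    future = 0.0 if next_s == N_CELLS - 1 else values[next_s]
                    candidates.append(reward + GAMMA * future)
            new_values[s] = max(candidates)           # best action, not every path
        if max(abs(a - b) for a, b in zip(values, new_values)) < 1e-9:
            break
        values = new_values

    print(values)   # converges to roughly [0.81, 0.9, 1.0, 0.0]
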
  • @rodolfojoseleopoldofarinar7317 · 1 year ago

    Thumbs up for the Quake reference :)

  • @japneetsingh5015 · 5 years ago · +1

    really enjoyed

  • @mohmmedshafeer2820 · several months ago

    Thank you, Skowster

  • @mohammed333suliman · 5 years ago

    Helpful thank you

  • @chiedozieonyearugbulem9363 · 1 year ago

    Thank you for this concise video. The texts provided by my lecturer weren't this easy to understand.

  • @malanb5 · 4 years ago

    The example looks very similar to the one in Georgia Tech's Reinforcement Learning course.

  • @yolomein415 · 4 years ago

    Where do you get that reward function? The R(s, a)?

    • @pengli4769 · 3 years ago

      I would say, if it is a chess game: in a state, when your agent places a piece somewhere and your opponent loses a piece, your agent gets a positive reward. The state is like the current chessboard, the action is where you put the piece, and the reward can be defined by you.

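To make the chess reply above concrete: R(s, a) is simply something the problem designer writes down before any learning starts. A minimal, assumed sketch for a grid world like the one in the video (the cell coordinates and helper names are made up):

    # Sketch only: R(s, a) defined by hand for a grid world; states are (row, col)
    # cells and actions are compass moves (deterministic transitions assumed).
    REWARDS = {
        (0, 3): +1.0,   # princess cell: reaching it pays +1
        (1, 3): -1.0,   # fire cell: reaching it pays -1
    }
    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def reward(state, action):
        # R(s, a): reward for taking `action` in `state`.
        row, col = state
        d_row, d_col = MOVES[action]
        return REWARDS.get((row + d_row, col + d_col), 0.0)   # all other moves pay 0

    print(reward((0, 2), "right"))   # +1.0, the action steps onto the princess
    print(reward((1, 2), "right"))   # -1.0, the action steps into the fire
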
  • @patf9770 · 5 years ago

    This series is amazing.

  • @brodderick · 5 years ago · +2

    brilliant

  • @pramodpal3320 · 4 years ago

    I think for the reward you are talking about the current state.
    So how could it be +1 when you are starting backwards?

  • @KennTollens · 3 years ago · +1

    I just started machine learning a couple of days ago and it looks like the Bellman equation is part of Q-learning. I would love it if someone did the manual math of how to calculate Q-learning as the agent moves through the network.
    s - the state, i.e. the square the agent is in
    s' - the next square the agent will go to
    V - value
    R(s,a) - reward for the state given an action
    y - penalty for going into the new state, between 0 and 1

    • @fazaljarral2792 · 3 years ago

      Kenn, I just started today; is there any Discord server for beginners?

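For the manual math requested in the thread above, here is a minimal sketch of a single tabular Q-learning update, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max Q(s',a') - Q(s,a)); the state labels, learning rate, and starting numbers are assumed for illustration:

    # Sketch only: one tabular Q-learning step with made-up numbers.
    ALPHA = 0.5   # learning rate
    GAMMA = 0.9   # discount factor

    # Q-table keyed by (state, action); "A" and "B" are hypothetical squares.
    Q = {
        ("A", "right"): 0.0,
        ("A", "down"):  0.0,
        ("B", "right"): 0.5,
        ("B", "down"):  0.2,
    }

    def q_update(state, action, reward, next_state):
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(q for (s, _), q in Q.items() if s == next_state)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    # The agent moves from square A to square B via "right" and receives reward 0:
    q_update("A", "right", reward=0.0, next_state="B")
    print(Q[("A", "right")])   # 0.5 * (0 + 0.9 * 0.5 - 0) = 0.225
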
  • @gogigaga1677 · 2 years ago

    BEST EXPLANATION

  • @bea59kaiwalyakhairnar37 · 1 year ago

    That means each action of the agent will give him a reward, and if he finds the path or gets to the goal then he will get the maximum reward. But does an agent try to find different ways to reach the goal?
    Also, I subscribed.

  • @shakyasarkar7143 · 4 years ago

    How does the 3rd row, 4th column value come out as 0.73? Can anyone please explain? Please

    • @Shriukan1 · 3 years ago · +1

      Because the next possible moves are either -1 or 0.81.
      So the best move is to go to 0.81, multiplied by the gamma factor of 0.9.
      0.81 x 0.9 is 0.73 :)

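The reply above can be checked by propagating the discount back from the princess one cell at a time, assuming gamma = 0.9 and zero reward on the intermediate cells:

    # Sketch only: each step away from the goal multiplies the value by gamma = 0.9,
    # since R(s, a) = 0 along the path and V(s) = gamma * V(best neighbour).
    GAMMA = 0.9
    v = 1.0                      # cell whose best action steps straight onto the princess
    for steps_away in range(1, 4):
        v = 0.0 + GAMMA * v
        print(steps_away, round(v, 2))   # 1 0.9, then 2 0.81, then 3 0.73
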
  • @faisalamir1656 · 2 years ago

    thanks a lot mannnnnnn

  • @ORagnar · 2 years ago

    3:44 It's fascinating to note that in 1954 there were no digital computers. I wonder if they were using analog computers.

  • @alexusnag · 5 years ago

    Great tutorial!

  • @ejbock5b179 · 2 months ago

    What happens if, instead of Mario being drunk, Mario can move as instructed but the identities of the two rooms (trap and success) are unknown, and only the probability that either one is the trap or the success room is known? I.e. Mario knows that the "success" room may be the trap with 50% confidence.

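One common way to model the situation asked about above (an assumption on my part, not something the video covers) is to take an expectation over the two possibilities, so the Bellman backup uses a probability-weighted reward:

    # Sketch only: expected backup when the room behind the door is uncertain.
    # With 50% confidence it hides the princess (+1) and otherwise the trap (-1).
    GAMMA = 0.9
    p_success = 0.5
    expected_reward = p_success * (+1.0) + (1.0 - p_success) * (-1.0)   # = 0.0
    v_door_cell = expected_reward + GAMMA * 0.0   # the episode ends either way, so V(s') = 0
    print(v_door_cell)   # 0.0: at 50/50 odds the door is worth nothing in expectation
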
  • @chenguangzhou3991 · 3 years ago

    super

  • @pramodpal3320 · 4 years ago

    I mean for the second-to-last cell.

  • @phattaraphatchaiamornvate8827 · 2 years ago

    you make me god ty.

  • @tonymornoty6346 · 3 years ago

    The explanation helped me, but the calculations are wrong. The princess is V = 1 and the square next to the princess should be V = 0.9.

  • @superdahoho · 2 years ago

    11:42
    You lost me there.
    You said the state of the next square is 0, that's why it's 0 +,
    but then you go ahead and say the value of the square is 1.
    Are the state and value of the square different?
    I thought the state was the square?

  • @niklasdamm6900 · 1 year ago

    21.11 20:55

  • @asdfasdfuhf · 3 years ago

    Great content, terrible microphone

  • @mpgrewal00 · 3 years ago

    Las Vegas and AI? What an oxymoron.

  • @femkemilene4547 · 5 years ago · +1

    Using gaming analogies and sexist tropes distracts. Also unnecessary info about how to learn.

  • @gaulvhan2814 · 5 years ago

    You are great at explaining things, but why use games and rescuing a princess as examples? smh, be more open-minded about your audience