RL CH4 - Monte-Carlo Methods on Reinforcement Learning

  • Published on 11 Dec 2024

Comments • 23

  • @UPGadgetscom 9 months ago +4

    At 40:40 there is an error in the calculation of V(S1). It should be -3.5, NOT -4.5, because (-4 - 3)/2 = -3.5 (a quick check appears after this thread).

    • @SaeedSaeedvand 9 months ago +2

      Yes, this is already corrected in newer versions of the slides. Please double-check the value calculations; there can be the occasional typo, or values that were changed without the final results being updated. Thanks, however.
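
      As a quick check of the corrected number, here is a minimal Python sketch of the Monte-Carlo average, assuming the two returns sampled from S1 in this example are -4 and -3 (as stated in the comment above):

        # Hypothetical returns observed from state S1 in the two sampled episodes.
        returns_s1 = [-4, -3]

        # Monte-Carlo estimate: V(S1) is the average of the observed returns.
        v_s1 = sum(returns_s1) / len(returns_s1)
        print(v_s1)  # -3.5, the corrected value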

  • @Ujjayanroy 6 months ago

    That's why I pay for a YouTube subscription; this content was exactly what I needed.

  • @andresariaslondono7003 1 year ago +1

    Very good and detailed explanation!!! Thank you

  • @gulamsarwar7502 9 months ago +1

    @Saeed Saeedvand would you please share the PPT?

  • @adershm3510 1 year ago +1

    Nice lectures. Thank you for the example. I was wondering how the trajectory would change if the policy is stochastic.

    • @SaeedSaeedvand 1 year ago

      Yes, you can have a stochastic policy.
      With a stochastic policy, π gives the probability of taking each possible action in a state, written π(a∣s). That is, given a state s, the policy π assigns a probability to each possible action rather than always picking the same one. Therefore, actions other than the argmax also have a chance of being chosen by the policy, even though they are worse (see the sketch after this thread).

    • @adershm3510 1 year ago +1

      @SaeedSaeedvand Thank you

    • @gulamsarwar7502 9 months ago

      @Saeed would you please share the PPT URL?
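
      To make the stochastic-policy explanation above concrete, here is a minimal Python sketch (not from the lecture) of sampling an action from π(a∣s) instead of always taking the argmax; the state, actions, and probabilities are made-up examples:

        import random

        # Hypothetical stochastic policy: for each state, a probability for each action.
        pi = {
            "s1": {"L": 0.7, "R": 0.2, "U": 0.1},
        }

        def sample_action(state):
            """Sample an action according to pi(a|s); non-greedy actions can still be chosen."""
            actions, probs = zip(*pi[state].items())
            return random.choices(actions, weights=probs, k=1)[0]

        def greedy_action(state):
            """Deterministic alternative: always pick the argmax action."""
            return max(pi[state], key=pi[state].get)

        print(sample_action("s1"))  # usually "L", but sometimes "R" or "U"
        print(greedy_action("s1"))  # always "L"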

  • @asifkhankhosa 1 year ago +2

    Very good explanation

  • @KaranBhanushali-gt6ts 1 year ago +1

    You're doing great work, Maqsood bhai 🫂

  • @SURAJITPAL-j2f 2 months ago

    Hi Sir, in the Monte Carlo Exploring Starts example, if we know that the robot starts from s20, why are we generating episodes randomly from s7 or s8? Couldn't we just start generating episodes from s20?

  • @abdullah.montasheri 8 months ago +1

    Best explanation on YouTube

  • @gulamsarwar7502 9 months ago

    @SaeedSaeedvand
    would you please share the PPT link?

  • @lizhenhuang7053 1 year ago +1

    good example for learning

  • @effortlessjapanese123 1 year ago

    How do you determine that each episode is 6 steps long?

    • @SaeedSaeedvand 1 year ago +1

      It is only an example in the slide to show the process (6 is a developer choice and deliberately small here). The episode length should be chosen depending on the problem. It must be long enough (usually at least two or three times the number of steps an optimal agent would need) to allow the agent to explore and, of course, to reach the objective.
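
      For illustration only, a minimal Python sketch of rolling out one episode capped at a fixed number of steps, assuming a hypothetical env with reset()/step() methods and a lookup-table policy (none of these names come from the lecture):

        def generate_episode(env, policy, max_steps=6):
            """Roll out one episode, truncated at max_steps if the goal is not reached earlier."""
            episode = []                   # list of (state, action, reward) tuples
            state = env.reset()
            for _ in range(max_steps):
                action = policy[state]     # simple lookup-table policy (assumption)
                next_state, reward, done = env.step(action)
                episode.append((state, action, reward))
                if done:                   # stop early if the objective is reached
                    break
                state = next_state
            return episode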

  • @MohamedHassan-vy3wi 5 months ago

    Thank you for the great content

  • @effortlessjapanese123 1 year ago +1

    this is golden!

  • @camelacml 7 months ago

    Hi Sir, for (s1, L, r1), shouldn't that be 50?

    • @SaeedSaeedvand 6 months ago

      I am not sure which slide you are referring to. However, there have been some updates to the slides since the videos were recorded; I have tried to note them under the video. Since some examples were changed, some final values may still show results from the previous examples that have not been updated on the slides. So focus on the solution approach first.

  • @razarizvi7085 1 year ago

    Hey, learn to speak English first, brother.

  • @amirhossein1108 10 months ago +2

    very nice explanation