25:47
Conjecture: inverse exists if gamma in [0,1), and fails to exist if gamma=1.
Easy to check for 1 or 2 state systems.
True, for gamma < 1 the matrix is strictly diagonally dominant, thus invertible
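A quick numerical check of this, assuming the matrix in question is (I - gamma*P) from the analytic solution V = (I - gamma*P)^(-1) R; the 2-state P and R below are made up purely for illustration:

import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])   # 2-state transition matrix, rows sum to 1
R = np.array([1.0, 0.0])     # made-up reward vector

for gamma in (0.5, 0.99, 1.0):
    A = np.eye(2) - gamma * P
    det = np.linalg.det(A)
    if abs(det) > 1e-12:
        V = np.linalg.solve(A, R)        # V = (I - gamma*P)^(-1) R
        print(f"gamma={gamma}: det={det:.4f}, V={V}")
    else:
        print(f"gamma={gamma}: det={det:.4f}, I - gamma*P is singular")

For gamma < 1 the determinant stays away from zero; at gamma = 1 the rows of I - P sum to zero, so the matrix is singular, matching the conjecture.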
I have one question about the lecture: Dr. Brunskill claimed that "if the process is deterministic, then the return and the value function will be equivalent". But the reward is by definition a random variable, i.e. for the same state there may be different rewards. So even if the process is deterministic and we know the exact next state, we are still not sure of the realized reward for that state. Why, then, are the value and the return equivalent for a deterministic process?
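For what it's worth, here is a toy sketch of the fully deterministic case (my own example, not from the lecture): when the transitions, the rewards, and the policy are all deterministic, the expectation in V(s) = E[G_t | s_t = s] is over a single possible trajectory, so the value equals the realized return.

import numpy as np

# Toy 3-state chain s0 -> s1 -> s2 (absorbing), deterministic rewards.
rewards = {0: 1.0, 1: 2.0, 2: 0.0}     # r(s), deterministic
next_state = {0: 1, 1: 2, 2: 2}        # deterministic transitions
gamma = 0.9

# Realized return G from a single rollout starting at s0
G, s, discount = 0.0, 0, 1.0
for _ in range(100):
    G += discount * rewards[s]
    discount *= gamma
    s = next_state[s]

# Value from the Bellman equation V(s) = r(s) + gamma * V(s'), iterated
V = np.zeros(3)
for _ in range(100):
    V = np.array([rewards[x] + gamma * V[next_state[x]] for x in range(3)])

print(G, V[0])   # both come out to 2.8: with everything deterministic they coincide

If the rewards themselves are random, then yes, the value is only the expectation of the return and a single realized return need not equal it.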
Can the common or good questions from Piazza be put up somewhere to refer to?
The gammas being in a GP has a very good interpretation in finance, and I believe the idea stems from there rather than being purely mathematical (though it does have nice mathematical properties). It's to do with interest: if I earn 1 now and there's 10% interest, then after a year it is worth 1.1, which means that earning 1 a year from now is equivalent to earning about 0.909 today. Since interest rates are roughly in the 10-25% ballpark, this gives rough values of gamma around 0.8 to 0.9. A gamma of 0.5 would mean the reward is leveraged so that it doubles in the following time step. This compounds over time, and that is how it becomes a GP. It also implies that if I have a reward of 1 this year, I can leverage it over the following years (collect interest on it), which seems like a reasonable way to think about the benefit of learning from experience early on. This is just my understanding, though, and might be biased.
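A rough numerical version of that reading (purely illustrative): with an annual interest rate r, a reward of 1 received one step from now is worth 1/(1+r) today, i.e. gamma = 1/(1+r), and compounding over t steps gives gamma^t, which is exactly the geometric series in the discounted return.

# gamma = 1/(1+r): discounting a future reward back to the present
for r in (0.10, 0.20, 0.25):
    gamma = 1 / (1 + r)
    print(f"r = {r:.2f}  ->  gamma = {gamma:.3f}")
# r = 0.10 gives gamma ~ 0.909 and r = 0.25 gives gamma = 0.800,
# which is where the 0.8-0.9 ballpark above comes from. gamma = 0.5
# corresponds to r = 1.0, i.e. the reward doubling every step.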
Does anybody understand how she got to the 2nd step of the equation at 1:11:56?
We don't care about which specific a or a' it is. Suppose BV_k >= BV_j, and write a' for the action achieving the maximum in BV_j. If we instead evaluate BV_j's expression at a (the action achieving the maximum in BV_k), we get something no larger than BV_j, and subtracting that from BV_k is what gives the next step.
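If that is the step being asked about (I'm guessing at the exact slide from this reply), the general fact being used is:

|max_a f(a) - max_{a'} g(a')| <= max_a |f(a) - g(a)|

applied with f(a) = r(s, a) + γ Σ_{s'} p(s'|s, a) V_k(s') and g(a) the same expression with V_j in place of V_k, which is what pulls the max outside the difference.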
How is the return different from the value function? How can the return differ from the value function when the process is not stochastic? (Both are sums of rewards.)
We said that if the policy is deterministic we can simplify the value function to V^π_k(s) = r(s, π(s)) + γ Σ_{s'∈S} p(s'|s, π(s)) V^π_{k-1}(s'), but then how can we write max_a Q(s,a) >= V(s) when the policy is deterministic and we can choose just one action?
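One way to see that inequality (standard definitions, not quoting the lecture): a deterministic policy still defines Q^π(s, a) for every action a, not just for π(s); the policy only fixes which action is actually taken. Since V^π(s) = Q^π(s, π(s)) and π(s) is one of the actions included in the max:

max_a Q^π(s, a) >= Q^π(s, π(s)) = V^π(s).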
Thank you for sharing the contents
What is the tool that Prof. Emma is using for the presentation and annotation? It looks really helpful.
Beamer? I guess
@@gravitas8297 Does Beamer allow annotation? I thought it was a LaTeX class for making presentations. I wanted to know which annotation tool she is using on the iPad; that would be really helpful.
@@adityanarendra5886 Err I haven't tried that sorry :(
47:13 Someone just asked what I wanted to! 😂
I'm under its spell. I had the pleasure of reading something similar: "The Art of Saying No: Mastering Boundaries for a Fulfilling Life" by Samuel Dawn.