After 5 years, the lecture is still a great video, thank you a lot.
This is the best lecture I have seen on policy gradient methods. Thanks a lot.
Every time I forget how policy gradients work exactly, I just come back here and watch starting at 9:30
And every time you visit this video, you forget where to start watching. That's why you posted this comment. Smart guy!
I must say that in more than 6 months, this is by far the best lecture/material I have come across that was able to make me understand what the policy gradient method actually is. I really praise this work. :) Thank you.
Amazing lecture! Love how Pieter explains the math. Super easy to understand.
Great Lecture!!!! Pieter's explanations are just a gem!
Best lecture on Policy Gradients, hands down. It also briefly covers some noteworthy details from many papers.
It's such a good lecture that I keep stopping to ask myself why it was so easy to cover so much significant information with full understanding.
Very good lecture about the policy gradient method. I have looked through a lot of articles and understood almost everything, but your derivation explanation is really the best. It just opened my eyes and showed me the whole picture. Thank you very much!!
Honestly I can’t keep up without seeing what he’s pointing at. Gotta pause and search around the screen each time he says “this over here”
Exactly. "This over here" has got to be the most uttered phrase in this lecture. So frustrating.
amazing work. super understandable, concise and information dense.
The explanation of the derivation of policy gradient is really nice and understandable here
I got a lot out of this lecture in particular. Thank you.
best lecturer in series
Wow, this is beautiful!
Why would a good "R" increase the probability of a path??? Please help me.
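A short worked step, assuming the lecture's notation (U(theta) is the expected return, P(tau;theta) the trajectory probability, alpha the step size): the likelihood-ratio gradient and the ascent update are

\nabla_\theta U(\theta) = \mathbb{E}_{\tau \sim P(\cdot\,;\theta)}\big[ \nabla_\theta \log P(\tau;\theta)\, R(\tau) \big], \qquad \theta \leftarrow \theta + \alpha\, \nabla_\theta U(\theta).

So each sampled path contributes a step of size alpha * R(tau) in the direction \nabla_\theta \log P(\tau;\theta); a large positive R(tau) moves theta exactly the way that raises \log P(\tau;\theta), i.e. makes that path more probable, and a negative R(tau) moves it the opposite way.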
Awesome instructor.
Instead of having a baseline, why not make your reward function negative for undesired scenarios and positive for good ones? Great lecture!
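One hedged note on why a baseline b is kept separate from the reward (assuming a state-independent baseline, e.g. a constant or average-return baseline as in the lecture): subtracting it does not bias the gradient, because

\mathbb{E}_{a \sim \pi_\theta(\cdot\mid s)}\big[ \nabla_\theta \log \pi_\theta(a\mid s)\; b \big] = b \sum_a \nabla_\theta \pi_\theta(a\mid s) = b\, \nabla_\theta \sum_a \pi_\theta(a\mid s) = b\, \nabla_\theta 1 = 0.

So b can be chosen purely to reduce variance, without hand-tuning which outcomes the reward labels as negative; re-centering the reward by hand is a fixed, coarser version of the same trick.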
wow damn this is so well explained and the last video is very entertaining.
I might be missing a simple concept here, but how are we increasing/decreasing the grad log probability of the actions using the gradient of U(theta)? I get that a positive return for a trajectory will make the gradient of U positive, so theta will be updated in favour of those trajectories, but how does that increase grad log prob?
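A minimal runnable sketch of how this plays out (hypothetical toy code, not the lecture's; policy, states, R, and the single-trajectory setup are all placeholder assumptions):

import torch

# Hypothetical toy policy: linear scores over 4 discrete actions.
policy = torch.nn.Linear(8, 4)                      # parameters theta
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)

states = torch.randn(10, 8)                         # one sampled trajectory, T = 10
dist = torch.distributions.Categorical(logits=policy(states))
actions = dist.sample()
log_probs = dist.log_prob(actions)                  # log pi_theta(a_t | s_t)
R = 1.0                                             # total return of this trajectory

# Surrogate whose gradient is the policy-gradient estimate:
# grad_theta [ R * sum_t log pi(a_t|s_t) ] = R * sum_t grad_theta log pi(a_t|s_t)
surrogate = R * log_probs.sum()
optimizer.zero_grad()
(-surrogate).backward()                             # ascend U by descending -U
optimizer.step()                                    # positive R => these log-probs go up

The gradient of U(theta) is R times grad-log-prob; stepping theta along it is what changes the log probabilities. A positive return therefore increases log pi for the actions taken and a negative return decreases it; we never touch grad log prob directly.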
23:20- Gradient of expectation is expectation of gradient "under mild assumptions". What are those assumptions?
math.stackexchange.com/questions/12909/will-moving-differentiation-from-inside-to-outside-an-integral-change-the-resu
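Roughly, and hedging a bit since the slide doesn't spell it out: it is the usual condition for differentiating under the integral sign (Leibniz rule / dominated convergence),

\nabla_\theta \int P(\tau;\theta)\, R(\tau)\, d\tau = \int \nabla_\theta P(\tau;\theta)\, R(\tau)\, d\tau,

which holds when P(tau;theta) R(tau) is differentiable in theta and the norm of \nabla_\theta P(\tau;\theta) R(\tau) is dominated by an integrable function of tau that does not depend on theta. Smooth policy classes (softmax, Gaussian) with bounded rewards over a finite horizon generally satisfy this.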
nice but hard to follow without knowing what "this" refers to. I hope my guesses were right :)
The guy in the background at 51:30
excellent lecture. Thank you for sharing.
Where can i buy his T-shirt?
It would be nice to see what labs they had (what exercises).
Don't know if you're still interested, but the labs are on the bootcamp website.
Great lecture, thanks
1:11 Why are DQNs and friends dynamic programming methods? I mean, the neural network works as a function approximator to satisfy the Bellman equation, but backprop is still the workhorse. In my opinion, DQNs are much more similar to PG methods than to Bellman updates?! And another issue with the RL landscape slide: where are the model-based RL algorithms?? That slide should be renamed the model-free RL landscape.
PG is awesome!!!
Doesn't depend on the environment dynamics, really?? Wow.
All the pain and stress just goes away when we see our algorithms working😇😇
great talk!
Awesome!! Thanks
If you can't explain it like Pieter Abbeel or Andrew NG then you don't understand it well enough.
could you train a robot for 2 weeks in the real world then use those trained parameters to optimize a virtual environment? You know.. making the virtual environment very close to the real world?
Of course you could. It's the opposite of what OpenAI does when they train a model in a virtual environment and deploy it in reality.
The real world is very complicated, with model uncertainties, friction, wear and tear, and what have you...
Simulators can come close, but we cannot expect them to fully mimic real-world phenomena.
very nice :)
Can you share the PPT??
You can find all the slides and the other lectures here:
sites.google.com/view/deep-rl-bootcamp/lectures
... you can use "a", and the math will be the same. :)