REINFORCE (Vanilla Policy Gradient VPG) Algorithm Explained | Deep Reinforcement Learning

Johnny Code

Views: 1,679


Published on Dec 2, 2024

Comments • 10

@johnnycode 7 months ago ⁺³
Let me know if you want the code walkthrough and demo.
@peterhpchen 5 months ago
Where is the code?
@johnnycode 5 months ago
@peterhpchen Here you go: github.com/johnnycode8/gym_solutions/blob/main/cliff_walking_reinforce.py
@kimiochang 7 months ago ⁺²
Thanks for the good work. I am still practicing the FrozenLake DQL+CNN and wonder how to train the model on CUDA as the training time keeps increasing.
@johnnycode 7 months ago ⁺¹
Thank you for the continual support, Andy! Here are some general guidelines on using CUDA:
# First, make sure CUDA is installed properly and is supported by your GPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)
# Then, anywhere that uses the network should be sent to CUDA, for example:
DQN(....).to(device)
# Also, anywhere that deals with Tensors should be sent to CUDA, for example:
torch.FloatTensor(...).to(device)
torch.IntTensor(...).to(device)
Note that if you run your code and PyTorch complains that not everything is on the same device, it means something (a network or a tensor) was not sent to CUDA with "to(device)".
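The guidelines above can be put together into one small runnable sketch. TinyQNet is a hypothetical stand-in for the DQN class (its layer sizes and the 16-state, 4-action shapes are assumptions chosen only for illustration), and the script falls back to the CPU when no GPU is visible, so it should run either way.

```python
import torch
import torch.nn as nn

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyQNet(nn.Module):
    """Hypothetical stand-in for the DQN class mentioned in the thread."""

    def __init__(self, n_states: int = 16, n_actions: int = 4):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_states, 32),
            nn.ReLU(),
            nn.Linear(32, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


# Move the network's parameters to the chosen device once, at construction.
policy_net = TinyQNet().to(device)

# Create the input tensor directly on the same device as the network.
state = torch.zeros(1, 16, device=device)
state[0, 0] = 1.0  # one-hot encoding of state 0 (assumed encoding)

# Forward pass without tracking gradients, as when selecting an action.
with torch.no_grad():
    q_values = policy_net(state)

# Bring the result back to the CPU before converting to a Python int.
best_action = int(q_values.argmax(dim=1).cpu().item())
print(device, tuple(q_values.shape), best_action)
```

Creating tensors with device=device avoids building them on the CPU first and copying them over, which is equivalent in effect to calling .to(device) afterwards.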
@kimiochang 7 months ago
@johnnycode Thanks a lot for the help. And it works!
@johnnycode 6 months ago
@kimiochang That’s great 👍
@patrykperonski5459 4 months ago
One question: why does it have to be log? That part is a bit confusing, to be honest.
@johnnycode 4 months ago
This page explains the math that arrives at log(): mcneela.github.io/math/2018/04/18/A-Tutorial-on-the-REINFORCE-Algorithm.html
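In short, the log comes from the standard log-derivative identity. The sketch below uses assumed notation that may differ from the video's: τ is a trajectory, p_θ(τ) is its probability under the policy π_θ, R(τ) is its return, and J(θ) is the expected return.

```latex
\begin{align*}
\nabla_\theta J(\theta)
  &= \nabla_\theta \int p_\theta(\tau)\, R(\tau)\, d\tau \\
  &= \int \nabla_\theta p_\theta(\tau)\, R(\tau)\, d\tau \\
  &= \int p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, R(\tau)\, d\tau
     && \text{since } \nabla_\theta \log p_\theta(\tau)
        = \frac{\nabla_\theta p_\theta(\tau)}{p_\theta(\tau)} \\
  &= \mathbb{E}_{\tau \sim p_\theta}\!\left[ \nabla_\theta \log p_\theta(\tau)\, R(\tau) \right],
\qquad
\nabla_\theta \log p_\theta(\tau) = \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t).
\end{align*}
```

Multiplying and dividing by p_θ(τ) turns the gradient into an expectation, so it can be estimated from sampled episodes. The log also turns the product of step probabilities into a sum, and the environment's transition terms drop out because they do not depend on θ.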
@nguyenmanh466 4 months ago
Check out OpenAI’s Spinning Up; they explain that.
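For anyone who prefers to see the log term checked numerically, here is a minimal sketch using only the standard library. It assumes a one-step problem with a softmax policy over three actions; the logits and rewards are made-up illustrative values, not taken from the video. It compares the gradient of the expected reward (by finite differences) against the expectation of reward times the gradient of log π.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(logits, rewards):
    """J(logits) = sum_a pi(a) * R(a)."""
    return sum(p * r for p, r in zip(softmax(logits), rewards))


def score_function_gradient(logits, rewards):
    """E_a[R(a) * d log pi(a) / d logits], summed exactly over all actions."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a, (p_a, r_a) in enumerate(zip(probs, rewards)):
        for k in range(len(logits)):
            # For a softmax policy: d log pi(a) / d logit_k = 1[a == k] - pi(k)
            dlog = (1.0 if a == k else 0.0) - probs[k]
            grad[k] += p_a * r_a * dlog
    return grad


def finite_difference_gradient(logits, rewards, eps=1e-6):
    """Central-difference estimate of dJ / d logits, one logit at a time."""
    grad = []
    for k in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[k] += eps
        down[k] -= eps
        diff = expected_reward(up, rewards) - expected_reward(down, rewards)
        grad.append(diff / (2 * eps))
    return grad


logits = [0.2, -0.5, 1.0]    # illustrative policy parameters
rewards = [1.0, 0.0, -2.0]   # illustrative reward per action

g_score = score_function_gradient(logits, rewards)
g_fd = finite_difference_gradient(logits, rewards)
print("score-function gradient:  ", g_score)
print("finite-difference gradient:", g_fd)
```

The two printed gradients should agree to several decimal places. In REINFORCE the expectation is not summed exactly as it is here; it is estimated from sampled actions, which is what the log form makes possible.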
