The equation at 17:40 is hard to understand. The LHS seems to weirdly discount the gradient at states that are later in the trajectory, while the RHS doesn't. Any indication why that happens?
1) Because of Wald's identity, we can directly take the summand and the decay factor outside. 2) The randomness of the trajectory on the LHS is induced by $\pi(a|s)$ and the initial state distribution $\mu_0(s)$, and so is the joint distribution of the state marginal $d^\pi(s)$ and the policy $\pi(a|s)$ on the RHS, where $d^\pi(s) := (1-\gamma)\sum_{t=0}^{\infty} P^\pi(s_t = s)$.
@ruizhenliu9544 thanks for your reply! Maybe you meant to write $d^\pi(s) := (1-\gamma)\sum_{t=0}^{\infty} \gamma^t P^\pi(s_t = s)$ (note the added $\gamma^t$)? That would answer my question by saying that the discounting of future states can be interpreted probabilistically (with a probability $1-\gamma$ of termination after each time step) and therefore incorporated into the state distribution. Maybe that is what happens here. However, I think what is usually done in practice is to sample $d^\pi(s)$ by choosing states from the collected experience uniformly at random, not discounting according to the time step at which they were observed.
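For what it's worth, the two sampling schemes discussed above are easy to contrast in code. Below is a minimal Python sketch; the `env` and `policy` interfaces are hypothetical gym-style placeholders, not from the lecture. The first function samples a state from $d^\pi$ exactly by continuing with probability $\gamma$ after each step; the second is the uniform-over-experience shortcut used in practice.

```python
import numpy as np

def sample_state_from_d_pi(env, policy, gamma, rng):
    """Draw one state from d^pi(s) = (1-gamma) * sum_t gamma^t P^pi(s_t = s).

    Interpretation: run the policy, but after each step terminate with
    probability 1 - gamma; the state where we stop is a sample from d^pi.
    (env/policy are hypothetical interfaces, for illustration only.)
    """
    s = env.reset()
    while rng.random() < gamma:   # continue with probability gamma
        a = policy(s)
        s, _, done, _ = env.step(a)
        if done:
            s = env.reset()       # assumed: restart episodes while sampling
    return s

def sample_state_uniform(experience, rng):
    """The common practical shortcut: pick a stored state uniformly at
    random, ignoring the gamma^t weighting by time step."""
    return experience[rng.integers(len(experience))]
```

The geometric-termination trick works because the probability of stopping exactly at step $t$ is $(1-\gamma)\gamma^t$, which matches the weights in the definition of $d^\pi$; the uniform variant drops those weights, which is the mismatch the comment above points out.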
Can you please share a code example for data-driven RL, including data preparation?
Is there Python code available? I have a dataset.
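Not an official answer, but as a starting point, here is a minimal Python sketch of the data-preparation step for offline (data-driven) RL. It assumes a hypothetical flat layout where each logged row is one transition $(s, a, r, s', \text{done})$ with discrete actions; the function names and column order are illustrative, not from any particular library.

```python
import numpy as np

def prepare_offline_dataset(rows, state_dim):
    """Split logged rows into the arrays an offline RL algorithm consumes.

    Assumed row layout (hypothetical): state features (state_dim columns),
    action (1 column), reward (1 column), next-state features (state_dim
    columns), done flag (1 column).
    """
    data = np.asarray(rows, dtype=np.float32)
    states      = data[:, :state_dim]
    actions     = data[:, state_dim].astype(np.int64)   # discrete actions assumed
    rewards     = data[:, state_dim + 1]
    next_states = data[:, state_dim + 2 : 2 * state_dim + 2]
    dones       = data[:, -1].astype(bool)
    return states, actions, rewards, next_states, dones

def sample_minibatch(dataset, batch_size, rng):
    """Uniform minibatch sampling over stored transitions, as most
    offline algorithms do."""
    s, a, r, s2, d = dataset
    idx = rng.integers(len(s), size=batch_size)
    return s[idx], a[idx], r[idx], s2[idx], d[idx]
```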