Not going to lie - was fooled up until magnetic chess board! Can't put anything past Schmidhuber
Academics now have to use meme knowledge and tactics to get their papers noticed. What a time to be alive.
One of the funniest 3 minutes in the field! I was seriously laughing out loud 😂
starting strong, upside down characters in an academic paper. high tier memer
@Dmitry Akimov Lighten up a bit, these people just want recognition for their work, and using catchy titles and more light-hearted introductions draws attention. It's not really their fault when it's what they're incentivized to do; something something reward-action.
@dmitry I don't think it's going to happen. There are so many research papers; if you want to get noticed, you need to stand out.
@Dmitry Akimov ok boomer
interesting new perspective on how to do RL ☺️
skip to 4:08 if you don't want memes
If you have 2 actions A and B, and you explore / train an input of desired reward 0 to produce action A, how does that help you do the right thing with an input desired reward 1 (select action B)?
I guess ideally you would learn both, or at least recognize that you now want a different reward, so you should probably do a different action.
@@YannicKilcher possible to explain in more concrete terms? The idea is to sample actions better than randomly, but it seems hand-wavy to say that optimizing a probability distribution given one input will make the output distribution for another input good. Then again, I guess that's exactly what a neural net tries to do.
Thank you for the video!
One thing I don't understand though is why the first paper says that you must use RNNs for non-deterministic environments, yet in the experiments paper they just stack a few frames for the VizDoom example without any RNNs.
Sorry if something is wrong, I'm not a specialist in RL.
It is a kind of dynamic programming: the agent remembers its previous experience (the command) and acts according to the observation and that experience. The experience comes from episodes, positive and negative (they are like feelers). The longer an episode (more steps), the bigger the horizon. So, calculate the mean reward from the episodes and demand a little bit more (one standard deviation more). What does it mean to demand more? As I understood it, keep developing only the successful episodes further and cut the negative ones.
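If I read the exploration strategy in the experiments paper right, the "demand a bit more" step looks roughly like this sketch (my own illustrative Python; the function name next_command and the choice of k are made up, not the authors' code):

```python
import numpy as np

def next_command(episode_returns, episode_lengths, k=10):
    """Illustrative only: pick a desired return and horizon from the k best past episodes."""
    returns = np.asarray(episode_returns, dtype=float)
    lengths = np.asarray(episode_lengths, dtype=float)
    best = np.argsort(returns)[-k:]                # indices of the best episodes so far
    desired_horizon = float(lengths[best].mean())  # ask for a similar episode length
    # "demand a little bit more": one standard deviation above the mean return
    desired_return = float(returns[best].mean() + returns[best].std())
    return desired_return, desired_horizon

# e.g. next_command([10, 12, 8, 15, 20], [100, 110, 90, 120, 130], k=3)
```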
Let's call the agent f, the observations s, the reward r, the demand d, and the actions a. At each step of experience generation, a = f(s, d). Then later, once the reward is known, f is updated such that f(s, r) is pulled towards a.
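Concretely, a minimal sketch of that update could look like this (all the names and sizes here are illustrative assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

n_obs, n_actions = 4, 2
# f maps (observation, command) to action logits; the command here is just
# the desired return d, appended as one extra input. Illustrative sketch only.
f = nn.Sequential(nn.Linear(n_obs + 1, 32), nn.ReLU(), nn.Linear(32, n_actions))
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

def act(s, d):
    """Experience generation: sample a = f(s, d)."""
    logits = f(torch.cat([s, d]))
    return torch.distributions.Categorical(logits=logits).sample()

def update(s, r, a):
    """Once the achieved return r is known, pull f(s, r) towards the action a."""
    logits = f(torch.cat([s, r]))
    loss = nn.functional.cross_entropy(logits.unsqueeze(0), a.unsqueeze(0))
    opt.zero_grad()
    loss.backward()
    opt.step()

# s = torch.randn(n_obs); d = torch.tensor([1.0])
# a = act(s, d); update(s, d, a)  # pretending the achieved return equals d
```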
what a great video, thanks!
Can't you do the same by simply adding some logic to the function where the actions are chosen?
If you have a network that outputs expected values, you can just choose the actions whose expected value matches what you want.
The value function has a hard-coded horizon (until the end of the episode), whereas UDRL can deal with any horizon.
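For what it's worth, a hedged sketch of the value-matching idea from the question (illustrative only, not from either paper):

```python
import numpy as np

def pick_action_by_value(q_values, desired_return):
    """Illustrative: q_values has one predicted return per action; pick the closest match."""
    return int(np.argmin(np.abs(np.asarray(q_values, dtype=float) - desired_return)))

# pick_action_by_value([0.2, 0.9, 0.5], desired_return=1.0)  -> 1
```

The catch, as the reply says, is that those predicted values are tied to one fixed horizon (the rest of the episode), while the UDRL command also specifies the desired horizon itself.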
during the first few minutes I was like "hmm, I don't think that's gonna work" LOL
My cursor, hovering, hovering over the downvote icon - "This guy totally neither read nor understood the paper..." Finally, he says "Just kidding!" and actually reviews the paper.
Gotcha 😉
Pronounced "Lara"?
Negative 5 billion billion trillion is a pretty bad reward.
This is just a generalization of goal-conditioned imitation learning, no?
Or maybe that's just a special case of ⅂ꓤ ;)
Hi, can you do a video on Capsule networks also? Thank you :)
Btw, I love your videos.
he already did it ^^
th-cam.com/video/nXGHJTtFYRU/w-d-xo.html