I just binged this playlist at 1 am. Absolutely worth it. You deserve more views.
agreed
PLEASE COME BACK!! You are an amazing teacher!
Please come back, your videos are great!
All of your videos are amazing, please upload more
Welcome back!
Hope to see more of these videos.
Joel, excellent explanation and talk! Thank you!
Amazing content! Please keep them coming!
Helped me a lot, can't wait to see more
Super helpful - thank you for this series!
🎯 Key Takeaways for quick navigation:
00:00 🤖 Reinforcement learning improves large language models like ChatGPT.
00:25 🃏 Large language models face issues like bias, errors, and quality.
01:11 📊 Training data quality impacts results; removing bad jokes might help.
01:55 🧩 Training on both good and bad jokes improves language models.
02:38 🔄 Language models can be treated as policies, so reinforcement learning can refine them with policy gradients (see the policy-gradient sketch below).
03:08 🎯 Acquiring training data is a key challenge for Reinforcement Learning from Human Feedback (RLHF).
03:35 🤔 RLHF intuition: the language model may already know the boundary between good and bad jokes.
04:18 🏆 A reward network is trained to predict human ratings of the model's output.
04:47 🔄 The reward network is a modified language model that predicts ratings.
05:14 📝 Approach: Humans write text, train reward network, refine model with RL.
05:57 ⚖️ Systems convert pairwise comparisons into ratings for reward-network training (see the reward-model sketch below).
06:11 😄 RLHF successfully improves language models, including humor.
Made with HARPA AI
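Not from the video, just a minimal PyTorch sketch of the comparisons-to-training-signal step (04:18 / 05:57): pairwise human preferences train a scalar reward head with a Bradley-Terry-style loss. Everything here (RewardHead, hidden_dim, the random tensors standing in for language-model hidden states) is an illustrative assumption, not the presenter's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardHead(nn.Module):
    """Scalar scoring head stacked on a language model's pooled hidden state."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, hidden_dim) -> (batch,) scalar rewards
        return self.score(hidden).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # "A was rated better than B" becomes: maximize sigmoid(r_A - r_B)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: random features stand in for the hidden states of two candidate responses
head = RewardHead(hidden_dim=16)
h_chosen, h_rejected = torch.randn(4, 16), torch.randn(4, 16)
loss = preference_loss(head(h_chosen), head(h_rejected))
loss.backward()  # gradients push the head toward scoring chosen > rejected
```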
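And a second hedged sketch for the "language model as a policy" idea (02:38): one REINFORCE-style policy-gradient update that raises the log-probability of a sampled output in proportion to its reward. The tiny linear "policy" and the hard-coded reward are stand-ins I chose for the language model and the reward network's score, not the video's implementation.

```python
import torch
import torch.nn as nn

vocab_size, hidden_dim = 100, 16
policy = nn.Linear(hidden_dim, vocab_size)        # stand-in for an LM's output head
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

state = torch.randn(1, hidden_dim)                # stand-in for an encoded prompt
dist = torch.distributions.Categorical(logits=policy(state))
token = dist.sample()                             # the "action": a sampled token

reward = 1.0                                      # e.g. the reward network's score for the output
loss = -(dist.log_prob(token) * reward).mean()    # REINFORCE: raise log-prob of rewarded outputs
optimizer.zero_grad()
loss.backward()
optimizer.step()
```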
ok everything makes sense now, thx
Good teaching.
You are the Best
How long does it take to train a reward network? And how reliable would it be?
Great content!!
Who is this guy? He made all the complexity so simple with his words. Anyone know this gentleman's name?
come back :(