Awesomely close game! So much fun to watch such adrenaline-filled drifts!
Models, datasets, etc: huggingface.co/collections/allenai/tulu-v25-suite-66676520fd578080e126f618
Hey Nathan, your research seems to defend PPO over DPO, but the most recent large models, Llama 3.1 and Nemotron-4, don't make use of PPO. They just use DPO with rejection sampling. In fact, the Llama 3.1 paper chooses DPO only because of ease of compute.
What are your thoughts on this?
Is PPO more relevant for small- to medium-sized LLMs?
Can the scale of large LLMs with DPO (and clever rejection sampling) be enough?
@sumanthbalaji1768 I'll write an update on this soon on www.interconnects.ai/ :)
@natolambert lovely, thanks
THX! :D
"White Rice Research" 🍚🔍👁