Aligning LLMs with Direct Preference Optimization

  • Published Feb 7, 2024
  • In this workshop, Lewis Tunstall and Edward Beeching from Hugging Face will discuss a powerful alignment technique called Direct Preference Optimisation (DPO), which was used to train Zephyr (arxiv.org/abs/2310.16944) and is rapidly becoming the de facto method to boost the performance of open chat models.
    By the end of this workshop, attendees will:
    Understand the steps involved in fine-tuning LLMs for chat applications.
    Learn the theory behind Direct Preference Optimisation and how to apply it in practice with the Hugging Face TRL library (a sketch follows this list).
    Know what metrics to consider when evaluating chat models.
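    For reference, the DPO objective from the original paper (arxiv.org/abs/2305.18290) optimises the policy directly on preference pairs, with no separate reward model:

      $\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma \left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]$

    A minimal sketch of how this looks with TRL's DPOTrainer is below. The model, dataset, and hyperparameters are illustrative (they follow the public Zephyr recipe), and exact argument names vary across TRL versions:

      from datasets import load_dataset
      from transformers import AutoModelForCausalLM, AutoTokenizer
      from trl import DPOConfig, DPOTrainer

      # Start from an SFT checkpoint (here, the Zephyr SFT model).
      model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/mistral-7b-sft-beta")
      tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/mistral-7b-sft-beta")

      # Preference pairs with "prompt", "chosen", and "rejected" columns;
      # this is the dataset used to train Zephyr.
      dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

      config = DPOConfig(
          output_dir="zephyr-dpo",
          beta=0.1,  # strength of the implicit KL penalty toward the reference model
          per_device_train_batch_size=2,
          num_train_epochs=1,
      )

      trainer = DPOTrainer(
          model=model,
          ref_model=None,  # with None, TRL keeps a frozen copy of `model` as the reference
          args=config,
          train_dataset=dataset,
          processing_class=tokenizer,  # named `tokenizer` in older TRL releases
      )
      trainer.train()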
    Take a moment to register for our community forum:
    bit.ly/48UIIve
    Take a moment to register for our short courses here:
    bit.ly/420iXHx
    Workshop Notebooks:
    Notebook #1:
    colab.research.google.com/dri...
    Notebook #2:
    colab.research.google.com/dri...
    Slides:
    docs.google.com/presentation/...
    About DeepLearning.AI
    DeepLearning.AI is an education technology company that is empowering the global workforce to build an AI-powered future through world-class education, hands-on training, and a collaborative community. Take your generative AI skills to the next level with short courses that help you learn new skills, tools, and concepts efficiently.
    About Hugging Face
    Hugging Face is an AI company specializing in natural language processing (NLP) and machine learning, and is known for its open-source contributions and collaborative approach to AI research and development. The company is famous for developing the Transformers library, which offers a wide range of pretrained models and tools for a variety of NLP tasks, making it easier for researchers and developers to implement state-of-the-art AI solutions. Hugging Face also fosters a vibrant community for AI enthusiasts and professionals, providing a platform for sharing models, datasets, and research, which significantly contributes to the advancement of AI technology.
    Speakers:
    Lewis Tunstall, Machine Learning Engineer, Hugging Face
    / lewis-tunstall
    Edward Beeching, Research Scientist, Hugging Face
    / ed-beeching-3553b468

Comments • 18

  • @eliporter3980 · 4 months ago +2

    I'm learning a lot from these talks, thank you for having them.

  • @NitinPasumarthy · 4 months ago +3

    The best content I have seen in a while. Enjoyed both the theory and the practical notes from both speakers! Huge thanks to DeepLearning.AI for organizing this event.

  • @PritishYuvraj · 3 months ago +1

    Excellent comparison of PPO and DPO! Kudos.

  • @vijaybhaskar5333 · 4 months ago +3

    Excellent topic. Well explained. One of the best videos on this subject I've seen recently. Continue your good work 😊

  • @katie-48 · 4 months ago +1

    Great presentation, thank you very much!

  • @user-rx5pp3hh1x · 4 months ago +2

    cut to the chase - 3:30
    questions on DPO - 27:37
    practical deep-dive - 30:19
    question - 53:32

  • @jeankunz5986 · 5 months ago +1

    Great presentation. Congratulations.

  • @amortalbeing · 5 months ago +2

    This was amazing, thank you everyone.
    One thing though, if that's possible: it would be greatly appreciated if you could record in 1080p, so the details/text on the slides are visible and easier to consume.
    Thanks a lot again.

    • @MatijaGrcic · 5 months ago +3

      Check out the notebooks and slides in the description.

    • @amortalbeing · 5 months ago

      @MatijaGrcic Thanks a lot, downloaded the slides.

  • @PaulaLeonova · 4 months ago +3

    At 29:40 Lewis mentions an algorithm that requires fewer training samples. What is the name of it? I heard "data", but I don't think that is correct. If anyone knows, would you mind replying?

    • @user-rx5pp3hh1x · 4 months ago

      Possibly this paper: "Rethinking Data Selection for Supervised Fine-Tuning" (arxiv.org/pdf/2402.06094.pdf)

    • @ralphabrooks · 4 months ago

      I am also interested in hearing more about this "data" algorithm. Is there a link to a paper or blog on it?

  • @austinmw89 · 4 months ago

    Curious if you compared SFT on all data vs. training on completions only?
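    (For context: "training on completions only" means masking the prompt tokens out of the SFT loss, so only the assistant's reply contributes gradients. A minimal sketch of one way to do this with TRL's DataCollatorForCompletionOnlyLM; the model, toy dataset, and response template below are illustrative assumptions, and argument names vary across TRL versions:)

      from datasets import Dataset
      from transformers import AutoModelForCausalLM, AutoTokenizer
      from trl import DataCollatorForCompletionOnlyLM, SFTTrainer

      tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/mistral-7b-sft-beta")
      model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/mistral-7b-sft-beta")

      # Toy example; in practice each row is a full prompt + response string.
      dataset = Dataset.from_dict({
          "text": ["### Question: What is DPO?\n### Answer: A preference-tuning method."]
      })

      # Labels before the response template are set to -100, so the loss is
      # computed on the completion tokens only.
      collator = DataCollatorForCompletionOnlyLM(
          response_template="### Answer:", tokenizer=tokenizer
      )

      trainer = SFTTrainer(
          model=model,
          train_dataset=dataset,  # assumes a "text" column, the default field in recent TRL
          data_collator=collator,
      )
      trainer.train()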

  • @TheRilwen · 4 months ago +1

    I'm wondering why simple techniques, such as sample boosting, increasing the errors for highly ranked examples, or an attention layer, wouldn't work in place of RLHF. It seems like a very convoluted and inefficient way of doing a simple thing, which convinces me that I'm missing something :-)

  • @iseminamanim · 5 months ago

    Interested

  • @MacProUser99876 · 4 months ago

    How DPO works under the hood: th-cam.com/video/Ju-pFJNfOfY/w-d-xo.html