Entry Point AI
United States
Joined Mar 22, 2023
The modern platform for fine-tuning large language models.
RLHF & DPO Explained (In Simple Terms!)
Learn how Reinforcement Learning from Human Feedback (RLHF) actually works and why Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO) are changing the game.
This video doesn't go deep on math. Instead, I provide a high-level overview of each technique to help you make practical decisions about where to focus your time and energy.
0:52 The Idea of Reinforcement Learning
1:55 Reinforcement Learning from Human Feedback (RLHF)
4:21 RLHF in a Nutshell
5:06 RLHF Variations
6:11 Challenges with RLHF
7:02 Direct Preference Optimization (DPO)
7:47 Preferences Dataset Example
8:29 DPO in a Nutshell
9:25 DPO Advantages over RLHF
10:32 Challenges with DPO
10:50 Kahneman-Tversky Optimization (KTO)
11:39 Prospect Theory
13:35 Sigmoid vs Value Function
13:49 KTO Dataset
15:28 KTO in a Nutshell
15:54 Advantages of KTO
18:03 KTO Hyperparameters
These are the three papers referenced in the video:
1. Deep reinforcement learning from human preferences (arxiv.org/abs/1706.03741)
2. Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arxiv.org/abs/2305.18290)
3. KTO: Model Alignment as Prospect Theoretic Optimization (arxiv.org/abs/2402.01306)
The Hugging Face TRL library offers implementations of PPO, DPO, and KTO:
huggingface.co/docs/trl/main/en/kto_trainer
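The video stays high-level on purpose, but for anyone curious what DPO boils down to in code, here is a minimal PyTorch sketch of the loss from paper 2 above (variable names are mine, not from the video; each argument is assumed to be a tensor of log-probabilities summed over a completion's tokens):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss (Rafailov et al., 2023). Each argument holds per-example
    log-probabilities of the chosen/rejected completion under the trainable
    policy or the frozen reference model."""
    # Implicit reward: how much more the policy favors a completion
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the reward margin: push chosen above rejected,
    # with no separately trained reward model and no RL loop.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy call with made-up log-probs for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.0]), torch.tensor([-14.0, -8.5]),
                torch.tensor([-12.5, -9.2]), torch.tensor([-13.0, -9.0]))
```

In TRL, the DPOTrainer and KTOTrainer wrap objectives like this behind a standard Trainer-style API.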
Want to prototype with prompts and supervised fine-tuning? Try Entry Point AI:
www.entrypointai.com/
How about connecting? I'm on LinkedIn:
www.linkedin.com/in/markhennings/
Views: 5,494
Videos
Ask GPT-4 in Google Sheets
1.8K views · 9 months ago
Learn an amazing trick to use OpenAI models directly inside Google Sheets. Watch as I transform whole columns of data using an AI prompt in seconds! After this video, you'll be able to do awesome things like write creative copy, standardize your data, or extract specific details from unstructured text. Topics covered: 0:27 Demo of LLM calls directly in Google Sheets 5:33 The custom function in ...
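The video builds this as a Google Apps Script custom function; purely as an illustration of the same pattern outside Sheets, here is a hypothetical Python sketch that applies one prompt to every value in a column (transform_column is my own helper and the model name is a placeholder, neither is from the video):

```python
from openai import OpenAI  # assumes the official openai Python package (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def transform_column(values, instruction, model="gpt-4o-mini"):
    """Apply one natural-language instruction to every value in a column."""
    results = []
    for value in values:
        response = client.chat.completions.create(
            model=model,  # placeholder model name
            messages=[
                {"role": "system", "content": instruction},
                {"role": "user", "content": str(value)},
            ],
        )
        results.append(response.choices[0].message.content)
    return results

# Example: standardize messy company names from a sheet column.
print(transform_column(["ACME corp.", "acme, inc."], "Normalize this company name."))
```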
Fine-tuning Datasets with Synthetic Inputs
4.2K views · 10 months ago
👉 Start building your dataset at www.entrypointai.com There are virtually unlimited ways to fine-tune LLMs to improve performance at specific tasks... but where do you get the data from? In this video, I demonstrate one way that you can fine-tune without much data to start with - and use what little data you have to reverse-engineer the inputs required! I show step-by-step how to take a small s...
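As a rough sketch of the idea (helper names and prompt wording are mine, not from the video): keep the real outputs you already have, ask a model to write a plausible input for each, and pair them up as training examples.

```python
from openai import OpenAI  # assumes the official openai Python package (v1+)

client = OpenAI()

REVERSE_PROMPT = ("You will be shown the OUTPUT of a task. "
                  "Write a short INPUT that could plausibly have produced it.")

def synthesize_input(output_text, model="gpt-4o-mini"):  # placeholder model name
    """Reverse-engineer a plausible input for a known-good output."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": REVERSE_PROMPT},
                  {"role": "user", "content": output_text}],
    )
    return response.choices[0].message.content

# Pair each real output with its synthetic input to form training examples.
outputs = ["An existing high-quality completion you want more like."]
dataset = [{"input": synthesize_input(o), "output": o} for o in outputs]
```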
How Large Language Models (LLMs) Actually Work
2.4K views · 1 year ago
In this video, I explain how language models generate text, why most of the process is actually deterministic (not random), and how you can shape the probabilities when selecting the next token from LLMs using parameters like temperature and top-p. I cover temperature in depth and demonstrate with a spreadsheet how different values change the probabilities. Topics: 00:10 Tokens & Why They Matter 03...
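To make the mechanics concrete, here is a small self-contained Python sketch (mine, not from the video's spreadsheet) showing how temperature rescales a next-token distribution and top-p truncates it before sampling:

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_p=0.9, seed=None):
    """Pick a next-token id from raw logits using temperature and top-p.
    temperature must be > 0; lower sharpens the distribution, higher flattens it."""
    rng = np.random.default_rng(seed)
    # Temperature: rescale logits before the softmax.
    z = logits / temperature
    z -= z.max()                          # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalize and sample.
    order = np.argsort(probs)[::-1]
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    keep = order[:cutoff]
    return rng.choice(keep, p=probs[keep] / probs[keep].sum())

print(sample_next_token(np.array([2.0, 1.0, 0.5, -1.0]), seed=0))
```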
LoRA & QLoRA Fine-tuning Explained In-Depth
60K views · 1 year ago
👉 Start fine-tuning at www.entrypointai.com In this video, I dive into how LoRA works vs full-parameter fine-tuning, explain why QLoRA is a step up, and provide an in-depth look at the LoRA-specific hyperparameters: Rank, Alpha, and Dropout. 0:26 - Why We Need Parameter-efficient Fine-tuning 1:32 - Full-parameter Fine-tuning 2:19 - LoRA Explanation 6:29 - What should Rank be? 8:04 - QLoRA and R...
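For readers who want the gist in code, here is a minimal PyTorch sketch of the LoRA idea: freeze the base weights and train a low-rank update scaled by alpha/rank. Class and parameter names are illustrative, not from the video.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal sketch of a LoRA adapter around a frozen nn.Linear."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16,
                 dropout: float = 0.05):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # full weights stay frozen
        # Trainable low-rank factors: the effective update is
        # W + (alpha / rank) * B @ A, with far fewer parameters than W.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        return self.base(x) + (self.drop(x) @ self.A.T @ self.B.T) * self.scale

# Wrap one projection of a model; only A and B receive gradients.
layer = LoRALinear(nn.Linear(768, 768), rank=8, alpha=16)
out = layer(torch.randn(2, 768))
```

Zero-initializing B means the adapter starts as an exact no-op, so training begins from the base model's behavior.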
Prompt Engineering, RAG, and Fine-tuning: Benefits and When to Use
114K views · 1 year ago
Explore the difference between Prompt Engineering, Retrieval-augmented Generation (RAG), and Fine-tuning in this detailed overview. 01:14 Prompt Engineering 02:50 How Retrieval Augmented Generation Works - Step-by-step 06:23 What is fine-tuning? 08:25 Fine-tuning misconceptions debunked 09:53 Fine-tuning strategies 13:25 Putting it all together 13:44 Final review and comparison of technique...
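To make the retrieval step concrete, here is a toy, self-contained Python sketch of the RAG flow: embed the document chunks and the query, rank by similarity, and stuff the top matches into the prompt. The embed function is a deliberately crude stand-in for a real embedding model.

```python
import numpy as np

def embed(text, dim=256):
    """Toy stand-in for a real embedding model: hashed bag-of-words."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query, documents, k=2):
    """Rank document chunks by cosine similarity to the query embedding."""
    q = embed(query)
    scores = [float(embed(d) @ q) for d in documents]
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query, documents):
    """Stuff the top-k retrieved chunks into the prompt as context."""
    context = "\n\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"

docs = ["RAG retrieves relevant documents at query time.",
        "Fine-tuning changes a model's weights with examples."]
print(build_prompt("What does RAG do?", docs))
```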
Fine-tuning 101 | Prompt Engineering Conference
7K views · 1 year ago
👉 Start fine-tuning at www.entrypointai.com Intro to fine-tuning LLMs (large language models) from the Prompt Engineering Conference (2023). Presented by Mark Hennings, founder of Entry Point AI. 00:13 - Part 1: Background Info -How a foundation model is born -Instruct tuning and safety tuning -Unpredictability of raw LLM behavior -Showing LLMs how to apply knowledge -Characteristics of fine-tun...
"I just fine-tuned GPT-3.5 Turbo…" - Here's how
1.2K views · 1 year ago
🎁 Join our Skool community: www.skool.com/entry-point-ai In this video, I'm diving into the power and potential of the newly released fine-tuning option for GPT-3.5 Turbo. After fine-tuning some of my models, the enhancement in quality is undeniably remarkable. Join me as I: - Demonstrate the model I fine-tuned: Watch as the AI suggests additional items for an e-commerce shopping cart and the rationale...
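For reference, OpenAI's chat fine-tuning format is JSONL with one messages array per training example; here is a hypothetical record in the spirit of the shopping-cart model, with invented contents:

```python
import json

# One hypothetical training example in OpenAI's chat fine-tuning format
# (one JSON object per line of a .jsonl file). Contents are invented.
example = {
    "messages": [
        {"role": "system",
         "content": "Suggest one add-on item for the shopping cart and explain why."},
        {"role": "user", "content": "Cart: tent, sleeping bag"},
        {"role": "assistant",
         "content": "Camping lantern - it pairs naturally with overnight gear."},
    ]
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```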
No-code AI fine-tuning (with Entry Point!)
1.4K views · 1 year ago
👉 Sign up for Entry Point here: www.entrypointai.com Entry Point is a platform for no-code AI fine-tuning, with support for Large Language Models (LLMs) from multiple platforms: OpenAI, AI21, and more. In this video I'll demonstrate the core fine-tuning principles while creating an "eCommerce product recommendation" engine in three steps: 1. First I write ~20 examples by hand 2. Then I expand t...
28 April Update: Playground and Synthesis
84 views · 1 year ago
How to Fine-tune GPT-3 in less than 3 minutes.
4.4K views · 1 year ago
🎁 Join our Skool community: www.skool.com/entry-point-ai Learn how to fine-tune GPT-3 (and other) AI models without writing a single line of Python code. In this video I'll show you how to create your own custom AI models out of GPT-3 for specialized use cases that can work better than ChatGPT. For demonstration, I'll be working on a classifier AI model for categorizing keywords from Google Ads...
Can GPT-4 Actually Lead a D&D Campaign? 🤯
395 views · 1 year ago
If you want to create / fine-tune your own AI models, check out www.entrypointai.com/
Entry Point Demo 1.0 - Keyword Classifier (AI Model)
365 views · 1 year ago
Watch over my shoulder as I fine-tune a model for classifying keywords from Google Ads.
What a brilliant channel! thank you thank you! looking forward to all your videos!
Really good explanation! Thank you<3
Thanks. I found this very informative. I studied Prospect Theory and behavioral decision theory in grad school and so found it fascinating how K&T's work has been adapted to LLMs.
Super useful. Thanks.
Thx a lot! Great explanations!
8:06 QLoRA
Great explanations! I’d like to see how this looks in a PyTorch training loop too
good
As a beginner in AI, this is very useful to me! Thank you!
just perfect explanation!
Thank you
Great video
Very useful, thank you!
Such an incredibly clear presentation! Thank you
Excellent explanation, thanks a lot!
The fact that you gave concrete examples really helped me get through this! Thank you for the great video
Great insights on AI techniques! With Web3NS, you can fine-tune AI agents on YourName.Web3, combining RAG and prompts to create smarter, decentralized responses tailored to your needs. 🚀
Thanks for this clear explanation about the topic! Your way of relating back to research papers is very interesting and helpful!
I think there is a mistake in your explanation. It uses higher precision while fine-tuning, but it does **not** recover "back to normal". It stays in the quantized data format.
Nice
Such a simple way to explain. Thank you.
can you share the presentation document
Loved the explanation. Despite knowing these terms well, I was curious to see how it would be explained and I am glad that I watched this video
Incredibly high quality content, thank you.
Thanks a ton!
hello can you help me
bradley cooper in limitless tf
This is the best detailed video and nicest explanation on YouTube right now. I do think your channel will grow because you are doing an EXCELLENT job. Thank you man.
This is amazing! How do you deal with request limits? I'm set to 3RPM for all models, so I'm unable to drag and drop this into a sheet as it will only do 3 at a time.
Good question. I don't have a good answer right now except to find yourself a higher limit :) 3rpm is really really low and you should be able to find even a free service or wrapper that offers more. Check out Groq or Together.ai
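Until a higher limit comes through, throttling with exponential backoff at least keeps a batch job from dying outright; a generic Python sketch (call_model stands in for whatever client call you're making):

```python
import random
import time

def with_backoff(call_model, max_retries=5):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call_model()  # any zero-argument callable that hits the API
        except Exception:        # in practice, catch your client's rate-limit error
            time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ... plus jitter
    raise RuntimeError("still rate-limited after retries")
```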
damn bro thanks, I'm new to fine-tuning, needed to migrate my rag model to the ragtag model, and your video was very clear and helped me a lot 😊
Sick, glad to hear it!
so good!!
What's the name of the paper you referenced in the video?
Here's LoRA: arxiv.org/abs/2106.09685 and QLoRA: arxiv.org/abs/2305.14314
It may be a stupid question, but is it possible to download the fine-tuned model at Entry Point? Thank you. Great video! I came trying to see if it is possible to download it, but I understood more about the website, and it's great.
It depends on what service you use to fine-tune; proprietary ones like OpenAI don't let you download it. Others, like Replicate, do.
Perfect differentiations. Thank you
blud why your eyes like that
Best video I've seen on the topic. Thank you!
Very useful! Marvelous clear explanation with the right amount of detail about a subject that’s worth understanding
I've been binge watching LLM-related videos and most have been regurgitation of docs or (probably) GPT/AI generated fluff pieces. This video clearly explained several concepts that I was trying to wrap my head around. Great job!
Similar comments. He actually explained it properly.
Amazing for struggling students. Love from Korea😂
Thank you!
very well explained, thanks :)
Awesome. Thanks
Thanks, an amazing explanation! +1 sub.
"Great video with an amazing explanation! Thank you for sharing."
Thanks. Is it possible to do web scraping from a given URL in a column?
This is a task better suited for another tool. I'd check out apify.com/
QLoRA lets me train on a 4070 Ti with only 12 GB VRAM, though I can't go over a 7B model.
This saved me. Thank you. Keep doing this :)
Great explanation, best that I've seen