Entry Point AI
RLHF & DPO Explained (In Simple Terms!)
Learn how Reinforcement Learning from Human Feedback (RLHF) actually works and why Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO) are changing the game.
This video doesn't go deep on math. Instead, I provide a high-level overview of each technique to help you make practical decisions about where to focus your time and energy.
0:52 The Idea of Reinforcement Learning
1:55 Reinforcement Learning from Human Feedback (RLHF)
4:21 RLHF in a Nutshell
5:06 RLHF Variations
6:11 Challenges with RLHF
7:02 Direct Preference Optimization (DPO)
7:47 Preferences Dataset Example
8:29 DPO in a Nutshell
9:25 DPO Advantages over RLHF
10:32 Challenges with DPO
10:50 Kahneman-Tversky Optimization (KTO)
11:39 Prospect Theory
13:35 Sigmoid vs Value Function
13:49 KTO Dataset
15:28 KTO in a Nutshell
15:54 Advantages of KTO
18:03 KTO Hyperparameters
These are the three papers referenced in the video:
1. Deep reinforcement learning from human preferences (arxiv.org/abs/1706.03741)
2. Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arxiv.org/abs/2305.18290)
3. KTO: Model Alignment as Prospect Theoretic Optimization (arxiv.org/abs/2402.01306)
The Huggingface TRL library offers implementations for PPO, DPO, and KTO:
huggingface.co/docs/trl/main/en/kto_trainer
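As a rough illustration (not code from the video), here is a minimal sketch of running KTO with TRL's KTOTrainer. The dataset name and base model are placeholders, and keyword names can differ between TRL versions.
```python
# Minimal sketch (not from the video) of fine-tuning with TRL's KTOTrainer.
# Assumptions: the dataset name and base model are placeholders, and exact
# keyword names (e.g. processing_class vs. tokenizer) vary by TRL version.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_name = "gpt2"  # placeholder base model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# KTO uses unpaired feedback: each row has a prompt, a completion, and a
# boolean label marking that completion as desirable or undesirable.
train_dataset = load_dataset("my-org/my-kto-feedback", split="train")  # hypothetical dataset

training_args = KTOConfig(
    output_dir="kto-model",
    beta=0.1,                # how far the policy may drift from the reference model
    desirable_weight=1.0,    # weight on desirable (thumbs-up) examples
    undesirable_weight=1.0,  # weight on undesirable (thumbs-down) examples
)

trainer = KTOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer` in older TRL releases
)
trainer.train()
```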
Want to prototype with prompts and supervised fine-tuning? Try Entry Point AI:
www.entrypointai.com/
How about connecting? I'm on LinkedIn:
www.linkedin.com/in/markhennings/
Views: 2,847

Videos

Ask GPT-4 in Google Sheets
Views 1.4K · 6 months ago
Learn an amazing trick to use OpenAI models directly inside Google Sheets. Watch as I transform whole columns of data using an AI prompt in seconds! After this video, you'll be able to do awesome things like write creative copy, standardize your data, or extract specific details from unstructured text. Topics covered: 0:27 Demo of LLM calls directly in Google Sheets 5:33 The custom function in ...
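The video builds this as a Google Sheets custom function; purely as a hedged, analogous sketch of the kind of per-cell LLM call such a function wraps, here is a Python version using the official openai client. The model name, instruction, and sample data are placeholders, not taken from the video.
```python
# Rough illustration (not the video's Apps Script code): the kind of per-cell
# LLM call that a "transform this column with a prompt" function wraps.
# Assumes the official `openai` Python client; model and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def transform(value: str, instruction: str) -> str:
    """Apply a natural-language instruction to a single cell value."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": value},
        ],
    )
    return response.choices[0].message.content

# Example: standardize a column of messy company names.
column = ["acme inc.", "ACME, Incorporated", "Acme Corp"]
cleaned = [transform(v, "Standardize this company name.") for v in column]
print(cleaned)
```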
Fine-tuning Datasets with Synthetic Inputs
Views 3.8K · 8 months ago
👉 Start building your dataset at www.entrypointai.com There are virtually unlimited ways to fine-tune LLMs to improve performance at specific tasks... but where do you get the data from? In this video, I demonstrate one way that you can fine-tune without much data to start with - and use what little data you have to reverse-engineer the inputs required! I show step-by-step how to take a small s...
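As a hedged sketch of the general idea described above (not the video's exact workflow), one way to "reverse-engineer" inputs is to ask a model to invent a plausible input for each existing output, producing synthetic prompt/completion pairs. The model name, prompt wording, and sample data below are illustrative only.
```python
# Hedged sketch of generating synthetic inputs for a handful of known outputs,
# so each pair can become one fine-tuning example. Names and prompt wording
# are illustrative, not taken from the video.
from openai import OpenAI

client = OpenAI()

existing_outputs = [
    "Thanks for reaching out! Your refund has been processed.",
    "Your order shipped today and should arrive within 3-5 business days.",
]

def synthesize_input(output_text: str) -> str:
    """Ask the model to invent a customer message that the given reply answers."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": "Write the customer email that the following support reply is responding to."},
            {"role": "user", "content": output_text},
        ],
    )
    return response.choices[0].message.content

# Each pair can then be exported as one prompt/completion training example.
dataset = [{"prompt": synthesize_input(o), "completion": o} for o in existing_outputs]
print(dataset)
```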
Large Language Models (LLMs) Explained
Views 2K · 10 months ago
In this video, I explain how language models generate text, why most of the process is actually deterministic (not random), and how you can shape the probability when selecting a next token from LLMs using parameters like temperature and top p. I cover temperature in-depth and demonstrate with a spreadsheet how different values change the probabilities. Topics: 00:10 Tokens & Why They Matter 03...
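The spreadsheet demo in the video shows how temperature reshapes next-token probabilities; a small Python sketch of the same effect is below (the logits and token names are made up for illustration).
```python
# Small sketch of the effect described above: temperature rescales the logits
# before softmax, sharpening or flattening the next-token distribution.
# The logits and token names are made up for illustration.
import numpy as np

tokens = ["cat", "dog", "car", "tree"]
logits = np.array([4.0, 3.0, 1.0, 0.5])

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, dict(zip(tokens, probs.round(3))))
# Low temperature concentrates probability on the top token;
# high temperature spreads it more evenly across candidates.
```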
LoRA & QLoRA Fine-tuning Explained In-Depth
Views 50K · 11 months ago
👉 Start fine-tuning at www.entrypointai.com In this video, I dive into how LoRA works vs full-parameter fine-tuning, explain why QLoRA is a step up, and provide an in-depth look at the LoRA-specific hyperparameters: Rank, Alpha, and Dropout. 0:26 - Why We Need Parameter-efficient Fine-tuning 1:32 - Full-parameter Fine-tuning 2:19 - LoRA Explanation 6:29 - What should Rank be? 8:04 - QLoRA and R...
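For reference, here is a minimal sketch (not from the video) of setting the three LoRA hyperparameters discussed — rank, alpha, and dropout — using the Hugging Face peft library, which is one common implementation. The base model and target_modules are placeholders.
```python
# Minimal sketch (not from the video) of applying LoRA with Hugging Face `peft`,
# showing the hyperparameters discussed: rank, alpha, and dropout.
# The base model and target_modules are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update (alpha / r)
    lora_dropout=0.05,          # dropout on the LoRA activations
    target_modules=["c_attn"],  # which layers get adapters (model-specific)
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```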
Prompt Engineering, RAG, and Fine-tuning: Benefits and When to Use
Views 102K · 1 year ago
👉 Start fine-tuning at www.entrypointai.com Explore the difference between Prompt Engineering, Retrieval-augmented Generation (RAG), and Fine-tuning in this detailed overview. 01:14 Prompt Engineering RAG 02:50 How Retrieval Augmented Generation Works - Step-by-step 06:23 What is fine-tuning? 08:25 Fine-tuning misconceptions debunked 09:53 Fine-tuning strategies 13:25 Putting it all together 13...
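As a toy sketch of the retrieve-then-generate loop the video walks through (not the video's code): embed the documents, pick the one most similar to the question, and stuff it into the prompt. The documents, embedding model, and chat model below are placeholders.
```python
# Toy sketch of retrieval-augmented generation: embed documents, retrieve the
# most similar one by cosine similarity, and include it in the prompt.
# Model names and documents are placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()

documents = [
    "Entry Point AI supports fine-tuning models across multiple LLM providers.",
    "Retrieval-augmented generation injects relevant documents into the prompt.",
]

def embed(texts: list[str]) -> np.ndarray:
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in result.data])

doc_vectors = embed(documents)

def answer(question: str) -> str:
    q_vec = embed([question])[0]
    # Cosine similarity to pick the most relevant document.
    sims = doc_vectors @ q_vec / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec))
    context = documents[int(np.argmax(sims))]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": f"Answer using this context: {context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer("What does retrieval-augmented generation do?"))
```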
Fine-tuning 101 | Prompt Engineering Conference
Views 6K · 1 year ago
👉 Start fine-tuning at www.entrypointai.com Intro to fine-tuning LLMs (large language models) from the Prompt Engineering Conference (2023) Presented by Mark Hennings, founder of Entry Point AI. 00:13 - Part 1: Background Info -How a foundation model is born -Instruct tuning and safety tuning -Unpredictability of raw LLM behavior -Showing LLMs how to apply knowledge -Characteristics of fine-tun...
"I just fine-tuned GPT-3.5 Turbo…" - Here's how
Views 1.1K · 1 year ago
🎁 Join our Skool community: www.skool.com/entry-point-ai In this video, I'm diving into the power and potential of the newly released fine-tuning option for GPT-3.5 Turbo. After fine-tuning some of my models, the improvement in quality is remarkable. Join me as I: - Demonstrate the model I fine-tuned: Watch as the AI suggests additional items for an e-commerce shopping cart and the rationale...
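For context, here is a hedged sketch of the underlying API steps for a GPT-3.5 Turbo fine-tune (the video may run this through Entry Point AI rather than the raw API). The JSONL file name is a placeholder; each line holds one {"messages": [...]} chat example.
```python
# Hedged sketch of the underlying OpenAI API steps for fine-tuning GPT-3.5 Turbo.
# The training file name is hypothetical.
from openai import OpenAI

client = OpenAI()

# 1. Upload the training data.
training_file = client.files.create(
    file=open("shopping_cart_examples.jsonl", "rb"),  # hypothetical file
    purpose="fine-tune",
)

# 2. Start the fine-tuning job on the base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)

# 3. Check status; the fine-tuned model id appears once the job succeeds.
print(client.fine_tuning.jobs.retrieve(job.id).status)
```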
No-code AI fine-tuning (with Entry Point!)
Views 1.3K · 1 year ago
👉 Sign up for Entry Point here: www.entrypointai.com Entry Point is a platform for no-code AI fine-tuning, with support for Large Language Models (LLMs) from multiple platforms: OpenAI, AI21, and more. In this video I'll demonstrate the core fine-tuning principles while creating an "eCommerce product recommendation" engine in three steps: 1. First I write ~20 examples by hand 2. Then I expand t...
28 April Update: Playground and Synthesis
Views 82 · 1 year ago
How to Fine-tune GPT-3 in less than 3 minutes.
Views 4.4K · 1 year ago
🎁 Join our Skool community: www.skool.com/entry-point-ai Learn how to fine-tune GPT-3 (and other) AI models without writing a single line of Python code. In this video I'll show you how to create your own custom AI models out of GPT-3 for specialized use cases that can work better than ChatGPT. For demonstration, I'll be working on a classifier AI model for categorizing keywords from Google Ads...
Can GPT-4 Actually Lead a D&D Campaign? 🤯
Views 386 · 1 year ago
If you want to create / fine-tune your own AI models, check out www.entrypointai.com/
Entry Point Demo 1.0 - Keyword Classifier (AI Model)
Views 341 · 1 year ago
Watch over my shoulder as I fine-tune a model for classifying keywords from Google Ads.