Fast Inference of Mixture-of-Experts Language Models with Offloading

  • Published Jul 28, 2024
  • In this video we review a recent important paper titled "Fast Inference of Mixture-of-Experts Language Models with Offloading".
    Mixture of Experts (MoE) is a widely used strategy for improving the efficiency of transformer-based large language models (LLMs).
    However, MoE models usually have a large memory footprint, since the weights of all experts must be loaded. This makes it hard to run MoE models on low-tier GPUs.
    This paper introduces a method to efficiently run transformer-based MoE LLMs in a limited-memory environment using offloading techniques. Specifically, the researchers are able to run Mixtral-8x7B on the free tier of Google Colab.
    In the video, we provide a reminder of how mixture of experts works, and then dive into the offloading method presented in this paper (a rough code sketch of the offloading idea follows the video details below).
    -----------------------------------------------------------------------------------------------
    Paper page - arxiv.org/abs/2312.17238
    Soft MoE - • Soft Mixture of Expert...
    Code - github.com/dvmazur/mixtral-of...
    Post - aipapersacademy.com/moe-offlo...
    -----------------------------------------------------------------------------------------------
    ✉️ Join the newsletter - aipapersacademy.com/newsletter/
    👍 Please like & subscribe if you enjoy this content
    We use VideoScribe to edit our videos - tidd.ly/44TZEiX (affiliate)
    -----------------------------------------------------------------------------------------------
    Chapters:
    0:00 Paper Introduction
    1:34 Mixture of Experts
    3:44 MoE Offloading
    10:29 Mixed MoE Quantization
    11:13 Inference Speed
  • Science & Technology
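
The repository linked above contains the authors' actual implementation. Purely as an illustration of the offloading idea discussed in the video, here is a minimal, hypothetical sketch of an LRU expert cache in PyTorch: expert weights live in CPU RAM and are moved to the GPU only when the router selects them, with the least recently used expert evicted when GPU memory runs out. All names here (LRUExpertCache, moe_layer, capacity, top_k) are assumptions made for the sketch, not the paper's API, and the paper's speculative expert prefetching and mixed quantization are omitted.

```python
# Illustrative sketch only; not the paper's actual code.
from collections import OrderedDict

import torch
import torch.nn as nn


class LRUExpertCache:
    """Keep at most `capacity` experts resident on the GPU; evict the least recently used back to CPU."""

    def __init__(self, experts, capacity, device="cuda"):
        self.experts = experts          # all expert modules, initially on CPU
        self.capacity = capacity        # number of experts that fit in GPU memory
        self.device = device
        self.on_gpu = OrderedDict()     # expert_id -> GPU-resident module, in LRU order

    def get(self, expert_id):
        if expert_id in self.on_gpu:
            self.on_gpu.move_to_end(expert_id)       # cache hit: mark as most recently used
            return self.on_gpu[expert_id]
        if len(self.on_gpu) >= self.capacity:        # no room: offload the LRU expert to CPU RAM
            evicted_id, evicted = self.on_gpu.popitem(last=False)
            self.experts[evicted_id] = evicted.to("cpu")
        expert = self.experts[expert_id].to(self.device)  # move the requested expert's weights to the GPU
        self.on_gpu[expert_id] = expert
        return expert


def moe_layer(x, router, cache, top_k=2):
    """Route each token to its top-k experts, fetching expert weights through the cache.

    Assumes x and the router already live on the GPU; only expert weights are offloaded.
    """
    scores = router(x)                                        # (num_tokens, num_experts)
    weights, ids = torch.topk(scores.softmax(dim=-1), top_k)  # top-k routing weights per token
    out = torch.zeros_like(x)
    for k in range(top_k):
        for eid in ids[:, k].unique().tolist():               # run each selected expert once per batch
            mask = ids[:, k] == eid
            out[mask] += weights[mask, k:k + 1] * cache.get(eid)(x[mask])
    return out


# Illustrative usage: 8 experts, but room for only 3 on the GPU at a time.
dim, num_experts = 1024, 8
experts = [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
           for _ in range(num_experts)]
router = nn.Linear(dim, num_experts).cuda()
cache = LRUExpertCache(experts, capacity=3)
y = moe_layer(torch.randn(16, dim, device="cuda"), router, cache)
```

An LRU policy is a natural fit here because, as discussed in the video, consecutive tokens tend to reuse recently activated experts, so cache hits avoid most CPU-to-GPU weight transfers.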

Comments • 6

  • @winterclimber7520 • 6 months ago • +5

    Very exciting work! The speed the paper reports won't break any land speed records (2-3 tokens per second), but in my experience one of the most productive and practical applications of LLMs is prompting them with multiple-choice questions, which require only a single output token.
    This paper (and the provided code!) bringing GPT-3.5 levels of inference to local consumer hardware is a huge breakthrough, and I'm excited to give it a try!

  • @jacksonmatysik8007 • 6 months ago • +1

    I have been looking for a channel like this for ages, as I hate reading.

  • @fernandos-bs6544 • 4 months ago

    I just found your channel. It is amazing. Congratulations. Your numbers will grow soon, I am sure. Great quality and great content.

  • @PaulSchwarzer-ou9sw • 6 months ago

    Thanks! ❤

  • @ameynaik2743 • 3 months ago

    I believe this is applicable only to a single request? With multiple requests, the choice of experts changes, so you will most likely have many experts active across the various requests. Is my understanding correct? Thank you.