Stanford CS25: V4 I Demystifying Mixtral of Experts

  • Published May 15, 2024
  • April 25, 2024
    Speaker: Albert Jiang, Mistral AI / University of Cambridge
    Demystifying Mixtral of Experts
    In this talk I will introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combines their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. I will go into the architectural details and analyse the expert routing decisions made by the model. (A minimal code sketch of this top-2 routing is included after this description.)
    About the speaker:
    Albert Jiang is an AI scientist at Mistral AI and a final-year PhD student in the computer science department at the University of Cambridge. He works on language model pretraining and reasoning at Mistral AI, and on language models for mathematics at Cambridge.
    More about the course can be found here: web.stanford.edu/class/cs25/
    View the entire CS25 Transformers United playlist: • Stanford CS25 - Transf...
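The top-2 routing described in the abstract can be summarised in a few lines of code. The sketch below is an illustrative PyTorch reimplementation, not Mistral's released code: the names (SparseMoE, n_experts, ffn_dim, top_k) are assumptions, and plain SiLU MLPs stand in for Mixtral's gated (SwiGLU) expert blocks.

```python
# Illustrative top-2 sparse Mixture-of-Experts feedforward layer.
# Names and structure are assumptions for exposition, not Mistral's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    def __init__(self, hidden_dim: int, ffn_dim: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: one logit per expert for each token.
        self.router = nn.Linear(hidden_dim, n_experts, bias=False)
        # Experts: independent feedforward blocks (simple MLPs here).
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, ffn_dim),
                nn.SiLU(),
                nn.Linear(ffn_dim, hidden_dim),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, hidden_dim); each token is routed independently at each layer.
        logits = self.router(x)                              # (n_tokens, n_experts)
        top_vals, top_idx = logits.topk(self.top_k, dim=-1)  # choose 2 experts per token
        weights = F.softmax(top_vals, dim=-1)                # normalise over the selected logits
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_pos, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_pos.numel() == 0:
                continue  # this expert was not selected by any token
            # Weighted combination of the two selected experts' outputs.
            out[token_pos] += weights[token_pos, slot].unsqueeze(-1) * expert(x[token_pos])
        return out
```

For example, with Mistral-7B-sized dimensions (hidden_dim=4096, ffn_dim=14336) and 8 experts per layer, only two of the eight expert MLPs run for any given token: the full model stores roughly 47B parameters, but each token's forward pass touches only about 13B of them.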

Comments • 4

  • @marknuggets 28 days ago +1

    Cool format, Stanford is quickly becoming my favorite blogger lol

  • @user-uy4rx3hs3x 22 days ago +1

    Where can I get the slides?

  • @acoustic_boii 28 days ago +1

    Dear Stanford Online, I recently completed the product management course from Stanford Online but haven't received the certificate. Please help: how will I get the certificate?

    • @Ethan_here230 27 days ago +1

      Wait, you will get it.
      - Ethan from Stanford