Stanford CS25: V1 I Mixture of Experts (MoE) paradigm and the Switch Transformer

  • Published on Nov 10, 2024

Comments • 10

  • @zinyang8213 · 10 months ago · +13

    Good job drinking from your cup without muting yourself.

  • @pesky_mousquito · 6 months ago · +6

    MUTE YOUR MIKE when sipping coffee

  • @gemini_537 · 4 months ago

    Gemini 1.5 Pro: This video is about scaling transformers through sparsity. In the video, Irwan and Barret discuss a new approach called Switch Transformers, a simplified Mixture of Experts variant, along with other improved training and fine-tuning techniques.
    The main points are summarized below:
    * **Motivation for Sparse Transformers**: Large models perform better, but training them is computationally expensive. Sparse transformers address this challenge by applying different weights to different inputs, resulting in less computation needed.
    * **Switch Transformers**: This is a new approach to sparse transformers that replaces some feed-forward layers with a switch layer. The switch layer routes each input token to different experts (sub-networks), and only the output of the most probable (top-1) expert is used; see the routing sketch after this comment.
    * **Training Sparse Transformers**: The authors propose three techniques for improving the training of sparse models:
      * Selective precision: train most of the model in a faster, lower-precision format while keeping the numerically sensitive router computation in full (float32) precision for stability.
      * Initialization and training tricks: these allow the models to be trained more stably, especially as they grow in size.
      * Starting from a known good sparse model (mixture of experts) and slowly expanding to more complex architectures.
    * **Properties of Sparse Transformers**: The authors find that sparse transformers can have similar pre-training perplexity to dense models, but perform better on knowledge-heavy tasks. However, they can underperform on reasoning-heavy metrics.
    * **Fine-tuning Sparse Transformers**: The authors show that sparse models can perform well on downstream tasks when the flops (floating-point operations) and sparsity are scaled appropriately.
    * **Multilingual Training**: Sparse transformers are particularly useful for multilingual training, where experts can potentially specialize across languages.
    * **Distillation**: The authors propose a technique for distilling a sparse model down to a smaller dense model. This can be useful for reducing the number of parameters needed to serve the model.
    Overall, Switch Transformers are a promising approach to scaling transformers through sparsity. They can achieve good performance while reducing computational costs.
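
    The top-1 routing described in the Switch Transformers bullet above can be sketched in a few lines. This is a minimal illustration under assumed names and sizes (SwitchFFN, d_model, n_experts), not the authors' implementation, which additionally uses expert capacity limits and an auxiliary load-balancing loss:

    ```python
    # Minimal sketch of a Switch (top-1 MoE) feed-forward layer in PyTorch.
    # Sizes and names here are illustrative assumptions, not the paper's code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SwitchFFN(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, n_experts=8):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)  # one logit per expert
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):  # x: (tokens, d_model)
            # Compute router probabilities in float32 (the "selective precision" idea).
            probs = F.softmax(self.router(x).float(), dim=-1)
            gate, expert_idx = probs.max(dim=-1)  # top-1 expert per token
            out = torch.zeros_like(x)
            for i, expert in enumerate(self.experts):
                mask = expert_idx == i  # tokens routed to expert i
                if mask.any():
                    # Scale each expert's output by its router probability (gate value).
                    out[mask] = gate[mask, None].to(x.dtype) * expert(x[mask])
            return out

    print(SwitchFFN()(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
    ```

    Only one expert's feed-forward pass runs per token, so the parameter count grows with the number of experts while per-token compute stays roughly constant, which is the scaling argument summarized above.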

  • @dougb70 · 2 years ago · +14

    haha, I love the coffee slurping.

  • @ucalyptus2 · 7 months ago

    Would love it if the slides were available; tried to find them on their website but no luck :(

  • @TheBartBarton · 2 years ago · +3

    Barret Z is all over this topic. Barret, I hope you’re correct; I’m betting on you here.

  • @dougb70 · 2 years ago · +1

    20:10 plasticity

  • @karanbirchahal3268 · 1 year ago

    Wow

  • @SaulMarian-y6h · 2 months ago

    😂😂😂😂
