This is the best video on big language model training! Covered all details, pitfalls & tweaks. Thanks for sharing!
The best practice advice for training LLMs from scratch. Thanks for sharing!
I was not expecting this level of excruciating pain and suffering. What a nice window into the reality of engineering solutions when we don't understand what we really ought to do yet
this is incredible, it's the wild west out there
makes me feel better about my home brew models :)
appreciate the openness
You did it! Amazing work guys!
Thanks for sharing. Really enjoy watching the whole video.
Why would transformers also have the problem of gradient explosion? I thought for a model of, say, 24 layers, the multiplicative effect is limited. So does the gradient explosion come from one particular neuron getting a surprisingly huge gradient?
I think it is partially to do with earlier layers receiving larger gradient updates, and this ends up causing problems for training stability; the NormFormer paper goes into this.
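For anyone who wants to poke at this, here is a minimal PyTorch sketch (illustrative only, not the OPT/metaseq training code): log per-layer gradient norms after backward and before global clipping, so you can see whether a blow-up is concentrated in the earlier layers.

```python
import torch
import torch.nn as nn

# Toy 24-layer transformer as a stand-in for a real LM.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=24,
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def layer_grad_norms(model):
    """L2 gradient norm per named parameter (after backward, before clipping)."""
    return {
        name: p.grad.detach().norm(2).item()
        for name, p in model.named_parameters()
        if p.grad is not None
    }

def training_step(x):
    optimizer.zero_grad()
    out = model(x)
    loss = out.pow(2).mean()            # placeholder loss, just for the sketch
    loss.backward()
    norms = layer_grad_norms(model)     # inspect which layers have the largest gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item(), norms

loss, norms = training_step(torch.randn(2, 16, 512))
```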
The changes in hyperparams seem random. Isn't it possible to diagnose the issue and change the architecture itself? Or to tackle the issue more systematically/less randomly? This is not a criticism -- it is hard, but I was curious.
Thanks for sharing! This is a great exercise. Was Ray used in OPT-175B training like it was for ChatGPT? It would be good to take advantage of the flexible scheduling, scalability, and reliability provided by Ray.
Could you share the slides of this talk?
Can anyone shed more light on activation norm? Susan said it is the last layer's activation value for the softmax.
From the metaseq repo, it is the last decoder layer's output. Between the last decoder layer and the final softmax, there is a linear h --> V (hidden-to-vocabulary) projection.
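To make that concrete, a hedged toy sketch (module and variable names here are made up, not metaseq's actual classes): the "activation norm" is the norm of the last decoder layer's output, captured with a forward hook before the hidden-size --> vocab-size projection.

```python
import torch
import torch.nn as nn

class TinyDecoderLM(nn.Module):
    def __init__(self, d_model=512, vocab_size=50272, num_layers=4):
        super().__init__()
        # Self-attention-only layers as a stand-in for real decoder blocks.
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
             for _ in range(num_layers)]
        )
        self.lm_head = nn.Linear(d_model, vocab_size)  # the h --> V projection

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return self.lm_head(x)          # logits fed into the final softmax

model = TinyDecoderLM()
stats = {}

def record_activation_norm(module, inputs, output):
    # Output of the *last* decoder layer, i.e. the pre-projection activations.
    stats["activation_norm"] = output.detach().norm(2).item()

model.layers[-1].register_forward_hook(record_activation_norm)
logits = model(torch.randn(2, 16, 512))
print(stats["activation_norm"])
```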
Very high quality.
Is it true OPT-175B doesn't display emergence? Only closed models do?
The same team just released LLaMA, which certainly does.
The stack is a cluster-fuck, pun intended.
That is open source
I don't know why people want to work on this stuff. Very alchemical.
what is ppl?
perplexity
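A tiny sketch for reference (dummy tensors, nothing from the actual evaluation code): perplexity is just the exponential of the mean per-token cross-entropy loss.

```python
import math
import torch
import torch.nn.functional as F

vocab_size = 50272
logits = torch.randn(2, 16, vocab_size)          # (batch, seq, vocab), random stand-in
targets = torch.randint(0, vocab_size, (2, 16))  # random target token ids

loss = F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))
print(f"cross-entropy {loss.item():.3f} -> perplexity {math.exp(loss.item()):.1f}")
```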
More data, bfloat16, secrets. Why the latter?