GPT-Fast - blazingly fast inference with PyTorch (w/ Horace He)

  • Published 20 Oct 2024

Comments • 20

  • @TheAIEpiphany
    @TheAIEpiphany  7 months ago +1

    Horace He joined us to walk us through what one can do with native PyTorch when it comes to accelerating inference! Also, if you need some GPUs, check out Hyperstack: console.hyperstack.cloud/?Influencers&Aleksa+Gordi%C4%87 who are sponsoring this video! :)

  • @xl0xl0xl0
    @xl0xl0xl0 7 months ago +4

    Wow, this presentation was excellent. Straight to the point. No over-complicating, no over-simplifying, no trying to sound smart by obscuring simple things. Thank you Horace!

  • @orrimoch5226
    @orrimoch5226 7 months ago +1

    Wow! It was very educational and practical!
    I liked the graphics in the presentation!
    Great job by both of you!
    Thanks!

  • @Cropinky
    @Cropinky 5 months ago +1

    I love this guy so much, it's unreal

  • @kaushilkundalia2197
    @kaushilkundalia2197 18 days ago

    It was so informative

  • @nikossoulounias7036
    @nikossoulounias7036 7 months ago

    Super interesting talk!! Do you guys have any idea how the compilation-generated decoding kernel compares against custom kernels like Flash-Decoding or Flash-Decoding++?

  • @xmorse
    @xmorse 7 months ago +1

    Your question about why gpt-fast is faster than the CUDA version: kernel fusion. Merging kernels into one is faster than launching multiple hand-written ones.
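
A minimal sketch of the kernel-fusion point, using a toy chain of pointwise ops (an illustration, not actual gpt-fast code): in eager mode each op launches its own kernel and round-trips its intermediate tensor through GPU memory, whereas torch.compile can fuse the chain into a single generated kernel.

```python
import torch
import torch.nn.functional as F

def gelu_bias_residual(x, bias, residual):
    # Eager mode runs this as separate kernels (add, gelu, add),
    # writing each intermediate back to global GPU memory.
    return F.gelu(x + bias) + residual

# torch.compile can fuse these pointwise ops into one generated kernel,
# so the data is read from and written to global memory only once.
fused = torch.compile(gelu_bias_residual)

# Toy shapes; requires a CUDA GPU.
x = torch.randn(4096, 4096, device="cuda")
bias = torch.randn(4096, device="cuda")
residual = torch.randn(4096, 4096, device="cuda")
out = fused(x, bias, residual)
```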

  • @SinanAkkoyun
    @SinanAkkoyun 7 months ago

    How does PPL look with int4 quants? Also, given GPTQ, how high is the tps with gpt-fast?

  • @xl0xl0xl0
    @xl0xl0xl0 7 months ago

    One thing that was not super clear to me: are we loading the next weight matrix (assuming there is enough SRAM) while the previous matmul+activation is being computed?

    • @Chhillee
      @Chhillee 7 months ago

      Within each matmul, the loading of data from main memory into registers happens at the same time as the values are being computed.
      So the answer to your question is "no, but it also wouldn't help, because the previous matmul/activation is already saturating the memory bandwidth" (a rough back-of-the-envelope follows this thread).

    • @xl0xl0xl0
      @xl0xl0xl0 7 months ago

      @@Chhillee Thank you, makes sense.
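
A rough back-of-the-envelope for the bandwidth-saturation point (the 7B model size and ~2 TB/s figure below are assumptions for illustration, not numbers from the talk): during decoding, every generated token streams essentially all the weights from HBM once, so tokens/sec is capped at memory bandwidth divided by model size in bytes.

```python
# Upper bound on decoding speed for a memory-bandwidth-bound model.
# All numbers are illustrative assumptions, not measurements from the talk.
params = 7e9                  # assume a 7B-parameter model
bytes_per_param = 2           # fp16/bf16 weights
hbm_bandwidth = 2.0e12        # ~2 TB/s, roughly A100-class HBM (assumed)

model_bytes = params * bytes_per_param
print(f"~{hbm_bandwidth / model_bytes:.0f} tok/s ceiling at fp16")     # ~143 tok/s

# Shrinking bytes-per-weight (int8, int4) raises the ceiling proportionally,
# which is why weight-only quantization helps decode latency so much.
print(f"~{hbm_bandwidth / (params * 0.5):.0f} tok/s ceiling at int4")  # ~571 tok/s
```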

  • @XartakoNP
    @XartakoNP 7 months ago

    I didn't understand one of the points made. On a couple of occasions Horace mentions that we are loading all the weights (into the registers, I assume) with every token - that's also what the diagram shows at th-cam.com/video/18YupYsH5vY/w-d-xo.html. Is that what's happening? Can the registers hold all the model weights at once? If that were the case, why do you need to load them every time instead of leaving them untouched? I hope that's not too stupid of a question.

    • @Chhillee
      @Chhillee 7 months ago

      This is a good question! The big problem is that GPUs do not have enough registers (i.e. SRAM) to load all the model weights at once. A GPU has on the order of megabytes of registers/SRAM, while the weights require 10s of gigabytes to store (a quick size comparison follows this thread).
      Q: But what if we used hundreds of chips so there was enough SRAM to store the entire model? Would generation be much faster then?
      A: Yes, and that's what we have with Groq :)

    • @XartakoNP
      @XartakoNP 7 months ago

      @@Chhillee Thanks!! I appreciate the answer. I assume the diagram has been simplified for clarity, then.
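
A quick size comparison to make the SRAM-vs-weights gap concrete (the capacities below are rough, assumed ballpark figures): on-chip storage is on the order of tens of MB, while even a mid-sized model's fp16 weights run to tens of GB, so the weights have to be re-streamed from HBM/DRAM on every token.

```python
# Why the weights cannot simply stay resident in registers/SRAM between tokens.
# All figures are rough, assumed ballpark numbers for illustration only.
on_chip_sram_bytes = 50e6        # registers + shared memory + L2 on an A100-class GPU (assumed)
weights_fp16_bytes = 13e9 * 2    # a 13B-parameter model at 2 bytes per weight = 26 GB

print(f"on-chip SRAM : {on_chip_sram_bytes / 1e6:.0f} MB")
print(f"fp16 weights : {weights_fp16_bytes / 1e9:.0f} GB")
print(f"weights are ~{weights_fp16_bytes / on_chip_sram_bytes:.0f}x larger than on-chip storage")
```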

  • @mufgideon
    @mufgideon 7 months ago +1

    Is there any Discord for this channel's community?

    • @TheAIEpiphany
      @TheAIEpiphany  7 months ago +2

      Yes sir! Pls see vid description

  • @tljstewart
    @tljstewart 7 months ago

    Awesome talks, can Triton target TPUs?

  • @kyryloyemets7022
    @kyryloyemets7022 7 months ago

    But ctranslate2, as I understand it, is still faster?

  • @kimchi_taco
    @kimchi_taco 7 months ago

    Speculative decoding is a major thing, right? If so, it's not a very fair comparison...

    • @Chhillee
      @Chhillee 7 months ago

      None of the results use speculative decoding, except the ones we specifically mentioned as using it. I.e., we hit ~200 tok/s with int4 without spec-dec, and 225 or so with spec-dec.
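
For readers who haven't seen it, below is a minimal greedy-verification sketch of speculative decoding (draft_model and target_model are placeholder callables, not gpt-fast APIs, and real implementations sample with an acceptance rule rather than plain argmax): a small draft model proposes k tokens cheaply, the large model scores all of them in a single forward pass, and the matching prefix plus one corrected token is kept.

```python
import torch

def speculative_decode_step(draft_model, target_model, prefix, k=4):
    """One step of (greedy) speculative decoding.

    draft_model / target_model are placeholders that map a 1-D token
    sequence to next-token logits for every position, shape [seq, vocab].
    """
    # 1) Draft model proposes k tokens autoregressively (cheap per step).
    draft = prefix.clone()
    for _ in range(k):
        logits = draft_model(draft)                      # [seq, vocab]
        draft = torch.cat([draft, logits[-1].argmax().view(1)])

    # 2) Target model scores every proposed position in ONE forward pass.
    target_preds = target_model(draft).argmax(dim=-1)    # [seq]

    # 3) Accept proposed tokens while they match the target's own choice;
    #    on the first mismatch, keep the target's correction and stop.
    accepted = prefix.clone()
    for i in range(prefix.numel(), draft.numel()):
        corrected = target_preds[i - 1]   # target's prediction for position i
        accepted = torch.cat([accepted, corrected.view(1)])
        if draft[i] != corrected:
            break
    return accepted
```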