AMA: 1000's of LPUs, 1 AI Brain. Scaling with the Fastest AI Inference
- Published Nov 4, 2024
- Learn how the Groq architecture powering its LPU™ Inference Engine is designed to scale from the ground up. This AMA will dive into the scaling capabilities of Groq AI infrastructure across hardware, compiler, and cloud. We'll also discuss the unique Groq approach to overcoming the scaling limitations of legacy architectures.
Groq is amazing; the speed leaves me speechless. Is it possible to see some samples with a diffusion model soon?
Thank you, great intro to your tech!
Multimodal support with voice and image, plus live video/camera capture, would surely help advance R&D.
The future of AI is not just LLMs, it's multimodal. Does the LPU work with any type of data that AI can process? (It's all tokenized, after all.)
Are you going to rename it MPU? Multimodal processing unit?
Interesting suggestion, maybe we'll do that with our V2 silicon.
We're already testing multimodal, and we have a well-published history of doing inference for many types of data-heavy workloads. Look at the work done with national labs, etc.
Need to invest in it.
Can you do image and audio generation, or video generation, as well?
Groq said they have the lowest TTFT, and it turns out to be 180 ms, as shown in their slide. That number really sucks. Even a GPU can do it in 100 ms, and SambaNova is also doing much better than 180 ms, around 100 ms as well.
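For context, here's a minimal sketch of how TTFT and decode throughput combine into end-to-end response time. The figures are the illustrative numbers quoted in this thread, not official benchmarks from Groq, SambaNova, or any GPU vendor:

```python
# Rough end-to-end latency model for streamed LLM inference:
#   total = TTFT + output_tokens / decode_throughput
# Numbers below are taken from claims in this comment thread
# and are illustrative only, not measured benchmarks.

def total_latency_ms(ttft_ms: float, tokens_per_s: float, output_tokens: int) -> float:
    """Time to first token plus time to stream the remaining tokens."""
    return ttft_ms + (output_tokens / tokens_per_s) * 1000.0

for name, ttft, tps in [
    ("180 ms TTFT, 1250 tok/s", 180.0, 1250.0),
    ("100 ms TTFT, 1000 tok/s", 100.0, 1000.0),
]:
    print(f"{name}: {total_latency_ms(ttft, tps, 500):.0f} ms for a 500-token reply")
```

By this rough model the higher-TTFT, higher-throughput setup is actually faster overall for a 500-token reply (580 ms vs 600 ms), so which metric matters more depends on output length: TTFT dominates short replies, throughput dominates long ones.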
Decentralized AGI with virtual, substrate-independent machine-learning LLM nodes running on multiple servers, connected to decentralized search engines, accessed through personal LLM and LAM computers with WiFi and Bluetooth that can learn to operate household appliances and inexpensive, interchangeable robot chassis controlled remotely. 😎🤖
Groq is a lier, saying its 1250 tokens/s for Llama 3 8B is 4x higher than other providers. They obviously know SambaNova can do 1000+ tokens/s as well.
Well, Lier.
An outlier!
@BooleanDisorder oh, right, they are liars