LPUs, NVIDIA Competition, Insane Inference Speeds, Going Viral (Interview with Lead Groq Engineers)

  • Published Jun 1, 2024
  • This is an interview with Andrew Ling (VP, Compiler Software) and Igor Arsovski (Chief Architect and Fellow) from Groq. We cover topics ranging from the founding story to chip design and manufacturing and so much more. Plus, they reveal how Groq's insane inference speed can generate much better quality from existing models!
    Check out Groq for Free: www.groq.com
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewberman.com
    Need AI Consulting? ✅
    forwardfuture.ai/
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    Rent a GPU (MassedCompute) 🚀
    bit.ly/matthew-berman-youtube
    USE CODE "MatthewBerman" for 50% discount
    Media/Sponsorship Inquiries 📈
    bit.ly/44TC45V
  • Science & Technology

Comments • 273

  • @Batmancontingencyplans
    @Batmancontingencyplans 2 months ago +72

    Matt is flying high, kudos buddy for landing this interview!

  • @MrLargonaut
    @MrLargonaut 2 months ago +75

    Grats on landing the interview!

    • @jakeparker918
      @jakeparker918 2 months ago +4

      Hear, hear 🎉

    • @matthew_berman
      @matthew_berman 2 months ago +4

      thank you!!

    • @southcoastinventors6583
      @southcoastinventors6583 2 months ago

      Who knew one Heinlein book would spawn so many AI companies. So far, no megachurches, as far as I know.

  • @maslaxali8826
    @maslaxali8826 2 months ago +22

    I was not expecting this.. Wow bro, natural interviewer

  • @andresprieto6554
    @andresprieto6554 2 months ago +10

    I am only 11 minutes in, but I love how passionate and knowledgeable Igor is about his industry.

    • @IGame4Fun2
      @IGame4Fun2 2 months ago

      He's as fast as Groq, saying "yeah, yeah, good..." before the question is even finished 😂

  • @kamelirzouni4730
    @kamelirzouni4730 2 months ago +25

    Matt, thank you so much for the interview. You addressed many questions I was eager to understand. The point that truly astounded me was how inference affects model behavior, significantly enhancing response quality. This is a game-changer. Groq has managed to combine speed and quality. I'm eager for it to become widely available and to have the opportunity to run it locally.

  • @alelondon23
    @alelondon23 2 months ago +10

    well done, Matthew! great interview.
    These guys at Groq are crushing it! Great attitude, OOTB thinking, hard work, letting their delivery speak for itself. A very refreshing alternative to the typical over-hyped promises of vaporware. Thank you, Groq!

  • @GuidedBreathing
    @GuidedBreathing 2 months ago +10

    15:40 Holy grail of automated vectorizing compilers, threading, multi-core synchronization... peak performance, kick the compiler to the side under the hood for finance applications... Great interview thus far ☺️ The repeating loop for reasoning is in hardware on the Groq chip; yep, that makes things a lot faster and very exciting 👏👏👏 good job

  • @Alice8000
    @Alice8000 2 months ago +9

    Nice work Groq boys!

  • @aiAlchemyy
    @aiAlchemyy 2 months ago +12

    That's some amazing, valuable content

  • @74Gee
    @74Gee 2 months ago +9

    When asked about running locally on a cellphone they skillfully avoided the fact that you need a rack of chips for inference - although working as an integrated system, the 500+ tokens per second come from around 500+ chips.

    • @ritteradam
      @ritteradam 2 months ago

      Actually Igor answered honestly before Andrew took over: SRAM takes far more die area per bit than DRAM, so it's not a good fit for LLM-sized memories.

    • @74Gee
      @74Gee 2 months ago +1

      @ritteradam There are pros and cons. SRAM uses less power and produces less heat - so it's a good fit.
      The simple honest answer is that you need hundreds of Groq chips, so it's not viable for personal computing. But that would be a hype-killer, wouldn't it.

    • @MDougiamas
      @MDougiamas 2 months ago

      Well but remember what they have is on 14nm … new chips are being designed for 2nm … Groq 3 might be vastly more portable and powerful

  • @joe_limon
    @joe_limon 2 months ago +13

    This is the single greatest interview I have seen this year

    • @matthew_berman
      @matthew_berman 2 months ago +1

      thank you joe!

    • @joe_limon
      @joe_limon 2 months ago +1

      @matthew_berman I think the AI they described at the end could finally reliably answer the "how many words in your next response" question.

  • @torarinvik4920
    @torarinvik4920 2 months ago +9

    Awesome, please do more of these "expert interviews" if you can :D

  • @aaronpitters
    @aaronpitters 2 months ago +5

    Great interview! So the innovation to create a simpler design and faster chip came because they didn't have the money to hire people to create a traditional chip. Love that!

    • @albeit1
      @albeit1 2 months ago

      Constraints force people to innovate. The obstacle is often the way.

  • @SirajFlorida
    @SirajFlorida 2 months ago +4

    Wow, great job on this interview. I've been really excited about Groq. Thumb clicked. LoL

  • @PLACEBOBECALP
    @PLACEBOBECALP 2 months ago +8

    I think Matt was having the best day of his life talking to these 2 guys; I don't think that smile left Matt's face for the entire interview. Great interview. About time someone asked questions that matter, instead of the parroted repetition of "when will this and that be ready, is it AGI, will robots call me nasty names behind my back?"

    • @matthew_berman
      @matthew_berman 2 months ago +5

      Lol. Indeed I was having a blast!!

    • @PLACEBOBECALP
      @PLACEBOBECALP 2 months ago +2

      @matthew_berman Ha ha, me too man... well, until my jaw hit the floor when he described the architecture of the chip at the smallest scale: 10,000 transistors fit in a single blood cell, and they need to use extreme ultraviolet (EUV) light... it truly blew my mind. Do you know if Moore's law allows for an additional reduction in scale, or is 4nm the limit? If so, I assume the technology to build chips atom by atom must have been going on in the background for years in preparation for this long-understood inevitability?

    • @Maelzelmusic
      @Maelzelmusic 2 months ago

      To my understanding, you can go smaller, down to 2 or 3 nm, but at some point the features get so small that you enter the quantum realm of wave/particle behavior, and then you get other problems, mainly related to cooling and reliability of results. I'm just going by memory here, but you can research further in Perplexity or other search tools. It's a very interesting topic. PS: Marques Brownlee/MKBHD has a great video on quantum computers actually.
      Cheers.

  • @NahFam13
    @NahFam13 2 months ago +2

    THIS IS THE CONTENT I WANTED TO SEE!!
    Dude I literally complained about a video you made and you have NO idea how happy it makes me to see you doing this interview and asking the types of questions I would ask.

  • @justinIrv1
    @justinIrv1 2 months ago +3

    Incredible interview! Thank you all.

  • @adtiamzon3663
    @adtiamzon3663 2 months ago +2

    😍 Matt, sooo interesting and informative interview with the lead Groq engineers, Igor and Andrew! Easy to comprehend their presentation, indeed. Thank you, guys. Keep simplifying and relatable. Keep innovating. 🌞👏👏👍💐💞🕊

  • @koen.mortier_fitchen
    @koen.mortier_fitchen 2 months ago +1

    This interview is so cool. I follow the Matts for all my AI news: Matt Wolfe, MattVidPro and Matthew 👌

  • @AIApplications-lg1ud
    @AIApplications-lg1ud months ago

    Thank you! Awesome conversation! The idea that the Groq architecture would also yield better LLM answers and less hallucination is revolutionary.

  • @nicolashuray1356
    @nicolashuray1356 2 months ago +1

    Just wow! Thanks Matt, Andrew and Igor for that incredible interview about the Groq architecture. I'm just fascinated by the beauty of that design and all the use cases it's gonna unlock!

  • @TheJohnTyra
    @TheJohnTyra 2 months ago +4

    This is fantastic Matt!! 🎉 Really enjoyed the technical deep dive on this hardware architecture. 🤓💯

  • @albeit1
    @albeit1 2 months ago +2

    The traffic scheduling analogy is interesting. Each vehicle in every moment occupies a particular space. And no other vehicle can occupy it. If you can schedule all of them and every pedestrian, you can maximize throughput.
    That also reminds of one reason service oriented architecture work. Small web requests and small vehicles both get out of the way a lot faster. Two herds of mopeds crossing paths can do that a lot faster than two trains.

  • @gynthos6368
    @gynthos6368 2 months ago +26

    I just realised, you look like Jon from Garfield

    • @RX-8GT
      @RX-8GT 2 months ago +2

      lol for real

    • @zallen05
      @zallen05 2 months ago +2

      GOAT comment

    • @howardelton6273
      @howardelton6273 2 months ago

      I can't unthink that now haha

  • @planetchubby
    @planetchubby 2 months ago +4

    this interview is awesome, really cool

  • @gkennedy_aiforsocialbenefit
    @gkennedy_aiforsocialbenefit 2 months ago +1

    Truly incredible interview! wow! Andrew and Igor are brilliant, cool and humble...Just like you Matt. So refreshing. Really excited about the last question and answer concerning Agents. Deeply grateful to you and happy for you Matt. Have been following every video of yours from the onset.

  • @howardelton6273
    @howardelton6273 2 months ago +1

    Awesome interviewer achievement unlocked. This is a great format.

  • @jessicas-discoveries-age-6-12
    @jessicas-discoveries-age-6-12 2 months ago +1

    Great interview Matt, really insightful. Being able to talk to LLMs in real time will actually make it feel like we are that much closer to AGI, even if there is still work to do to make it happen in reality.

  • @autohmae
    @autohmae 2 months ago

    Thanks for this interview! Great to see you were able to get this interview. You can be proud. And even if there are things you don't know, this is often still very useful; asking simple questions will let them think and speak instead of giving short answers.
    Regardless of whether they are a big deal or not, it helped me better understand the inefficiencies in the existing systems. There might be many questions I would have asked that Matt wouldn't know where to start with. Especially if I had time to think about them... but these more surface-level questions are very useful. Because I knew parts were hand written/tuned, and knew there is even a big research area just in networking things together, but I never really got the big picture. Removing inefficiencies is a huge deal; removing a whole bunch of them at multiple levels is a game changer.
    It also shows: if an important part of CUDA is hand-written and it took so many man-hours to make by really smart people, then it means AMD can't catch up as easily as many would like to see (their reasoning being: competition is good).

  • @kongchan437
    @kongchan437 2 months ago +1

    Great to hear more tech pioneers from U of T, starting with Dr. Hinton himself. I remember our big Lisp manual was not like a commercially published textbook, so maybe it was made by U of T researchers? I remember seeing some very long Lisp programs and wondering which grad student had that highly abstract recursive thinking ability.

  • @nuclear_AI
    @nuclear_AI 2 months ago +2

    In the context of computing and chips, when folks talk about a 7 nm (nanometer) process or a 5 nm (nanometer) process, they’re referring to the size of the smallest feature that can be created on a chip. Smaller nanometer processes mean more transistors can be packed into the same space, leading to more powerful and efficient chips.
    I hope this helps visualize how incredibly small a nanometer is and the scale at which modern technology operates. It’s like a magical journey from the world we see down to the realm of atoms and molecules, all packed into the tiny silicon chips powering the gadgets we use every day!👇👇👇
    Imagine you have a meter stick. It's about as long as a guitar or a bit taller than a large bottle of soda. That's our starting point: one meter.
    Meter (m) - Our starting point. Picture it as the height of a guitar.
    Decimeter (dm) - Divide that meter stick into 10 equal parts, and each part is a decimeter. Think of it like the length of a large notebook or a bit shorter than the width of your keyboard.
    Centimeter (cm) - Take one of those decimeters and chop it into 10 smaller pieces. Each piece is now a centimeter, roughly the width of your fingernail or a large paperclip.
    Millimeter (mm) - If we slice a centimeter into 10 tiny slivers, you get millimeters. That's about the thickness of a credit card or a heavy piece of cardboard.
    Now, hold onto your hat, because we're about to shrink down into the world of the incredibly tiny:
    Micrometer (µm) - Dive deeper and slice a millimeter into 1,000 pieces. Each piece is a micrometer, also known as a micron. You can't see these with your eyes alone; it's about the size of bacteria or a strand of spider silk.
    Nanometer (nm) - And now, the star of our journey! Cut one of those micrometers into 1,000 even tinier pieces. These are nanometers. A nanometer is so small that it's used to measure atoms, molecules, and the tiny features on the computer chips mentioned above. To put it in perspective, a human hair is about 80,000 to 100,000 nanometers wide. So, we're talking seriously small scales here.
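    The scale ladder above can be sanity-checked with a few lines of arithmetic. A minimal sketch; the ~90,000 nm hair width is the midpoint of the range mentioned in the comment, and the 14 nm feature size is the process node discussed elsewhere in this thread:

    ```python
    # Rough sanity check of the metric "scale ladder" described above.
    # All figures are the ones mentioned in this thread, for illustration only.

    METER = 1.0
    MILLIMETER = METER / 1_000
    MICROMETER = MILLIMETER / 1_000   # roughly the size of bacteria
    NANOMETER = MICROMETER / 1_000    # the scale of chip features

    # A human hair is roughly 80,000-100,000 nm wide; take 90,000 as a midpoint.
    hair_width_m = 90_000 * NANOMETER

    # How many 14 nm features would fit across one hair width?
    features_per_hair = hair_width_m / (14 * NANOMETER)

    print(f"hair width ≈ {hair_width_m:.1e} m")
    print(f"≈ {features_per_hair:.0f} features of 14 nm across one hair")
    ```

    Thousands of minimum-size features fit across a single hair, which is the intuition behind "seriously small scales".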

  • @jimg8296
    @jimg8296 months ago

    Fantastic interview. Learned so much. Thank you.

  • @Maelzelmusic
    @Maelzelmusic 2 months ago

    Lovely video, Matt. Huge props for your evolution :).

  • @JMeyer-qj1pv
    @JMeyer-qj1pv 2 months ago +6

    Nvidia announced that their upcoming Blackwell chip improves inference speed by 30x. I wonder if that will bring it close to Groq's inference speed or if Groq will still be faster. I'm also curious why the Groq architecture doesn't work for training LLMs.

    • @PaulStanish
      @PaulStanish 2 months ago +1

      To the best of my knowledge, the memory doesn't need to change as much for backpropagation so they don't need to be as conservative with timing assumptions etc.

    • @seanyiu
      @seanyiu 2 months ago

      The cost for GPU will always be much higher regardless of performance

  • @seancriggs
    @seancriggs 2 months ago +1

    Outstanding content, Matt!
    Very well managed and explained.
    Thank you for doing this!

  • @JoseP-cw3je
    @JoseP-cw3je 2 months ago +10

    To run Llama 70B unquantized on Groq cards with 230MB each, you'd need a staggering 1,246 of them at $20K each - that's $25 million total. Their crazy 80TB/s bandwidth would let you run the entire model stupidly fast on this setup. But good luck with the 249kW power draw! For comparison, with H100s for that same $25M you get 833 units at $30K per GPU. Each H100 has "only" 80GB VRAM, so the 280GB model would need to be split across 3-4 GPUs. But with 833 GPUs, you could run around 238 instances instead of just 1 with Groq. The H100 rig would still chug 583kW, so even if Groq cards were 80x the speed of an H100, they'd still be 3x behind the H100 in price per performance; to be competitive they would need to be close to $7k.

    • @diga4696
      @diga4696 2 months ago +1

      I would say close to $5k. Blackwell with its DGX stack is a ready-to-rack solution which will offer even better price per performance, and working with a familiar stack is huge for bigger clients

    • @dewardsteward6818
      @dewardsteward6818 2 months ago +2

      Please provide a legitimate source for the $20k. The mouser thing people point at is a joke.

    • @actepukc
      @actepukc 2 months ago +1

      Haha, this breakdown does make you wonder what other burning questions Matt couldn't ask during the interview. Maybe Groq's pricing strategy will be revealed in the sequel, just like he hinted at follow-up questions?
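    The thread's back-of-envelope math can be sketched in a few lines. This takes the comment's figures (230MB SRAM per Groq card, an assumed $20K/card, 80GB and $30K per H100, a 280GB fp32 model, $25M budget) as unverified assumptions, not official specs or pricing; the comment's own counts differ slightly, presumably from overhead allowances:

    ```python
    import math

    # All figures below come from the comment thread above and are
    # unverified assumptions, not official pricing or specs.
    MODEL_BYTES = 280e9        # Llama 70B unquantized (fp32), per the comment
    GROQ_SRAM_BYTES = 230e6    # SRAM per Groq card, per the comment
    GROQ_PRICE = 20_000        # assumed price per Groq card ($)
    H100_VRAM_BYTES = 80e9     # VRAM per H100
    H100_PRICE = 30_000        # assumed price per H100 ($)
    BUDGET = 25e6              # $25M comparison budget

    # Cards needed to hold one copy of the model entirely in SRAM.
    groq_cards = math.ceil(MODEL_BYTES / GROQ_SRAM_BYTES)

    # GPUs needed per model copy, and how many copies the budget supports.
    h100_per_instance = math.ceil(MODEL_BYTES / H100_VRAM_BYTES)
    h100_units = int(BUDGET // H100_PRICE)
    h100_instances = h100_units // h100_per_instance

    print(f"Groq: {groq_cards} cards, ~${groq_cards * GROQ_PRICE / 1e6:.1f}M, 1 instance")
    print(f"H100: {h100_units} GPUs for ${BUDGET / 1e6:.0f}M, ~{h100_instances} instances")
    ```

    With whole copies split across 4 GPUs this gives ~208 concurrent H100 instances rather than the comment's 238, which averaged ~3.5 GPUs per copy; either way the qualitative conclusion about price per performance is the same.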

  • @charlestheodorezerner2365
    @charlestheodorezerner2365 2 months ago

    Love your content. Thank you for all you do. And I love Groq. This was a really fresh look into an area (namely, the inner workings of hardware) that is rarely covered. So this was great.
    One insane benefit to Groq that I wish you had asked about: energy consumption. I gather that Groq chips are not only vastly faster, they are also vastly more energy efficient, which is insane when you think about it. Typically, energy consumption increases significantly with increases in speed. (Compare a 4090 to a 4060.) Not Groq. It's blazingly fast while using a small fraction of the energy of a traditional GPU. This is a HUGE deal to me, not only because it decreases the cost of inference, but for environmental reasons. When you scale up the compute necessary to power the world's inference needs, the energy impact is scary. I wouldn't be surprised if AI inference becomes a greater source of greenhouse gas emissions than automobile use in a few years. And if I understand it correctly, Groq chips are massively more ecologically friendly. Ultimately, that should be as big a deal as the speed itself. Would love to understand better why they are so much more efficient...

  • @kumargaurav2170
    @kumargaurav2170 2 months ago

    To date, the best video for providing insights about LPUs beyond just their faster inference speed. You should make more such videos, as they unlock so much behind-the-scenes for normal ppl. Outstanding video & outstanding company, Groq 🙏🏻🙏🏻

  • @cablackmon
    @cablackmon 2 months ago

    This is SUPER interesting and enlightening, especially the part about how inference speed can affect the actual quality of the output. Thank you! Keep it up Matt!

  • @markwaller650
    @markwaller650 2 months ago

    Amazing interview and insights. Really interesting - how you asked the questions to make this accessible to us. Thank you all!

  • @user-eo1vg6oc3v
    @user-eo1vg6oc3v 2 months ago +1

    An interesting combo of ideas presented: one was using Claude 3 Opus to train the much smaller Claude 3 Haiku, which makes it quicker by being smaller, and prompting it step by step. Then it was suggested that adding Quiet-STaR to rethink before answering could make the answers 10-50% more accurate. This architecture on Groq seems to simplify the traffic flow with 'one way' timed traffic. The final suggestion about reiterating the question could be solved by adding Quiet-STaR, which automates that by directing a review of the whole process before answering, which gave 10-50% more accuracy, especially for math or code. So when will this be usable for the general public? A Groq cloud app?

  • @vinaynk
    @vinaynk months ago

    Very informative. This thing will be the heart of skynet :)

  • @RikHeijmen
    @RikHeijmen 2 months ago +2

    Matt! Wow! Did you find out more about the last thing they talked about? About feeding the answer back multiple times and asking questions in a slightly different way? It seems like a new way of using the Groq chat rather than a new model, right?

    • @unom8
      @unom8 2 months ago

      It sounds like energy based modelling, no?

  • @instiinct_defi
    @instiinct_defi 2 months ago +2

    Amazing, This content is greatly appreciated!🔥🔥

  • @bladestarX
    @bladestarX 2 months ago +1

    Great interview, Matt; you are the best. I think Groq helped create awareness about the benefits of designing and optimizing a chip for inference. However, wasn't this already known by leading companies like NVIDIA? GPUs just happened to be the most appropriate existing architecture for AI training and inference. Remember, prior to ChatGPT, it was all about AI classification and training; inference was just not a thing. Without that focus on inference, something like an LPU would simply not be justified for mass production. So, the reason the big players don't have LPUs is simply that the demand for them was not there before ChatGPT woke the world up to LLMs. LPUs actually have a simpler architecture and fewer components than a general-purpose GPU. I believe Groq will benefit from being first, but it will be very difficult to defend or keep up with the larger chip manufacturers, as they have the infrastructure to create LPUs that will probably perform 10x faster than Groq's 14nm.

    • @GavinS363
      @GavinS363 2 months ago +2

      This comment doesn't make any sense; what infrastructure is it that you speak of Nivita having that gives them a huge advantage in designing chips? I think you mistakenly believe these companies such as grok and Nivita are not only designing these chips but manufacturing them as well; this is incorrect.
      The only company who both designs and manufactures silicon is Intel; the rest all only design and then subcontract out to fabs. Usually it's TSCM, who only builds chips to spec and does not design them themselves. That's how it is now and how it will remain for the foreseeable future. Trying to build a fab without having access to nation-level money is basically impossible at this point.

    • @bladestarX
      @bladestarX 2 months ago

      @GavinS363 Everyone knows NVIDIA itself does not operate fabrication plants (fabs) for chip production but outsources manufacturing to third-party foundries like TSMC and Samsung. They focus on design and development, but don't they have facilities for research and development, testing, and other purposes related to their products and technologies? You don't consider these critical infrastructure? How about their 30,000 employees, including scientists, engineers and architects? Do you think they can give them an advantage when designing LPUs? Not sure why you thought I was explicitly talking about fabs, especially on a video about chip design and architecture. Maybe I should have said chip producer instead of manufacturer?

    • @user-ey6fd9im8o
      @user-ey6fd9im8o 2 months ago

      @GavinS363 Make sure your spelling is correct first. NVIDIA not Nivita, and TSMC not TSCM.

  • @kingrara5758
    @kingrara5758 2 months ago

    great interview, so interesting. Loved seeing everyone's enthusiasm. Your videos are my favourite source of AI news. big thank you.

  • @ZeroIQ2
    @ZeroIQ2 2 months ago

    That was a great interview, so much interesting information, good job Matthew!

  • @BradleyKieser
    @BradleyKieser 2 months ago

    Absolutely the best interview ever! WOW!

  • @darwinboor1300
    @darwinboor1300 2 months ago

    Thanks gentlemen,
    The comparison seems to be between a momentum-bound industry, locked to existing architectures and looking for better ways to play musical chairs with its data, and a startup (Groq) practicing first principles to produce a new hardware model suited to the task at hand, one that moves data and results through memory and compute in multiple parallel queues.
    I look forward to seeing more from Groq.

  • @AlexanderBukh
    @AlexanderBukh 2 months ago +2

    well spoken, aaight

  • @semeandovidaorg
    @semeandovidaorg 2 months ago

    Great interview!!! Thank you!

  • @nicknick6464
    @nicknick6464 2 months ago +1

    Thanks for the great interview. I have a question: since their chip is quite old (14nm), they must be thinking about an updated version based on 5nm or below. When will it be available, and how much faster will it be?

  • @elyakimlev
    @elyakimlev 2 months ago +1

    Good interview. I just wish you hadn't mentioned phones. I really wanted to know if they could create GPU-sized hardware for a PC that would outperform an RTX 3090 at inference, while being able to run bigger models than the RTX can.

  • @rikhoffbauer
    @rikhoffbauer 2 months ago

    This is great! More like this! Very interesting and insightful

  • @Raskoll
    @Raskoll 2 months ago +1

    These guys are actual geniuses

  • @coulterjb22
    @coulterjb22 2 months ago

    Great interview. I would have loved to hear how they are working on lowering manufacturing costs and when that might happen. My very limited understanding is that these chips are more expensive to make.

  • @glennm7086
    @glennm7086 2 months ago

    Perfect level of detail. I wanted an LPU primer.

  • @831Miranda
    @831Miranda 2 months ago

    Great interview! Very accessible info! 🎉❤

  • @RonLWilson
    @RonLWilson 2 months ago +1

    Interesting!
    BTW, I spent my career with asynchronous software, and synchronous software was a big no-no in that it was too rigidly coupled, and we needed to handle sloppy data flows over a distributed architecture.
    That said, we did write some of the drivers in hand-written assembly language that was synchronous, where we needed the speed.

  • @scotlandcorpnaics2385
    @scotlandcorpnaics2385 months ago

    Outstanding discussion!

  • @swamihuman9395
    @swamihuman9395 2 months ago

    - Fascinating.
    - Thx.

  • @scott701230
    @scott701230 2 months ago +1

    The Groq chip sounds amazing.

  • @JariVasell
    @JariVasell 2 months ago +2

    Great interview! 🎉

  • @jonniedarko
    @jonniedarko 2 months ago

    by Far my most favorite video you have done! ❤

  • @manishpugalia8559
    @manishpugalia8559 2 months ago

    Too good very good learning. Kudos

  • @goodtothinkwith
    @goodtothinkwith 2 months ago +1

    Great job Matt! It sounded like it would scale, but might be limited by the die size in the fab..? Is there a limit to how many chips can be chained together like one big chip? I.E., can many Groqs compete with Cerebras’ massive chips? When can we get an agent-based Llama 2 (or 3!) that had this kind of reflexive thinking that Andrew mentioned at the end? Good stuff!

    • @goodtothinkwith
      @goodtothinkwith 2 months ago +1

      Maybe even more provocatively, if a bunch of Groqs were chained together to be the size of Cerebras’ chips, just how large of a LLM could it run?

  • @fpgamachine
    @fpgamachine 2 months ago

    Very interesting talk, thanks!

  • @seamussmyth2312
    @seamussmyth2312 2 months ago +2

    Great interview 🎉

  • @savant_logics
    @savant_logics 2 months ago +1

    Thanks! Great interview.👍

  • @AncientSlugThrower
    @AncientSlugThrower 2 months ago

    Great interview for a great channel.

  • @NoCodeFilmmaker
    @NoCodeFilmmaker 2 months ago +2

    Their API is really competitive too

  • @KitcloudkickerJr
    @KitcloudkickerJr 2 months ago +1

    wonderful interview

  • @nvda2damoon
    @nvda2damoon months ago

    fantastic interview!

  • @marktrued9497
    @marktrued9497 2 months ago

    Great interview!

  • @ArnoldJagt
    @ArnoldJagt 2 months ago

    I have such a huge project for Groq as soon as it can handle digesting a big chunk of software.

  • @testchannel7896
    @testchannel7896 2 months ago +1

    great interview

  • @kostaspramatias320
    @kostaspramatias320 2 months ago +1

    Darn, that's gonna be epic!

  • @arturoarturo2570
    @arturoarturo2570 2 months ago +1

    Super instructive

  • @rbdvs67
    @rbdvs67 2 months ago

    I wonder what, if any, are the power requirement differences with the Groq architecture? Are they planning on making this on more current 4-5 nm silicon? Amazing interview and very exciting.

  • @issiewizzie
    @issiewizzie 2 months ago

    Great interview

  • @albeit1
    @albeit1 2 months ago +1

    Creating hardware specifically designed to serve LLMs reminds me of why vertical integration works. Things get created or optimized to serve the mission. The company doesn’t have to adapt to how existing industries are doing things.

  • @ZychuPL100
    @ZychuPL100 2 months ago +2

    This sounds like the LPU is a neuron! They basically created an artificial neuron that can be connected to other neurons, so this is like an artificial brain. Awesome!

    • @executivelifehacks6747
      @executivelifehacks6747 2 months ago

      That is the sense I got too. Why is the human brain efficient? Lots of parallel computations, not overly fast. That being said, it's not working the whole time, at least not all of it, AFAIK.

  • @netsi1964
    @netsi1964 2 months ago

    ARM was originally created the same way: design the instructions first, then the hardware. It was also originally the Acorn RISC Machine, as it was to be used inside the Acorn BBC microcomputer.

  • @shyama5612
    @shyama5612 2 months ago +1

    Would love a comparison between groq LPU and TPUv5p

  • @ryzikx
    @ryzikx 2 months ago +2

    very good fantastic content 🤯🤯

  • @transquantrademarkquantumf8894
    @transquantrademarkquantumf8894 2 months ago

    Nice Show

  • @timduck8506
    @timduck8506 2 months ago

    This is such out-of-the-box thinking, yet so logical and basic (the elusive obvious-to-a-genius answer) 👍👍👍👍👍 When this chip gets to 4nm, this is AI on board with zero connection to the outside world.

  • @Artfully83
    @Artfully83 2 months ago

    Ty

  • @CalinColdea
    @CalinColdea 2 months ago +2

    Thanks

  • @1242elena
    @1242elena 2 months ago

    That's awesome 😎

  • @monkeysrightpaw
    @monkeysrightpaw 2 months ago

    Are these chips still ok for image processing ai or just language models?

  • @janewairimu5625
    @janewairimu5625 2 months ago

    These Groq guys need funding in the billions to stop them giving in to large corporate bullying, as happened to Inflection and Stability.
    Their work is so precious... yet tantalizing to the big corporations...

  • @jayconne2303
    @jayconne2303 2 months ago

    Very nice model of traffic at an intersection.

  • @mithralforge8160
    @mithralforge8160 2 months ago

    No mention of the Blackwell announcement this week?

  • @KCM25NJL
    @KCM25NJL 2 months ago

    Man, the Groq7B PCI-e accelerator card would be such an easy win..... guess we can keep dreaming :)

  • @jayeifler8812
    @jayeifler8812 2 months ago

    Speed is important but so is energy efficiency.

  • @frankjohannessen6383
    @frankjohannessen6383 2 months ago

    The fact that their chip is built on 14nm transistors is insane. That's what Nvidia used for the GTX 10-series back in 2017. Imagine how fast Groq would be with 4nm transistors.

  • @impulseCADandDesign
    @impulseCADandDesign 2 months ago +2

    Really good interview. I would have dreamed of real numbers for how many cards are needed to run Llama 2 or Mixtral on such devices now... We only talk about online access, but some data has to run on a local model only... So please provide those quick figures depending on the quantization. Thanks in advance. The cards are $10k, so these would also help validate the business cases... instead of buying lots of NVIDIA A100s. Power consumption would be appreciated too, as well as onboard cache... Because if running Mixtral 8x7B needs 80 cards... you catch my point: latency is important, but total cost of ownership is the real number...

    • @user-eo1vg6oc3v
      @user-eo1vg6oc3v 2 months ago

      I think you missed the whole point of 'one way' timed traffic. 'Nanoing' down can always be done for further speed when the tech capability is built in the U.S. in the future.

  • @nhtna4706
    @nhtna4706 2 months ago

    Did Igor explain how his mobile phone is able to achieve such low latency? Like what LLM was used, what audio/speech algorithm was being used, etc.? I mean, an end-to-end architecture for this use case? I may have missed this, pls

  • @skitzobunitostudios7427
    @skitzobunitostudios7427 2 months ago

    Matt, are you going to interview Cerebras next? I would like you to maybe get two chaps from each company on a cast with you and have a little 'shoot-out' of thoughts.