4,000,000,000,000 Transistors, One Giant Chip (Cerebras WSE-3)

  • Published May 10, 2024
  • The only company with a chip as big as your head, Cerebras has a unique value proposition when it comes to AI silicon. Today they are announcing their third generation Wafer Scale Engine, called WSE-3. Built on 5nm, this chip increases the cores to over 900,000, has four trillion transistors, and doubles training performance over WSE-2. Each system costs a few million, but the price hasn't gone up, and these systems are being used globally to overcome the bottlenecks that GPUs can't get rid of.
    [00:00] Sneaky Snek
    [00:19] Moore's Law isn't Dead
    [02:00] Tasty Chip (specifications)
    [03:00] 2x Perf, 2x TCO
    [04:52] 250 ExaFLOPs in one Supercomputer
    [06:00] Eliminate GPU Bottlenecks
    [07:30] The Business Model
    [09:10] Partnership with Inference
    [11:12] Co-designed software
    [14:44] Wafer bite tax
    -----------------------
    Need POTATO merch? There's a chip for that!
    merch.techtechpotato.com
    more-moore.com : Sign up to the More Than Moore Newsletter
    / techtechpotato : Patreon gets you access to the TTP Discord server!
    Follow Ian on Twitter at / iancutress
    Follow TechTechPotato on Twitter at / techtechpotato
    If you're in the market for something from Amazon, please use the following links. TTP may receive a commission if you purchase anything through these links.
    Amazon USA : geni.us/AmazonUS-TTP
    Amazon UK : geni.us/AmazonUK-TTP
    Amazon CAN : geni.us/AmazonCAN-TTP
    Amazon GER : geni.us/AmazonDE-TTP
    Amazon Other : geni.us/TTPAmazonOther
    Ending music: • An Jone - Night Run Away
    -----------------------
    Welcome to the TechTechPotato (c) Dr. Ian Cutress
    Ramblings about things related to Technology from an analyst for More Than Moore
    #cerebras #waferscale3 #mooreslaw
    ------------
    More Than Moore, as with other research and analyst firms, provides or has provided paid research, analysis, advising, or consulting to many high-tech companies in the industry, which may include advertising on TTP. The companies that fall under this banner include AMD, Armari, Baidu, Facebook, IBM, Infineon, Intel, Lattice Semi, Linode, MediaTek, NordPass, ProteanTecs, Qualcomm, SiFive, Supermicro, Tenstorrent, TSMC.
  • Science & Technology

Comments • 510

  • @leonardmilcin7798
    @leonardmilcin7798 months ago +294

    This cooking-range-sized CPU actually emits 10x more heat than a typical cooking range. That is just crazy.

    • @endeshaw1000
      @endeshaw1000 months ago +23

      you can literally heat your house with it, even in deepest winter :)

    • @SaHaRaSquad
      @SaHaRaSquad months ago +53

      @@endeshaw1000 I too am a fan of house heating that can do computation as a side effect

    • @christopherleubner6633
      @christopherleubner6633 months ago +8

      It is about the same as a clothes dryer, which is far less than I thought it would need. 24 kW isn't too bad; a basic watercooling system with a pre-chiller radiator would be fine. The 5 V or 3 V bussing would be nuts though: the current would be 8,000 A at 3 V and just under 5,000 A at 5 V. 😮
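
      The current figures above are easy to sanity-check with I = P/V. A minimal sketch; the 24 kW draw is the figure discussed in the video, while the 3 V and 5 V rails are the commenter's hypothetical numbers, not Cerebras's published power-delivery design:

```python
# Back-of-envelope check of the bus-current figures quoted above.
# Assumes a 24 kW draw and hypothetical 3 V / 5 V supply rails.
def bus_current(power_w: float, voltage_v: float) -> float:
    """Return current in amps for a given power draw and rail voltage (I = P/V)."""
    return power_w / voltage_v

power = 24_000  # ~24 kW, as quoted for the system
print(bus_current(power, 3))  # 8000.0 A at 3 V
print(bus_current(power, 5))  # 4800.0 A at 5 V ("just under 5000")
```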

    • @miemiemiedesu
      @miemiemiedesu months ago +6

      Best Device for Training AI Cooking

    • @wawaweewa9159
      @wawaweewa9159 months ago +3

      Connect to a floor heating system 😂

  • @shmookins
    @shmookins months ago +230

    I'm gonna need more thermal paste.

    • @henrik2117
      @henrik2117 months ago +6

      😂👍

    • @nicknorthcutt7680
      @nicknorthcutt7680 months ago +3

      😂😂

    • @j.lietka9406
      @j.lietka9406 months ago +2

      It should have its own cooling system, like a freezer!

    • @cef-ym3gb
      @cef-ym3gb months ago +4

      I hear it's offered in 55 gal drums. 😂

    • @TechTechPotato
      @TechTechPotato months ago +23

      Tubes per chip, rather than chips per tube

  • @kellymoses8566
    @kellymoses8566 months ago +316

    Imagine showing this video to someone 30 years ago.

    • @afc8981
      @afc8981 months ago +11

      They would probably approve. It's like a giant AI mainframe.

    • @10lauset
      @10lauset months ago +8

      Imagine showing this video to someone in China today.

    • @JohnSmith762A11B
      @JohnSmith762A11B months ago +27

      Imagine showing it to Alan Turing. It would be like that scene where the archeologists get to Jurassic Park and see actual dinosaurs.

    • @Eugensson
      @Eugensson months ago +17

      Imagine showing this video to someone 30 years from now.

    • @fracturedlife1393
      @fracturedlife1393 months ago +6

      What someone? John Connor. What future? 1984.

  • @jackdoesengineering2309
    @jackdoesengineering2309 months ago +73

    The yield is 100%, because if it doesn't work you get a bangin' cool frisbee!

    • @carstenraddatz5279
      @carstenraddatz5279 months ago +4

      If a manufacturing defect knocks out a single core, you still have 900k minus 1 other cores. The design caters for that.

    • @RubixB0y
      @RubixB0y months ago +6

      It's called "catch" because when you don't catch it, the game is over 🙃

    • @richr161
      @richr161 months ago

      @@carstenraddatz5279 If the average yield is 80%, you're going to have 20% of the chip be dead weight. I don't see the benefit of this design over just breaking the wafer down. You're not worried about size or space requirements at that scale.

    • @carstenraddatz5279
      @carstenraddatz5279 months ago

      @@richr161 Worries exist, especially with this type of chip. However at that scale you are very worried if you are TSMC and only get 80% yield. Customers won't come back if you don't improve that. Realistically you're aiming for north of 97% yield or so, I hear.

    • @richr161
      @richr161 29 days ago

      @@carstenraddatz5279 TSMC's published average yield is literally 80%, with peaks greater than 90% on a leading node.
      I'd assume the nodes they keep around for companies that don't use the leading edge are in that range, with all optimization going into yield rather than performance.

  • @JanMagnusson72
    @JanMagnusson72 months ago +27

    Moore's law is based on the observation that transistor density used to double every 18-24 months. This product does not even use the latest process. If anything it indicates that Moore's law is no longer applicable. Moore's law was never about performance.

    • @wombatillo
      @wombatillo months ago +8

      Strictly speaking, Moore's law was originally about the actual number of transistors per chip. Originally the TTL and NMOS and whatever chips were 5x5 mm max, so the chip size was fairly limited. The process generation improvements are of course what kept this cycle going until maybe 2012, but after that it's been a combination of increasing the chip size and shrinking the transistors. Moore's law was never thought to apply to one-square-foot silicon chips.

    • @Poctyk
      @Poctyk months ago +8

      @@wombatillo Funny enough, in his 1975 article(?) Moore actually noted that increasing die size was part of how the doubling of transistor count was achieved

  • @seeibe
    @seeibe months ago +156

    Don't think this was what Moore had in mind when he formulated the law 😅

    • @hrdcpy
      @hrdcpy months ago +60

      Correct. He imagined a trillion-dollar company limiting users to 64GB of storage in order to push cloud solutions.

    • @KinoINFINITY
      @KinoINFINITY months ago +8

      @@hrdcpy Good old Apple and its supporters

    • @jackdoesengineering2309
      @jackdoesengineering2309 months ago +10

      Bitcoin miners are now selling the heat generated into an industrial process. Datacentres may soon follow suit. They really need a way to recapture the energy costs

    • @SoylentGamer
      @SoylentGamer months ago +5

      Our monthly reminder that it was never really a "law" in the scientific sense

    • @dixie_rekd9601
      @dixie_rekd9601 months ago +5

      @@SoylentGamer that's why I always called it "Moore's lore"

  • @fukushimaisrevelation2817
    @fukushimaisrevelation2817 months ago +162

    I need a 900,000 core computer for blackjack and duck hunting, ah forget the duck hunting.

    • @jolness1
      @jolness1 months ago +5

      A fellow person of culture, I see. Always happy to see a Futurama reference.

    • @fukushimaisrevelation2817
      @fukushimaisrevelation2817 months ago +4

      @@jolness1 All I know is my gut says maybe

    • @Wingnut353
      @Wingnut353 months ago +2

      I can hook you up with a whitebox P133, 256 Mbit of RAM, a 28.8k softmodem, and an advanced Windows 95 OSR2 operating system. I can throw in a parallel-port scanner and an HP B&W letter-quality printer if you like; also built like a tank.

    • @handlemonium
      @handlemonium months ago

      Now we can hunt ducks made of Dark Matter 😏

    • @paulmichaelfreedman8334
      @paulmichaelfreedman8334 months ago

      Nah, half a million should do fine.

  • @MeriaDuck
    @MeriaDuck months ago +141

    24 kW through that PAVER of a 'chip' (the term chip was meant for little pieces of silicon, if I recall correctly; we need another name...). That thing needs a proper cooling tower. How does one even route 24 kW at low voltage through all that without it going woosh? That's a feat of engineering proper.

    • @FreeOfFantasy
      @FreeOfFantasy months ago +5

      This channel has a video about the Tesla Dojo chip. I'm guessing the power solution is similar.

    • @dnmr
      @dnmr months ago +34

      if it's not a chip it's the whole potato

    • @MeriaDuck
      @MeriaDuck months ago +6

      @@dnmr my potato brain hadn't made that link yet 🤣🥔

    • @fracturedlife1393
      @fracturedlife1393 months ago +9

      It's a SLAB.

    • @handlemonium
      @handlemonium months ago +2

      @@dnmr But can it run Cyberpunk 2077... 100% path traced?

  • @jolness1
    @jolness1 months ago +45

    This is such a cool idea. Never can get over what a wild idea it is to have a die that is a full wafer with the round parts lopped off.

  • @BaBaNaNaBa
    @BaBaNaNaBa months ago +97

    bro is holding the holy grail casually in his arms 😱

    • @LemonsRage
      @LemonsRage months ago +10

      and literally taking a bite out of it

    • @radugrigoras
      @radugrigoras months ago +3

      Lol “bro” at the minimum Dr. Bro.

    • @user-cr3pj2nr4e
      @user-cr3pj2nr4e months ago +1

      @@radugrigoras, esquire

    • @GuidedBreathing
      @GuidedBreathing months ago

      That golden cocoa-bar-looking thing is probably more expensive than a normal cocoa-bar-looking thing

    • @charleshendry5978
      @charleshendry5978 months ago

      A La Monty Python 😂

  • @ProjectPhysX
    @ProjectPhysX months ago +43

    If only Cerebras hardware had OpenCL support and didn't need its own proprietary language! It would open doors to HPC/simulation workloads way beyond AI.

    • @RahulAhire
      @RahulAhire months ago +3

      They do support HPC simulation, right? I do see cerebras SDK supporting scientific computing. I might assume it will need some workaround.

    • @forceofphoenix
      @forceofphoenix months ago +1

      OpenCL? Vulkan is the real shit ;-)

  • @brodymiller9299
    @brodymiller9299 months ago +24

    I need two of those, that way I can have 1 core for each pixel to get 100,000 fps

  • @eyescreamsandwitch52
    @eyescreamsandwitch52 months ago +13

    2:14 His intrusive thoughts won there for a second

    • @DigitalJedi
      @DigitalJedi months ago +6

      Ian's just kinda like that sometimes. Gotta have a little nibble from time to time.

  • @jonjohnson2844
    @jonjohnson2844 months ago +43

    Peter bites? He’s never bitten me.

    • @honkhonk8009
      @honkhonk8009 months ago +6

      Hey Lois, this reminds me of that time I made an AI chip out of a whole wafer. hehehehehe

    • @maxpower1337
      @maxpower1337 months ago

      I can store my home movies at last.❤

    • @njpme
      @njpme months ago

      😂😂😂

  • @matthewsjc1
    @matthewsjc1 months ago +7

    I remember at one point in the 90s changing the jumpers on the motherboard to overclock my Pentium from 60 MHz to 66 MHz, but never finding a way to cool it enough to remain stable. At the time I would've been thrilled to have that 10% jump in performance. My brain may have melted knowing that in 2024 I'd have multiple machines (including portable ones!) that are not only multi-core, but run at BILLIONS of cycles per second.

  • @gensteps923
    @gensteps923 months ago +12

    Yesterday when this news broke I looked for a video on it, couldn't find one, so I found your vid on WSE-2. Now today you deliver on the news regarding WSE-3. Nice work

  • @eonreeves4324
    @eonreeves4324 months ago +7

    It's crazy to think of the amount of work that goes into creating these, and then to sell 9 or 10 of them a year. It shows how niche the market is for this kind of processor

  • @gustamanpratama3239
    @gustamanpratama3239 months ago +21

    Can't wait to see what kind of performance boost the next gen Wafer-Scale Engine 4 will bring us!!🤤 Imagine that it will be using 2nm Forksheet GAA or 1nm CFET tech

    • @TechTechPotato
      @TechTechPotato months ago +18

      I asked. Was told to wait

    • @Wingnut353
      @Wingnut353 months ago +3

      @@TechTechPotato hold yer horses potato man they says!

    • @yancgc5098
      @yancgc5098 months ago +2

      Considering they went from 7nm to 5nm for WSE-3, the next logical step will be TSMC 3nm for WSE-4

  • @cem_kaya
    @cem_kaya months ago +15

    This is one of the most interesting chips on the market. Happy to hear they have earned more money than they have raised.

  • @jerrywatson1958
    @jerrywatson1958 months ago +2

    You do a good job of pointing out the best features of the products you cover. It makes it easier to follow for us non computer scientists. I also like that you mention a product's shortcomings, with ways to work around them if possible.

  • @simonstrandgaard5503
    @simonstrandgaard5503 months ago +7

    Interesting following the progress of these chips. Mindblowing.

  • @XIIchiron78
    @XIIchiron78 months ago +3

    Imagine sending an entire 100-amp residential service into a single chip

  • @NoSpeechForTheDumb
    @NoSpeechForTheDumb months ago +7

    Moore's law is about logic DENSITY. It's not about more logic in a chip the size of a chess board LOL

    • @n00blamer
      @n00blamer months ago +2

      "the number of transistors in an integrated circuit (IC) doubles about every two years." -- Gordon Moore

    • @NoSpeechForTheDumb
      @NoSpeechForTheDumb months ago +2

      @@n00blamer LOL what you posted is NOT Moore's Law. It's the simplistic theme-park version spread by the media. The actual law is distributed over his article "Cramming more components onto integrated circuits" from 1965, where he referred to the complexity in two-mil squares. Please do your own homework LOL

    • @n00blamer
      @n00blamer months ago +1

      @@NoSpeechForTheDumb In his original article, Gordon Moore stated the essence of what became known as Moore's Law with the following quote:
      "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year... Certainly over the short term this rate can be expected to continue, if not to increase."
      This statement captures the crux of Moore's Law, highlighting the exponential growth in the number of components (transistors) that can be integrated onto a semiconductor chip at minimal cost, with the expectation that this trend would persist into the foreseeable future.

  • @SimEon-jt3sr
    @SimEon-jt3sr months ago

    Amazing rundown thanks man

  • @cryptocsguy9282
    @cryptocsguy9282 months ago +16

    I do remember cerebras claiming that their WSE 2 was better than Nvidia's offering at the time but Nvidia seems to have the most hype out of all the companies involved in AI. My understanding is that having all the processing units on one massive wafer just makes everything move faster instead of having many discrete GPUs connected together

    • @peoplez129
      @peoplez129 months ago +1

      It uses a lot of power, but the sheer scale of it makes it more efficient. For example, the RTX 4090 tops out at 100 TFLOPS, meaning it would take 10 of those to reach 1 petaflop. This chip does 125x that, so you would need over a thousand RTX 4090s to equal the same processing power. Not to mention the 4090s would require over 400,000 watts of power, while this requires only 25,000. That alone gives it a huge win over Nvidia, to the point that if anyone is still using Nvidia, it's because they're laundering money; ignoring the savings of ~375,000 watts, it can't be anything but a money laundering operation at that point.
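
      The comparison above is just arithmetic, and can be reproduced in a few lines. A sketch under the commenter's assumptions only: the 100 TFLOPS per RTX 4090, ~400 W per card, and 125 PFLOPS / 25 kW for the wafer-scale system are their figures, not verified specifications:

```python
# Reproduce the commenter's GPU-farm-vs-wafer estimate.
# All per-device numbers below are the commenter's claims, not specs.
import math

gpu_tflops = 100     # claimed RTX 4090 throughput
wse_pflops = 125     # claimed wafer-scale throughput
gpu_watts = 400      # assumed per-card power draw
wse_watts = 25_000   # claimed wafer-scale system power

# Cards needed to match throughput (1 PFLOPS = 1000 TFLOPS)
cards_needed = math.ceil(wse_pflops * 1000 / gpu_tflops)
gpu_farm_watts = cards_needed * gpu_watts

print(cards_needed)     # 1250 cards ("over a thousand")
print(gpu_farm_watts)   # 500000 W vs 25000 W for the single system
```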

  • @dakoderii4221
    @dakoderii4221 months ago +14

    Since 2020, all the memes and parodies became reality.

  • @techman2553
    @techman2553 months ago +5

    Can't wait for the laptop version of the chip !!

  • @AliMoeeny
    @AliMoeeny months ago +2

    These numbers are mind-blowing. Also, one chip with that much memory to train models is lit.

  • @AlexSeesing
    @AlexSeesing months ago +3

    No potato for sir.
    With these kinds of chips I can't get rid of the feeling I had around 1990: the 80386 was kinda in our grasp, but yet RISC told us, "Nope, you won't." This feels kinda the same again.

  • @andychow5509
    @andychow5509 months ago +1

    Imagine if every cold country used these to heat buildings in winter. You could reduce heating costs to zero, and really get both compute and heat in virtually perfect harmony.

  • @nicknorthcutt7680
    @nicknorthcutt7680 months ago +2

    Astonishingly powerful, one hell of a CPU 😳

  • @Karthig1987
    @Karthig1987 months ago +2

    Awesome video

  • @SpencerHHO
    @SpencerHHO months ago

    That single piece of silicon uses more power than my 200 amp and 180 amp welders combined, even when maxed out. In fact it uses more than my entire house does 99% of the time.
    The stitching of reticles is truly a remarkable innovation and something I want to learn more about.
    If I had to guess, I'd think there'd be a buffer in from the edge of the conventional masks, then a second 'stitching' mask would be used to overlap reticles and mark over them in a manner reminiscent of multi-patterning in conventional lithography. Regardless of how it's done, the level of precision is truly remarkable, and the fact that they can yield something this big on 5nm is actually insane.
    It seems they've exceeded their own expectations from what they initially set out to achieve. They were initially talking about being a few nodes behind, but they're now basically on the leading edge and only one step from the absolute bleeding edge.

  • @cedivad
    @cedivad months ago +5

    How do you route 25,000 Amps worth of current to a pizza box? I know some ASIC miners from the ages past got around it by stacking multiple chips/cores together, meaning multiple 0.8V cores were combined to form a single 3V-something processing block, which reduces the current requirements and makes power distribution design easier/reasonable. Anybody knows if they are doing something similar here? I'm too curious.

  • @thomasmurphy3927
    @thomasmurphy3927 months ago

    You're doing great bro. Keep it up. It was just yesterday you had a couple of thousand subscribers. Now look at you. 🎉🎉🎉🎉

  • @50shadesofbeige88
    @50shadesofbeige88 months ago +4

    Now THAT's a big chip.

  • @Void_Glitcher
    @Void_Glitcher months ago +2

    One thing I really want to see is smaller AI chips for personal/commercial use. I've messed with AI image generation and some other AI stuff, but you can't really go any higher than 520x520 images with a middle-ground GPU. If there are any products already like this, please tell me.

  • @switzerland3696
    @switzerland3696 months ago

    20 of the cards in one chassis: how do you do the PCIe channel routing? 16x? How many CPUs?

  • @xl0xl0xl0
    @xl0xl0xl0 months ago +5

    What kind of software / framework do they provide? I take it, it's not PyTorch or JAX? How hard is it actually to implement those models and the training code?

    • @TechTechPotato
      @TechTechPotato months ago +9

      PyTorch and TensorFlow, IIRC. I didn't show the slide, but they stood up gigaGPT in 565 lines of code, vs 20,000 for Megatron-LM. Both 175B parameters.

  • @Stadtpark90
    @Stadtpark90 8 days ago

    Archeologists in 6000 years: no idea what this did. Maybe an element to heat your food?

  • @mickeygallo6586
    @mickeygallo6586 28 days ago

    That makes one hell of a schematic

  • @whyjay9959
    @whyjay9959 months ago +2

    Do they also make chips out of a single tile or a few tiles? Like from outside of the square.
    It's an interesting method; it gets one thinking about how else it could be applied, like a CPU getting 2 or 4 still-attached tiles instead of 2 or 4 of the same chiplet. Also, imagine if we were using 450mm wafers; that might not have been a profitable transition for most uses, but for this and silicon interconnect fabrics it would've been different.

    • @Wingnut353
      @Wingnut353 months ago

      It's a single wafer... normally chips are made from a wafer just like this and then diced up into smaller chips. The reason chips are normally limited to smaller sizes is that the projection system used to image the chip only covers a small portion: the rectangular areas you see on this wafer. Since they are doing all this on the same wafer, though, they can put ultra-high-bandwidth links between the normal reticle scan areas and link it all together. There is far more bandwidth available here than you would normally get even through an interposer, since all the layers are there, instead of it just being one layer through an interposer. Making GPUs like this might actually make sense. That said, planar latency on this thing is probably quite "bad"; part of the reason the vertically stacked cache on Ryzen X3D has low latency is that going vertical is faster than going sideways twice as far.

  • @Veptis
    @Veptis months ago +4

    Wait, Moore's Law was never size-limited? So it's not just density alone??
    I am most excited about the Qualcomm Cloud AI100 Ultra card, tbh. It seems to be the best solution for workstation researchers who mainly care about running evals, which purely require inference. And 128GB per card... it would take like two A100s to match, and those easily cost 30k+.
    Please let Qualcomm know we want them! I am almost ready to pay 10k for a single card... if they can sell it to me, prove the software works, and finally release some accurate benchmarks. Like, I want to know what a single card can do for throughput with a 70B model at FP/BF16.
    Can they donate a WSE (1, 2 or 3) to Fritz for dieshots?
    Also the door behind you spells MOOR; surely that's on purpose

    • @Poctyk
      @Poctyk months ago

      It is/was not about density but the total transistor count.

  • @jeffeast7983
    @jeffeast7983 months ago +3

    Yes, it also plays Crysis with max settings.

    • @frankstrawnation
      @frankstrawnation months ago

      I had to read a lot of comments to find this joke.

  • @hrdcpy
    @hrdcpy months ago +7

    A meeting room called "Cathedral Peak" that is located on the ground floor? 🤔

    • @andredeklerk1069
      @andredeklerk1069 months ago +1

      That has me thinking they have a South African around, with the meeting rooms following a famous peaks convention.

  • @Theodorus5
    @Theodorus5 months ago

    2:15 I was waiting for him to do that 😄

  • @thaedleinad
    @thaedleinad months ago +1

    Imagine building a nuclear space station full of these things like a floating AI god.

  • @gerbil7771
    @gerbil7771 months ago

    I can’t comprehend the scale of the capabilities these processors have anymore. It’s absolutely nuts.

  • @the.bog.
    @the.bog. months ago +1

    Okay, but what's the GEMM/W? How are you guys solving non-stationary dataflow? Inter-core communication has an incredible power overhead, not to mention the developer nightmare of having to debug and troubleshoot non-deterministic compilation tools.

  • @lucamatteobarbieri2493
    @lucamatteobarbieri2493 months ago

    The specs are amazing. Did Cerebras reduce complexity like Groq did?

  • @lostinseganet
    @lostinseganet months ago +4

    @3:33 Wow, 100% more performance from 7 to 5 nm, so there should be at least another 100% boost worth of room from 5 to 3~2 nm?

    • @Walczyk
      @Walczyk months ago

      that’s not how it works doofus

  • @sunnohh
    @sunnohh months ago +1

    I cannot believe you held it that long without eating it 😊

  • @wolftheai
    @wolftheai months ago

    OK, with that kind of power, can we get a deep dive into the cooling system?

  • @Phantom_Communique
    @Phantom_Communique months ago

    I had to double check the zeros in the title. Holy moly.

  • @freedom_aint_free
    @freedom_aint_free months ago +1

    One day we will have a solid black monolith of nothing but transistors and memory, like the one in 2001: A Space Odyssey!

  • @SharpsBox
    @SharpsBox 20 days ago

    Boy, Witcher 3 with RT on will be sweet with this rig!

  • @danburycollins
    @danburycollins months ago +7

    Man... how big is the CPU cooler??? Gonna need more thermal grease.

    • @orangejjay
      @orangejjay months ago +1

      Thermal grease?! Thermal pads are the way to go these days, my friend. ❤

    • @danburycollins
      @danburycollins months ago

      @@orangejjay I mean, that's probably true, but who makes them this large 🤣😜😃

  • @gustamanpratama3239
    @gustamanpratama3239 months ago

    I wonder whether or not they could combine WSE-3 with photonic interconnects/interposers for between-chip communication, and fiber optics for data flow between rack units and even between data centers, to achieve an even faster system.
    There was this achievement last year by NICT of 22.9 petabits per second transmission in a single fiber, although it has 38 cores (or 24.7 Pb/s with better optimized coding). I mean, this just demonstrates how fast a fiber, and photonics in general, can get, and it is just the beginning; this skinny glass can be even faster in the future. If they can combine these two (wafer-scale engines and photonics), maybe we can achieve zettascale in the near future, say five years.

  • @incription
    @incription months ago +3

    do we even have the data to train a hypothetical 24 trillion parameter model on this?

  • @kellyeye7224
    @kellyeye7224 months ago

    I remember my first PC - ETI Magazine DIY computer called the Transam Triton. 8080-based and 256 BYTES of memory! Cost me £300 in 1978.

    • @Bobby-fj8mk
      @Bobby-fj8mk months ago

      I still have an SDK 8085 kit.

  • @nothinghere1996
    @nothinghere1996 27 days ago

    Anamartic and wafer-scale memory. Happy days.

  • @mirkogeffken2290
    @mirkogeffken2290 months ago

    When your chunk is more effective at producing heat than your induction stove.

  • @RedPillRachel
    @RedPillRachel months ago +1

    The only thing that this video, and all of your other videos for a while now, is missing is the meme-worthy "What's your minimum specification?" jingle you used to have... is there anybody else missing that, or just me?

  • @raffycamulataldamar6645
    @raffycamulataldamar6645 months ago

    Insane

  • @Arcticwhir
    @Arcticwhir months ago +3

    I really wonder how the software stack compares to Nvidia's. What does the inference/training actually look like?

    • @goodfodder
      @goodfodder months ago +1

      Me too; the devil is in the details.

  • @ernsailor9041
    @ernsailor9041 months ago +2

    I might be wrong, but are you sure that'll fit in my phone? It looks like it might be a tad too big, but things can look bigger on screen, so who knows.

  • @xeode
    @xeode months ago

    out in the land of the 'on premise'

  • @Idoldissr.11
    @Idoldissr.11 months ago +1

    Still... will it break the 60 fps barrier in Skyrim SE/AE? LOL (Just thinking out loud.)

  • @gandalfgreyhame3425
    @gandalfgreyhame3425 months ago +5

    Is that chip from a single silicon slice? Or, more likely, a 12 x 7 array of individual chiplets stitched together?

    • @unvergebeneid
      @unvergebeneid months ago +10

      It's one piece of silicon (not silicone, that's a polymer). Hence the name wafer-scale.

    • @TheReferrer72
      @TheReferrer72 months ago +4

      Wafer Scale implies one piece of silicon.
      It has a lot of engineering to get around defects, plus the cores are tiny.

    • @salmiakki5638
      @salmiakki5638 months ago +2

      They claim it is monolithic

    • @gandalfgreyhame3425
      @gandalfgreyhame3425 months ago +2

      @@unvergebeneid OK, I corrected the spelling.

    • @gandalfgreyhame3425
      @gandalfgreyhame3425 months ago

      @@unvergebeneid The yield rate for such a gigantic single piece of silicon with over a trillion transistors must be really low. I mean I think the yield rates for standard size CPUs are only in the range of 10-20%. The chances for one or more defects to be present on a giant chip that is 84x larger in size must be enormous.

  • @johnpereztwo6059
    @johnpereztwo6059 months ago

    In 5 years, sitting in desktops. In 10 years, sitting in TV sets.

  • @talroitberg5913
    @talroitberg5913 months ago

    I wonder if these sorts of Wafer Scale Engines can be combined with advanced packaging / memory stacking? To my understanding, large AI models are bottlenecked by memory capacity and throughput, so adding a closely-bundled cache or HBM stack on top could increase performance by a lot.
    That said, with the energy this thing uses, powering and cooling extra memory stacked directly on top might be a problem. Maybe if it has separate power delivery, fluidic cooling channels through and between the chips, etc? There's probably high-end customers who would want that if it gives significant advantages over H100s for their applications.

  • @dan-tv1kp
    @dan-tv1kp months ago

    Cool, but what if you gotta send data from one corner to another, or from one corner to the center?

  • @GuidedBreathing
    @GuidedBreathing months ago

    How does one run this thing? How many hair dryers of power does it take? What are the use cases for it? Automated debugging of a large code base, I can imagine, is done in a snap... curious about the business use cases.

  • @philmarsh7723
    @philmarsh7723 months ago

    I wonder how this would perform on an electromagnetic FDTD solver such as openEMS?

  • @El.Duder-ino
    @El.Duder-ino months ago

    One of a kind, a very unique solution only from Cerebras. They found a hole in the market, otherwise they would be out of business by now, and with inference cards and a rental model they can monetize pretty well too.

    • @catchnkill
      @catchnkill months ago

      You've got it all wrong. Cerebras' WSE-3 chips will be used primarily for training; they are not for inference. They sell it as a whole system, a supercomputer.

    • @El.Duder-ino
      @El.Duder-ino months ago

      @@catchnkill "Greetings to Chinese state hackers!" - u got it wrong and obviously u r not reading that I mentioned "inference cards" mentioning Qualcomm ASICs. Nice trolling though...

  • @kingofstrike1234
    @kingofstrike1234 several months ago +2

    Should other chip manufacturers follow them on yield redundancy?

  • @Safetytrousers
    @Safetytrousers several months ago +5

    Have technology, must bite it.

  • @Awave3
    @Awave3 several months ago

    This is the kind of chip that is going to wake up and become conscious as soon as it is plugged in.

  • @ywtcc
    @ywtcc several months ago

    In finance, every time you see exponential growth, you know it will eventually level off. (Sometimes the bubble pops; that's not Moore's law, though.)
    It's a mathematical certainty, based on the nature of exponential growth. If your investment grows exponentially faster than the economy, sooner or later the economy is going to start fighting back. That's why those exponential growth curves always level off to new equilibria.
    With Moore's law, I think that inflection point happens when some minimum size is reached for transistors.
    Then it's system sizes that increase to keep the growth up. However, here we're energy limited. It's a finite planet.
    The interesting thing is imagining what that new equilibrium state must represent as we approach practical energy limits.

    • @gregorymalchuk272
      @gregorymalchuk272 29 days ago

      The actual equilibrium we end up reaching matters a lot to the state of civilization. If integrated circuits had reached equilibrium in the mid 1970s, microcomputers would be hobbyist curiosities. If it was reached in the 1980s, we might have word processing and spreadsheets, but nothing more. If equilibrium was reached in the 1990s, we might have GUIs and a primitive Internet, but nothing more. If equilibrium was reached in the early 2000s, we don't get iPhones. If our current state is close to the equilibrium, we will miss out on general purpose AI, AI drug development and screening, simulated synthetic biology, fast protein folding modeling, longevity/biological immortality breakthroughs, etc.

    • @ywtcc
      @ywtcc 29 days ago

      @@gregorymalchuk272 It's starting to look like planetary intelligence is taking shape.
      In self-preservation.
      The equilibrium system size is proportional to the planet that houses it.
      To stretch beyond those boundaries would take not just a revolution in analysis, but also in the capacity to use a lot of energy, perhaps over a long period of time.
      Surely, when spreading across planets, Moore's law must have the characteristics of a staircase function.

  • @iwmaxx
    @iwmaxx several months ago

    Wonder how it compares to Tesla Dojo, which also uses a wafer-scale chip design.

  • @eightsprites
    @eightsprites several months ago

    They had to create a motherboard for that one.

  • @movax20h
    @movax20h several months ago +4

    1/4 ZFLOP. Nice. FP16, but still, damn. Raw power is one thing, but the biggest advantage is the efficiency of data transfer, because most data and communication stays on-chip. That alone would save a lot of power.

  • @Gertbfrobe407
    @Gertbfrobe407 several months ago

    Hi, I'm here from "Tech Linked" 🎉

  • @exidy-yt
    @exidy-yt several months ago +1

    This must play a mean game of Crysis.
    TBH this gives me hope I may just live long enough to be able to upload my mind to run on a CPU before I die.

  • @GuidedBreathing
    @GuidedBreathing several months ago

    4 trillion transistors; what is the optimal use case for this chip? And if we compare it to Nvidia's solutions, what are the main differences? Thanks 🙏

  • @christopherleubner6633
    @christopherleubner6633 several months ago

    That is not only wafer scale, it looks like it used an entire 300mm (12-inch) wafer. The fact that they can make a single IC die that big shows how far we have come. I can only imagine the current and heat extraction required for a chip that size running at full power. 😮😮😮

    • @wombatillo
      @wombatillo several months ago

      How much does a 5nm-process 300mm wafer cost at TSMC these days? $15,000? That's a heck of an expensive chip even before adding margins for R&D, marketing, manufacturing outside the fab, distribution, sales, profit, etc.

    • @ironicdivinemandatestan4262
      @ironicdivinemandatestan4262 several months ago

      @@wombatillo The WSE chips are sold for around $2 million, so the cost of the wafer is a drop in the bucket.

    • @wombatillo
      @wombatillo several months ago

      @@ironicdivinemandatestan4262 The chip must really be worth it to command such a valuation. The distributed memory and sheer bandwidth are insane compared to H100 clusters and others.
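The margin point in this exchange is easy to sanity-check. A minimal sketch, using the commenters' own figures (the ~$15,000 wafer cost and ~$2 million selling price are their estimates, not official Cerebras or TSMC pricing):

```python
# Back-of-envelope check of the margin discussed above.
wafer_cost = 15_000      # assumed raw TSMC 5nm 300mm wafer cost, USD (commenter's estimate)
chip_price = 2_000_000   # reported WSE selling price, USD (commenter's estimate)

fraction = wafer_cost / chip_price
print(f"Raw wafer is {fraction:.2%} of the selling price")  # well under 1%
```

Even if the wafer estimate were off by several times, the raw silicon would still be a small slice of the system price, which is the "drop in the bucket" argument.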

  • @nicholash8021
    @nicholash8021 several months ago

    I can't decide on Lennox or Carrier for the cooling.

  • @petergibson2318
    @petergibson2318 several months ago

    You could heat a village with that. Cooling it must be a nightmare.

  • @fundiambb
    @fundiambb several months ago

    Does it run Ark: Survival Evolved though?

  • @JorgetePanete
    @JorgetePanete several months ago

    What are companies' plans now that model parameters go as low as 1 bit instead of 16 or even 4?

    • @TechTechPotato
      @TechTechPotato  several months ago +1

      Lots of companies looking at INT4.
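For context on what INT4 means here: each weight is stored as a 4-bit integer plus a shared scale factor. A minimal sketch of symmetric per-tensor INT4 quantization in plain Python (illustrative only; production toolchains use per-channel or per-group scales and calibration data):

```python
def quantize_int4(weights):
    """Map float weights onto integers in [-8, 7] with one shared scale."""
    scale = max(abs(w) for w in weights) / 7  # 7 = largest positive INT4 value
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [v * scale for v in q]

weights = [0.31, -0.12, 0.98, -0.45, 0.05]
q, scale = quantize_int4(weights)
approx = dequantize(q, scale)
# Each weight now needs 4 bits instead of 16, at the cost of a rounding
# error bounded by scale / 2 per weight.
```

This is why low-bit formats matter for chips like the WSE: 4x fewer bits per parameter means 4x more model fits in the same on-chip memory and bandwidth.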

  • @tomstech4390
    @tomstech4390 several months ago

    One of the few times Moore's law is used correctly, factoring in the cost. I'm not aware of another time it's actually held true in the last 10 years.

  • @tibbydudeza
    @tibbydudeza several months ago +1

    Holy smokes - what are the cooling and power requirements?

    • @TechTechPotato
      @TechTechPotato  several months ago +1

      24 kW. They sell it as a system, self-contained with cooling. Just plug in power and networking.

    • @tibbydudeza
      @tibbydudeza several months ago

      @@TechTechPotato Who would use such a beast - the NSA?

  • @TiagoTiagoT
    @TiagoTiagoT several months ago

    01:15 Did someone give you the wrong PowerPoint slide, or did they just forget to remove the "confidential" marking?

    • @TechTechPotato
      @TechTechPotato  several months ago

      Eh, that's semi-standard with most slide decks I get from most companies. It's when it's in big red letters that you have to worry.

  • @Blackvipe1
    @Blackvipe1 26 days ago

    Have they thought about cutting the chips, then stacking them and putting cooling plates between the stacks?

  • @DileepB
    @DileepB several months ago

    Moore's Law is about transistor density in a monolithic piece of silicon. There are creative ways of driving performance despite the end of Moore's Law!

  • @ZoruaZorroark
    @ZoruaZorroark several months ago

    Someday we could see all this in roughly the same size chip found in a typical home PC's CPU.

  • @antonisautos8704
    @antonisautos8704 several months ago

    Bet you'd be able to buy something with similar computing capability that only uses 150 watts and is 1/25th the size in just 10 to 15 years, maybe less. It'll be cool to see what comes as we get closer to 2030.

  • @mikewoodman2872
    @mikewoodman2872 several months ago

    Man, I bet Notepad runs like a dream on that thing.

  • @nicolasdujarrier
    @nicolasdujarrier several months ago

    Although Cerebras' wafer chip in 5nm is a good step in the right direction, it is only an incremental one.
    I am of the firm belief that the disruptive step would be to integrate Non-Volatile Memory (NVM), especially MRAM (e.g. SOT-MRAM or VCMA-MRAM), on the wafer (1 out of every 2 800mm² chips being embedded NVM), as this would open up tremendous new architecture opportunities.
    You could even envision wafer-on-wafer stacking of 2 wafers so that each logic core is surrounded in 3D by MRAM Non-Volatile Memory.
    Furthermore, different kinds of AI cores on the same wafer could be envisioned as a better fit for multi-modal AI models.
    It is still early days, but clearly it is the kind of technology Apple should be investing in…

  • @danielgrayling5032
    @danielgrayling5032 several months ago

    No more room at the bottom? No problem, plenty of room at the top.
    It's a big universe.