How long can Nvidia stay monolithic?

Comments • 209

  • @MickenCZProfi
    @MickenCZProfi 1 year ago +72

    Thank you for this video, always very informative. I had no idea that next-generation EUV lithography shrinks the reticle limit by a factor of 2. That changes everything.

    • @HighYield
      @HighYield 1 year ago +18

      I knew it was getting smaller, but I was reminded of just how large the decrease is by a random comment on Twitter. I think many people underestimate the impact high-NA EUV will have.

    • @MickenCZProfi
      @MickenCZProfi 1 year ago +5

      @HighYield Yeah, for sure. I actually heard about it today in MLID's leak video, and it brought up a good point: this might make Nvidia cancel the 90 class of GPUs for a few years, because the dies will have to be smaller and won't be able to compete with the previous generations. Of course I do expect them to use chiplets for hyperscaler and AI products, as you said, but for consumer GPUs it might be harder to justify a new design.

    • @maynardburger
      @maynardburger 1 year ago

      @MickenCZProfi Intel is expected to have its first high-NA machines active in manufacturing sometime in 2025 (assuming no delays). TSMC will likely not have the same capabilities until 2026. And even then, we know that gaming GPUs are usually at least a year behind on leading-edge nodes, so Nvidia's Blackwell GeForce parts will likely be unaffected by any reticle limit issues. Beyond that, it's more than likely gonna be another two years until the next generation, at which point they'll have had time to get on top of things. It shouldn't be an issue, and they will continue to have high-end consumer GPUs every generation. Also, don't get caught up with naming. The 90 series is what used to be the 80 series. It's not actually a new class of part.

    • @JavoCover
      @JavoCover 1 year ago

      @maynardburger Is that why Intel stuck with the 14+++++++++ node for so long? Like waiting for the big change.

    • @Lemard77
      @Lemard77 2 months ago

      @MickenCZProfi Which MLID video was this?

  • @RobBCactive
    @RobBCactive 1 year ago +35

    Rather than just calling game graphics latency dependent, it's better to realise that the frames are tightly coupled in a way that HPC calculations aren't.
    An algorithm bouncing rays off a surface needs the texture and colour to be known, for example. If these rays are scattered and reflected, then you need all of that early-pass data to be available.
    I've seen explanations that games effectively have a global area; splitting it across dies is believed to cause problems.
    The difference is that if you could pipeline frames without synchronous requirements, then each could take longer than the frame time, so long as they can start early on a wide enough GPU that can process several in-flight frames.
    So long as the output frames respond to user input quickly, latency would still appear low.

    • @hammerheadcorvette4
      @hammerheadcorvette4 1 year ago +3

      VERY solid points. Some could be solved in software with a form of checkerboarding as you process.

    • @GeekProdigyGuy
      @GeekProdigyGuy 1 year ago +4

      1. Before realtime ray tracing was available, there were very few "global" calculations. However, two separate GPUs (Xfire/SLI/dual-chip designs) would have to be synchronized on processing of each frame to avoid tearing. To my knowledge the latency contributes significantly to making this synchronization difficult, even with relatively low inter-chip bandwidth usage.
      2. Of course as modern games increasingly implement and rely on RT, what you said about global information may become more applicable.
      3. There is no way to pipeline frames which can reduce the fundamental input lag; if it takes 10ms to render a frame, the dependency on user input means the input lag can never drop below 10ms. While you can increase the framerate with such pipelining, and possibly as a result smooth out the input lag, the total frame render time will be observable as input lag by the end user.
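
The arithmetic behind point 3 can be sketched in a few lines of Python. This is an idealized model, not a description of any real driver or engine: it assumes input is sampled when a frame starts rendering, that in-flight frames overlap perfectly, and that display scan-out adds nothing. The function name `pipeline_stats` and its parameters are made up for illustration.

```python
def pipeline_stats(render_ms: float, frames_in_flight: int) -> tuple[float, float]:
    """Return (frames per second, minimum input latency in ms).

    Idealized assumption: each frame takes `render_ms` from input sampling
    to display, and `frames_in_flight` frames overlap perfectly.
    """
    if render_ms <= 0 or frames_in_flight < 1:
        raise ValueError("render_ms must be > 0 and frames_in_flight >= 1")
    # With perfect overlap, a finished frame appears every render_ms / depth.
    frame_interval_ms = render_ms / frames_in_flight
    fps = 1000.0 / frame_interval_ms
    # Pipelining does not shorten the path from input to that frame's display.
    min_latency_ms = render_ms
    return fps, min_latency_ms


if __name__ == "__main__":
    for depth in (1, 2, 4):
        fps, latency = pipeline_stats(10.0, depth)
        print(f"{depth} frame(s) in flight: {fps:.0f} fps, input lag >= {latency:.0f} ms")
```

In this model the frame rate scales with the number of frames in flight while the input-lag floor stays at the 10 ms render time, which is the distinction the comment draws.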

    • @RobBCactive
      @RobBCactive 1 year ago +2

      @GeekProdigyGuy Originally, dual-GPU setups duplicated the VRAM data and each GPU handled alternate frames. Tearing is caused by changing the display in the midst of its refresh, hence setting V-Sync or FreeSync with the monitor avoids it.
      The point about pipelining was to show the limit of asynchronous operation in some super-wide GPU. We know algorithms now use movement vectors and differences between frames, but you'd need to sample user input late enough to meet latency requirements.
      But seriously, those differ between games; not every game is a twitch shooter.

    • @shanent5793
      @shanent5793 1 year ago +1

      Pixels are mostly computed independently, even when ray tracing. GPUs were invented for rasterization workloads, where the same sequence of instructions is executed with data individual to each pixel and the CPU has already decided which triangles to draw. It's different for ray tracing: each pixel requires multiple rays, and the rays can scatter randomly. Rays are grouped and assigned to a compute unit or GPU core, and some rays will immediately hit a light source and terminate while others will reflect and scatter until the iteration limit is reached. When a ray terminates early, the CU resources for that ray sit idle waiting for the others to finish, unlike a serial CPU, which could immediately start processing the next ray. The trick is to find a way to maximize utilization by grouping rays that follow a similar path onto the same CU.
      Bounding Volume Hierarchy (BVH) is one such optimization, but it creates a dependency that has to be completed before the rays can be assigned resources, though the BVH is usually small enough to fit inside cache, so duplication across GPU chiplets isn't a great waste. BVH could even have its own specialized accelerators, like an array of simple CPU cores that execute the same cached program and can be reassigned to a new task while others iterate.
      The frames should not be pipelined in a latency sensitive game, ideally a frame is displayed, then inputs gathered, then geometry calculated and submitted to the GPU, then the frame is drawn and displayed, with no overlap between the stages. This gives the lowest possible latency. If your pipeline is five frames deep taking 50 ms to draw a frame, 50 ms is the minimum latency even if a new frame is displayed every 10ms (100 fps). Widening the processor so it draws the frame in 15ms with no pipeline means 15ms minimum latency despite the frame rate dropping to 67 FPS.
      Milliseconds is plenty of time to exploit pipelining and streaming across the individual pixels and effectively hide nanosecond VRAM and inter GPU module communication latency. Chiplets may be slower in some areas but since the majority of the work is still parallel there is plenty of performance to gain with a chiplet design.
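
The utilization argument in this comment (rays that finish early leave their lanes idle until the whole group is done) can be illustrated with a toy simulation. Everything here is an assumption for illustration: the group width of 32, the uniform bounce distribution, and above all the "sorted" case, which uses knowledge of each ray's bounce count that real hardware does not have in advance. It is an upper bound on what grouping could achieve, not how any GPU schedules rays.

```python
import random


def make_groups(bounces: list[int], width: int) -> list[list[int]]:
    """Split rays into fixed-width groups, in the order given."""
    return [bounces[i:i + width] for i in range(0, len(bounces), width)]


def utilization(groups: list[list[int]]) -> float:
    """Fraction of lane-steps doing useful work.

    Each group runs in lockstep until its longest ray finishes, so a group
    occupies len(group) * max(group) lane-steps but only sum(group) are useful.
    """
    useful = sum(sum(g) for g in groups)
    occupied = sum(len(g) * max(g) for g in groups)
    return useful / occupied


rng = random.Random(0)  # fixed seed so the run is repeatable
MAX_BOUNCES = 8         # hypothetical iteration limit
GROUP_WIDTH = 32        # hypothetical number of rays processed in lockstep

bounces = [rng.randint(1, MAX_BOUNCES) for _ in range(1024)]

arrival_order = utilization(make_groups(bounces, GROUP_WIDTH))
similar_paths = utilization(make_groups(sorted(bounces), GROUP_WIDTH))

print(f"rays grouped in arrival order: {arrival_order:.0%} lane utilization")
print(f"rays grouped by path length:   {similar_paths:.0%} lane utilization")
```

Grouping rays of similar length can only shrink the sum of per-group maxima, so the second figure is never lower than the first in this model.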

    • @RobBCactive
      @RobBCactive 1 year ago

      @shanent5793 Your explanation suggests a cause of interdependence: utilisation depends on correct grouping. Rendering pixels independently of each other doesn't mean they have no dependency on the same data. We know that for lighting and other algorithms they depend on each other, with a frame constructed in passes.
      The question is how you break up that work across GCDs and how the large volume of data moves efficiently between them while meeting cost targets.
      That's different from long runs of predictable calculations on large vectors.
      We do know that multi-GPU has required very high bandwidth connections between the parts, which are expensive. HPC and render farms without a real-time constraint can break up tasks over many processors.
      Now the best information known suggests the plans for multi-GPU RDNA4 have been shelved. Also, RDNA3 split memory control and cache away from the GCD, but chose a single GCD in the first iteration.
      Right now we know RDNA3 missed its expected launch performance, and it doesn't appear to have a simple fix, with no word of a new stepping or refresh leaking.

  • @pwmaudio
    @pwmaudio 1 year ago +81

    Overall a good analysis in this video, but you forgot the most important (and in fact the only) reason why NVDA hasn't moved to chiplets yet: the limited packaging/interposer capacity (CoWoS in the case of H100) and HBM TSV production machines. Currently, NV can get any quantity of dies from TSMC N4 but can't get enough HBM and can't package them fast enough to meet market demand... to the point that a third packaging factory is opening in Taiwan and NV has already booked the production for next year. Samsung is also opening a new packaging fab in Korea (for HBM CoWoS) to sustain NV's business.
    Otherwise, keep up the good work. Nice channel 👍

    • @SirMo
      @SirMo 1 year ago +8

      CoWoS capacity is not difficult to scale. The packaging machines are nowhere near the complexity of the lithography equipment. TSMC is expanding this capacity rapidly, and I don't see it being a bottleneck long term.

    • @HighYield
      @HighYield 1 year ago +19

      I agree with you in the sense that packaging (and HBM) is currently the limiting factor for manufacturing high-end GPUs, but H100 already uses CoWoS, if only to attach the die and the HBM chips to the interposer and not to connect multiple chiplets. So advanced packaging is already used for current-gen Nvidia HPC/AI GPUs.
      For gaming the argument stands.

    • @pwmaudio
      @pwmaudio 1 year ago +1

      @SirMo Totally agree, but what I said is still true. Capacity is being built to meet future demand, but it was not enough for current client GPUs, which sell in much, much higher numbers than the DC H100. And I'm not even talking about cost...

    • @pwmaudio
      @pwmaudio 1 year ago +3

      @HighYield Client GPU quantity is on a totally different scale than DC A100/H100. And it's much easier for AMD when you have less than 10% market share and only one SKU uses chiplets...

  • @HighYield
    @HighYield 1 year ago +36

    This is the same video I shared on Patreon almost two weeks ago, so if you watched it there, you have already seen it. Next video will come sooner, pinky promise!

    • @zesanurrahman6778
      @zesanurrahman6778 1 year ago

      It can't, 'cause the PC master race is creating a GPU that is faster than Nvidia's and cheaper.

  • @samghost13
    @samghost13 1 year ago +8

    Thank you very much! I'm always looking forward to new videos from your channel.

  • @falsevacuum1988
    @falsevacuum1988 7 months ago +3

    And you were right, Nvidia made Blackwell from 2 chiplets.

  • @mikelay5360
    @mikelay5360 1 year ago +38

    They will stick with monolithic for as long as they need to, in gaming at least.
    Remember, NVIDIA is not one to rest on its laurels. I am 100% sure they have chiplet-based chips in their R&D labs, just waiting for the right time to pull the trigger.

    • @user-lp5wb2rb3v
      @user-lp5wb2rb3v 1 year ago +11

      Exactly, they will keep milking the market, and if they can't, they will market their way as better.
      For example, Nvidia could have released the 780 Ti in 2012, the 980 Ti in 2014 and the 1080 Ti (which is cut down, with 11 GB instead of 12) in 2016, but they milked instead.
      Notice how people cried about the R9 290X consuming too much power and being loud, yet look at how silly the 4090 is in comparison. And somehow people would rather buy the 4090 than a car lol

    • @mikelay5360
      @mikelay5360 1 year ago +3

      @N_N23296 Intel's fall was 10 years in the making. When NVIDIA starts to fall, we will definitely know from experience... Actually, rumours suggest that AMD is the one giving up 😂 but let's see.

    • @mikelay5360
      @mikelay5360 1 year ago +1

      @N_N23296 You go where the money is. Even AMD and Intel tend to focus more on the server side because of 'money'! Gaming is a niche these days!

    • @mikelay5360
      @mikelay5360 1 year ago +2

      @N_N23296 Ohh, I see now 🤣 AMD this! AMD that!

    • @26Guenter
      @26Guenter 1 year ago

      If Nvidia had a chiplet architecture they would release it.

  • @Innosos
    @Innosos 1 year ago +19

    If I had to make a guess, the next gen will just be a small refinement of Lovelace with larger dies (i.e. 50, 60, 70 and 80 class GPUs with typical 50, 60, 70 and 80 class die sizes), since there's so much space left this generation.

    • @charleshorseman55
      @charleshorseman55 10 months ago

      Try smaller dies, larger transistors, running at higher frequency. Oh wait, that's what usually happens.

  • @SirMo
    @SirMo 1 year ago +8

    Nvidia's entire origin story has always been about building the biggest chip possible. The reason they haven't gone to chiplets is related to this paradigm of always having the largest chip. As you said, their margins and scale allowed for this, and no one else could follow since they simply didn't have the volumes to justify the cost. But this advantage is going away. And I think companies like AMD have far more experience with chiplets.

    • @maynardburger
      @maynardburger 1 year ago +5

      I think underestimating Nvidia on the technology front is a very big mistake. There are only a tiny handful of processor companies in the world with comparable resources, and Nvidia has a pretty strong track record of execution. I expect when they do make a move to MCM/stacking, they're gonna do very well with it. We should also not forget that AMD is piggybacking heavily on TSMC's technologies, which Nvidia will also have access to when it comes time.

    • @SirMo
      @SirMo 1 year ago +7

      @maynardburger People underestimate AMD's technology. It is Nvidia that's piggybacking on AMD's technology. For example, AMD invented HBM, which Nvidia uses heavily in the datacenter. AMD also has the strongest CPU and FPGA development cadre.

  • @Alex-ii5pm
    @Alex-ii5pm 1 year ago +8

    Chiplets are used for cost saving: they get better yields from smaller silicon and less wastage. Monolithic will always be superior for gaming GPUs.

    • @HighYield
      @HighYield 1 year ago +8

      Chiplets are not always used for cost saving, even though the most famous chiplet design (AMD's Zen 2) used them for that.
      For example, Meteor Lake is most likely more expensive to produce than its monolithic predecessors. Chiplets can also be used to achieve much higher performance, because a monolithic chip has a hard die-size limit and thus a transistor-count limit. MI300, for example, is faster than any possible monolithic chip AMD could design. I even quote an Nvidia research paper in the video which states that a proposed chiplet architecture can be 45% faster than the largest monolithic chip.
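
The yield side of this exchange (smaller dies yield better) can be made concrete with the simple Poisson yield model, Y = exp(-A · D0). This is a rough sketch only: the defect density of 0.1 defects/cm² and the 800 mm² vs 4 × 200 mm² split are hypothetical numbers picked for illustration, and the model ignores packaging cost, die-to-die interface area and partially defective dies sold as cut-down parts.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_product(die_area_mm2: float, dies_per_product: int,
                             defects_per_cm2: float) -> float:
    """Wafer area (mm^2) consumed per working product, counting scrapped dies.

    Assumes dies are tested before assembly, so only good dies are packaged.
    """
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_product * die_area_mm2 / good_fraction


D0 = 0.1  # hypothetical defect density, defects per cm^2

monolithic = silicon_per_good_product(800.0, 1, D0)
chiplets = silicon_per_good_product(200.0, 4, D0)

print(f"monolithic 800 mm^2: yield {poisson_yield(800.0, D0):.1%}, "
      f"{monolithic:.0f} mm^2 of wafer per good product")
print(f"4 x 200 mm^2 chiplets: yield {poisson_yield(200.0, D0):.1%} per die, "
      f"{chiplets:.0f} mm^2 of wafer per good product")
```

Under these assumptions the chiplet option wastes less silicon, which matches the cost argument. It says nothing about the performance argument in the reply above, which rests on the hard die-size limit rather than on yield.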

    • @Alex-ii5pm
      @Alex-ii5pm 1 year ago +1

      @HighYield In production-based tasks it will be better, but the increase in latency that comes with chiplets will suck for gaming. I can see why Nvidia still uses monolithic designs, at least for their gaming products. Compare the first Zen CPUs to the monolithic Intel CPUs: in production workloads they were awesome, but in gaming they were horrible due to the high latency of the chiplet design. I can see the new chiplet/tile-based Intel CPUs having the same issue in gaming tasks; we will see either no performance change or a regression. Only time will tell.

    • @Fractal_32
      @Fractal_32 1 year ago

      @Alex-ii5pm Well, current implementations of chiplets are not the best for gaming latency-wise, but they may be in the future, since it's a new technology that hasn't been fully adapted to gaming applications.
      Maybe chiplets will be even better in the future, since a given chiplet could be focused on a fixed function/operation instead of more general use cases.

    • @soraaoixxthebluesky
      @soraaoixxthebluesky 1 year ago

      @N_N23296 If you look at Ryzen, on Zen+ they were using a 4+4 config for the 2700 & 2700X, but then switched to a single monolithic design (while keeping a separate I/O die) for Zen 2 on the 3700X & 3800X, and you can clearly see a huge performance increase (part of which also comes from the switch to TSMC).
      The 3100 vs the 3300X is also real-world testimony to that.
      The only reason why you see a performance gain (as stated in the Nvidia research paper) is the massive difference in transistor count between the monolithic design and the chiplets, as you can easily scale the transistor count up with chiplets on the same process node.
      In latency-sensitive applications like gaming, where saturating the compute units becomes a huge challenge, chiplet designs with similar transistor counts will always fall behind.

    • @lefthornet
      @lefthornet 1 year ago

      @Alex-ii5pm As far as I know, the main issue with RDNA 3 was the rendering issue at high clocks; that's why they missed the performance target. The chiplets didn't affect gaming performance, because all the compute units were together. So in the short and medium term that will probably be the future of gaming GPUs, until the latency issue gets solved. If some console uses a chiplet design, the engines will probably improve their optimisation for distributing the workload.
      On the other hand, Ryzen with 3D cache is the best for gaming right now and doesn't have any latency issues. A monolithic Ryzen has 40 - 50 ns between CCDs (we get that data from the APUs, which are monolithic), while the chiplet Ryzens get 50 - 70 ns (the variation depends on the Infinity Fabric frequency and therefore on the RAM frequency). At that scale no human can see the difference; it is literally orders of magnitude below our senses. Chiplets and other manufacturing innovations are necessary, because chip manufacturing is too close to the limits of physics and there is no viable replacement for silicon. Yes, graphene is a candidate, but there has been no huge improvement in manufacturing it at scale. Until then it is only silicon, and it has a limit, physics. That is a hard limit and we are really close to it.

  • @DJaquithFL
    @DJaquithFL 1 year ago +2

    **Chiplet is synonymous with Cheap.** There's no other upside. A monolithic CPU or GPU doesn't have its blocks separated by millions of nanometers, which add unnecessary latency. Even in Intel's tile design, each tile is specialized. The GPU, SoC, and CPU are all on their own tiles to avoid the latency cost of downgrading from a monolithic design.

  • @theevilmuppet
    @theevilmuppet 1 year ago +10

    Wow - your work has always been amazing but you're continuing to improve your presentation and focus on the critical details.
    Please, keep going!

    • @HighYield
      @HighYield 1 year ago +2

      Thank you so much! I will keep making videos as long as I have fun doing so :)

    • @theevilmuppet
      @theevilmuppet 1 year ago

      @HighYield And I'll keep watching them as long as you're making them!

  • @ramr7051
    @ramr7051 1 year ago

    good to see you back :) hope everything is going well for you

    • @HighYield
      @HighYield 1 year ago +1

      It's actually going very well, both in my job and personally. Let's see if I can get back to at least bi-weekly videos. I have been slacking a bit... ;)

  • @Akveet
    @Akveet 1 year ago +13

    Nvidia historically waits a generation before implementing a new technology that its competitors have adopted, because given its lead it can outperform the opposition on older tech, saving money in the process. As soon as the savings from the new tech become measurable, Nvidia switches to it.

    • @Wobbothe3rd
      @Wobbothe3rd 1 year ago +2

      Lol, "some new technology" CHIPLETS ARE BAD

    • @thomasfischer9259
      @thomasfischer9259 1 year ago +2

      Major green cope

    • @Akveet
      @Akveet 1 year ago +6

      @thomasfischer9259 I don't even have an Nvidia GPU, I'm rocking a 5700 XT. I'm just stating the facts. Nvidia is technologically ahead, so they juice every last cent out of the cheaper technologies before switching to the newer ones.

    • @How23497
      @How23497 1 year ago

      @Wobbothe3rd You literally watched a 14-minute video explaining how chiplets are the only way forward to continue increasing computational performance, and you make this dumbass comment? Why 😂

    • @baoquoc3710
      @baoquoc3710 1 year ago +1

      @thomasfischer9259 Well, if he were coping, the 7900 XTX would be way better than the RTX 4070 Ti without any problem of gargantuan power consumption.

  • @RealLifeTech187
    @RealLifeTech187 1 year ago +4

    I would say Hopper Next is monolithic, as Nvidia tries to capitalize on the AI boom with an early release before the competition can launch something more interesting. Big corporations aren't that willing to take risks, as they have their leadership role to lose, while the underdog(s) can, as they don't have a brand to lose if it doesn't work. Hopper Next Next will for sure be MCM because of the reticle limit. Maybe Hopper Next is an intermediary generation and we see a monolithic chip launching first to ensure leadership, followed by a riskier MCM design on the same architecture, which takes longer to develop and has the potential to beat it.

  • @2dozen22s
    @2dozen22s 1 year ago

    There is a lot of upcoming tech that will primarily push only logic density forward.
    With high-NA halving the reticle limit, and GAA + backside power delivery increasing complexity, it might be unwise or even uneconomical to put L3 or L2 on the die at all, necessitating die stacking to maintain the necessary bandwidth/latency.
    Hopefully the thermal reductions gained from GAA and backside power will be enough to just stack cache directly onto the logic without issues.

    • @maynardburger
      @maynardburger 1 year ago

      Yea, large cache chips that can be stacked underneath the compute die are the future. That lets you have a lot more cache, while also freeing up room for more compute (or just going with a smaller die with the same amount of compute).

  • @darrell857
    @darrell857 10 months ago +1

    Nvidia will continue to produce giant chips, since they have perfected how to do it and the margins support it. To stretch that as far as it can go, they will make chips more and more specialized for particular models or customers.

  • @WSS_the_OG
    @WSS_the_OG 1 year ago +7

    In my view, Nvidia can stay monolithic for as long as it likes due to the high margins on its products. The main advantage to moving to chiplet or tile-based designs is lowered silicon cost. So while it might mean more money in Nvidia's pocket, it's not like they're hurting for money at the moment; they're swimming in profits, with AI only providing a new golden era of profit potential for them.
    There's nothing wrong with monolithic chips inherently, except for the large write-off a chip defect might incur. If you're making as much money as Nvidia, you can afford that loss.
    Also, if we look at AMD, it's not like they're passing the savings of their chiplet designs down to consumers anyway; they're just pocketing the money they're saving.

    • @MacA60230
      @MacA60230 1 year ago +5

      You didn't watch the video, did you?

  • @81Treez
    @81Treez 1 year ago

    You deserve more subs. Great content.

  • @BecomeMonke
    @BecomeMonke 1 year ago

    Wow, you made a really dry topic really interesting to listen to. Thanks for the video!

  • @manueladolfoholzmannillane3050
    @manueladolfoholzmannillane3050 4 months ago

    I have a question. The chiplet philosophy came with HBM. Why don't they use an HBM solution, or a solution in the spirit of HBM, for the latency issue?

  • @B1-K4R
    @B1-K4R 1 year ago +6

    Does it matter if they deliver industry-leading performance, efficiency and profit?
    Nvidia won't rush into things just for the sake of doing it.

    • @GeekProdigyGuy
      @GeekProdigyGuy 1 year ago +2

      Did you watch the video? The whole point is he thinks they CAN'T keep leading performance and efficiency forever without switching to chiplets...

  • @niyazzmoithu20
    @niyazzmoithu20 1 year ago +1

    Isn't monolithic more efficient?

  • @JoeLion55
    @JoeLion55 1 year ago +1

    Why has SRAM size stopped scaling?

    • @HighYield
      @HighYield 1 year ago +1

      Check out this video: th-cam.com/video/vQ5JPqeFitM/w-d-xo.html

    • @JoeLion55
      @JoeLion55 1 year ago +1

      @HighYield Thanks, that’s great info. Do you have any explainers as to what the physical limitation for SRAM scaling is? As a DRAM engineer, I am well aware of the manufacturing problems that DRAM has as we try to scale the wordlines, bitlines, and capacitors. However, I had always assumed that SRAM would continue to scale directly with the logic transistor sizes.
      And now that I write that, it occurs to me that the latest logic process nodes are less focused on transistor scaling and more on block layout, optimizing power delivery, minimizing dark silicon, etc. And I assume all of that optimization has already occurred in the SRAM arrays, so as long as the transistors are staying the same size there’s nothing else to do in the SRAM array. Is that the right track?

  • @jabcreations
    @jabcreations 1 year ago +3

    Nvidia's engineers are damn capable, the problem is they work for Nvidia.

  • @MrArunvappukuttan
    @MrArunvappukuttan 1 year ago

    Very good analysis. One generic drawback of chiplets is the higher power, area and latency that die-to-die PHYs and controllers contribute. But none of this would matter if the max reticle size is reduced to half!

  • @D.u.d.e.r
    @D.u.d.e.r 1 year ago +2

    I completely agree with you and your predictions. Chiplets are without a doubt the future of chip design, and Nvidia will have to jump on this wagon sooner rather than later, especially with its enterprise chips.

  • @simplemechanics246
    @simplemechanics246 1 year ago

    Chiplets make custom final assembly possible: add more compute units, mix cores with different clock speeds, increase or decrease L3, graphics, etc. Consumers could pay loads of money to get very, very unique assembled units. All that requires making future systems ready for easy modification. Yes, every unit would need a very special motherboard firmware update, but that is no big deal to add to the bill for a custom assembly. Everything is based on the customer-selected chipset anyway, so it's not rocket science to add the custom software code. I am 100% sure they could sell crazy things that way, even consumer units costing several thousand euros.

  • @OfficialSteftom
    @OfficialSteftom 1 year ago +1

    I know RDNA 3's chiplet structure seems like a dud as of right now but I believe AMD made the right choice to try it out early on so they can work out the kinks as soon as possible before Nvidia takes over the consumer market with chiplets. Nvidia, with their insane war chest for R&D, might just knock it out the park from the get-go.

  • @VideogamesAsArt
    @VideogamesAsArt 1 year ago

    Always enjoy watching and hearing your opinions. You do very good analysis, keep up the good work!

  • @lahma69
    @lahma69 1 year ago

    First time viewer of your channel here and I really enjoyed hearing your opinion on this topic which I've been thinking a lot about lately. I look forward to exploring your past and future content!

    • @HighYield
      @HighYield 1 year ago

      I hope my other content doesn't disappoint ;)

  • @andycarr3711
    @andycarr3711 1 year ago

    You were excellent on Broken Silicon. Like, subscribe and best wishes.

    • @HighYield
      @HighYield 1 year ago

      Thank you. New video should be coming up soonish™

  • @rookiebird9382
    @rookiebird9382 1 year ago

    High NA EUV was said to be available in 2023. Now they say it will be available in 2025.

  • @ipurelike
    @ipurelike 1 year ago

    Makes sense, thanks for being super informative!

  • @hartyleif
    @hartyleif 1 year ago

    Why are there no triangular microchips? All of them are squares.

    • @charleshorseman55
      @charleshorseman55 10 months ago

      Or how about amorphous? Infinite divisions of pi!

  • @timparker9174
    @timparker9174 8 months ago

    Do a deep dive into Nvidia's next chip! You explain these complicated processes very well. Although, with hindsight Nvidia made another monolithic chip. Love to hear your take on it. Thanks

  • @i_scopes_i3914
    @i_scopes_i3914 1 year ago

    Hey Max, what do you think of the Gen-Z interconnect possibilities, and if and when it will be utilized?

  • @yoppindia
    @yoppindia 1 year ago

    Only a couple of years ago NVIDIA used to promote SLI-based GPUs. How can you say GPUs do not scale across multiple chiplets? Latency in an SLI-based configuration will be higher than it would be with chiplets. It is a question of will, not of the way.

  • @dr.python
    @dr.python 1 year ago +1

    I just hope either Intel, AMD or Nvidia will be the first to move away from x86 towards the ARM architecture, with the next-generation consoles based solely on ARM. It is clear it will eventually happen, but the question is when.

    • @maynardburger
      @maynardburger 1 year ago

      I don't know why we'd hope for that, personally. ARM isn't really inherently better as a whole, and the efficiency advantages and whatnot that people tout now will shrink as it is further developed and made more complex. I really don't look forward to the software issues that ARM PCs will face for quite a number of years while compatibility problems, translation software and whatnot get ironed out. Consoles especially might have to give up all backwards compatibility, which would be a heavy blow for both gamers and the industry in general.

    • @dr.python
      @dr.python 1 year ago +1

      @maynardburger It's not a question of whether it is better or not, but of the future we're headed towards and how we get there. If there is only one manufacturer (Apple) using ARM in a world where most devs have optimised for ARM, then it'll be a monopoly, which won't be good, since the transition to ARM is inevitable. If you can argue that the transition to ARM is not inevitable, then you might have a case.

  • @andikunar7183
    @andikunar7183 1 year ago

    Great video, thanks a lot!

  • @morgan3392
    @morgan3392 1 year ago

    Thoroughly enjoyed this video. Understood nothing, but appreciate it all the same!

  • @ahmedp8009
    @ahmedp8009 1 year ago

    Can you make a video explaining why CPUs are limited to 2 threads per core?
    Why don't we have, let's say, a 4-core/12-thread CPU (3 threads per core), etc.?

    • @HighYield
      @HighYield 1 year ago +1

      This is simply due to the fact that Intel and AMD only implement SMT2 (which means a single core can run two threads). IBM, for example, has CPUs that offer SMT4 and even SMT8. The more SMT threads you use, the lower the overall scaling, but it's possible to run more than 2 threads per core if you design it that way.

    • @ahmedp8009
      @ahmedp8009 1 year ago

      @HighYield I see, thanks!

    • @pyromen321
      @pyromen321 1 year ago

      @HighYield Also worth adding: under certain workloads SMT literally does not improve performance, so it doesn’t make sense to add more threads. It’s really only good for tasks that have frequent waits on high-latency things (or programs that haven’t been optimized at all).
      When you have two or more threads running optimized code competing for ports and execution units, each thread will be capable of filling more than half of the ports and execution units. Typical CPUs now evaluate well over 200 instructions at a time and find a way to reorder them to run as many instructions in parallel as possible (search reorder buffer for more info).
      From what I’ve seen, as branch prediction and reorder buffers have improved, practical benefits from SMT have plummeted.
      You could theoretically design a program that would run just as fast on a single core with SMT compared to two cores without SMT, but it would either be incredibly naive or incredibly tricky.
      A naive solution I just thought of would be one thread doing an integer cumulative sum and another thread doing a floating point cumulative sum. In this case, the reorder buffer wouldn’t be much help to either thread, and neither thread would slam the other’s arithmetic ports (depending on the architecture, that is).
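The naive example above can be illustrated with a toy model. This is only a sketch: it assumes a made-up core with one integer port and one floating-point port, treats each cumulative sum as a serial dependency chain that can issue at most one operation per cycle, and ignores latencies, caches and the front end. The function name `cycles_to_finish` and the port names are hypothetical.

```python
def cycles_to_finish(threads, ports):
    """Toy SMT model (an assumption, not a real CPU simulator).

    threads: list of (port_name, op_count); each thread is a serial
             dependency chain, so it issues at most one op per cycle.
    ports:   dict mapping port_name -> ops that port accepts per cycle.
    Returns the number of cycles until every thread is done.
    """
    for port, _ in threads:
        if ports.get(port, 0) <= 0:
            raise ValueError(f"no capacity on port {port!r}")

    remaining = [count for _, count in threads]
    cycles = 0
    while any(r > 0 for r in remaining):
        free = dict(ports)  # port capacity resets every cycle
        for i, (port, _) in enumerate(threads):
            if remaining[i] > 0 and free[port] > 0:
                free[port] -= 1
                remaining[i] -= 1
        cycles += 1
    return cycles


core = {"int": 1, "fp": 1}  # hypothetical port layout

# One integer chain and one floating-point chain never compete for a port.
mixed = cycles_to_finish([("int", 1000), ("fp", 1000)], core)

# Two integer chains fight over the single integer port.
clashing = cycles_to_finish([("int", 1000), ("int", 1000)], core)

print(mixed, clashing)
```

In this model the mixed pair finishes in the same number of cycles it would need on two separate cores, while the two integer chains take twice as long. Real cores have several ports of each kind and multi-cycle latencies, so actual results would differ.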

  • @sailorbob74133
    @sailorbob74133 ปีที่แล้ว +1

    Nvidia and Jensen are super smart, but also a bit arrogant. I could see them waiting until the High-NA generation to deploy chiplets...

  • @thevillain8151
    @thevillain8151 ปีที่แล้ว

    So why not 3D monolithic chips over chiplets? Won't that be way better, since you won't need separate interconnects for the chiplets to communicate with each other?

    • @maynardburger
      @maynardburger ปีที่แล้ว

      At some point, perhaps yea. But stacking compute layers on top of each other has huge heat problems that need to be solved first. That may take a while for any kind of high performance applications.

  • @josephm3615
    @josephm3615 ปีที่แล้ว

    Great video.

  • @labloke5020
    @labloke5020 ปีที่แล้ว

    How about Gaudi?

  • @henrycook859
    @henrycook859 ปีที่แล้ว

    I think Google's TPUs are on track to be competitive with Nvidia and AMD for AI training, though not for consumer GPUs.

  • @Savitarax
    @Savitarax ปีที่แล้ว +2

    I feel so confidently that nvidia is going to make the 5090 a MCM design because of just how massive the 4090 is and how much TSMC is struggling to make smaller and smaller chips

    • @maynardburger
      @maynardburger ปีที่แล้ว +2

      The 4090 isn't especially massive. It's smaller than the 3090/GA102 was. Quite a bit smaller than the 2080Ti/TU102 was. And heck, the 4090 is actually more cut down than the 3090 was, even with the slightly smaller die. 4090 is more like what the 3080Ti was.

    • @kaystephan2610
      @kaystephan2610 ปีที่แล้ว +2

      4090 isn't particularly massive.
      3090Ti was 628mm²
      2080Ti was 754mm²
      Only 1080Ti was significantly smaller for reasons mentioned in the video
      980Ti was 601mm²
      780Ti was 561mm²
      So the 4090 isn't especially large. 600+mm² surely is very big for consumer cards, but it's a regular thing in the enthusiast space.
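As a rough cross-check of these figures against the lithography limits discussed in the video, the sketch below compares the die areas quoted in this comment with the standard 26 mm × 33 mm exposure field (858 mm²) and the half-size field of high-NA EUV (26 mm × 16.5 mm, 429 mm²). The die areas are taken from the list above as written; the variable names are made up for illustration.

```python
FULL_FIELD_MM2 = 26 * 33        # standard reticle field, 858 mm²
HIGH_NA_FIELD_MM2 = 26 * 16.5   # high-NA EUV half field, 429 mm²

# Die areas in mm², as quoted in the comment above.
die_area_mm2 = {
    "3090 Ti": 628,
    "2080 Ti": 754,
    "980 Ti": 601,
    "780 Ti": 561,
}

for name, area in die_area_mm2.items():
    share = area / FULL_FIELD_MM2
    fits = area <= HIGH_NA_FIELD_MM2
    print(f"{name}: {area} mm², {share:.0%} of full field, "
          f"fits one high-NA field: {fits}")

# Dies that would need stitching or a chiplet split under high-NA.
too_big = [name for name, area in die_area_mm2.items()
           if area > HIGH_NA_FIELD_MM2]
```

Every die in the list is larger than a single high-NA field, so none of them could be printed in one exposure without stitching or being split into chiplets.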

    • @mrrolandlawrence
      @mrrolandlawrence ปีที่แล้ว

      TSMC are not struggling. They are at the cutting edge & creating new technology as we speak. Creating new technology is hard. Always has been.

  • @Kaptime
    @Kaptime ปีที่แล้ว +1

    The economics of a chiplet-based design speak for themselves; it's the clear choice going forward.

  • @fatplanediaries
    @fatplanediaries ปีที่แล้ว

    Your videos are chips and cheese in video form. I hope you grow big!

    • @HighYield
      @HighYield  ปีที่แล้ว

      Thank you for the compliment, but chips and cheese goes much more in-depth than I ever could. These guys are on another level!

  • @leorickpccenter
    @leorickpccenter ปีที่แล้ว +1

    Nvidia knows the problems with the chiplet approach on graphics. They have looked at it and deemed it not ready or problematic. But at some point they will have to switch, and this will be a problem, as Intel and AMD may have solved these issues by then.

  • @ChinchillaBONK
    @ChinchillaBONK ปีที่แล้ว +2

    thanks for addressing this issue. i was wondering why the stock market is pushing Nvidia's stock price so high knowing we are beginning to reach the silicon physical limits of monolithic designs.
    chiplet design seems to be the immediate future of traditional silicon chips for at least the next 10-15 years, before other computing tech like photonic neural network chips or quantum chips starts to take over.

    • @LeonardTavast
      @LeonardTavast ปีที่แล้ว +1

      Quantum computing is only faster than traditional computing for a limited set of workloads and requires cooling the chips down almost to 0K. It will probably never become mainstream.

  • @tiagomnm
    @tiagomnm ปีที่แล้ว

    NVIDIA announced it will supply Mediatek with GPU chiplets to use in automotive chips.
    GPUs but not exactly consumer ones.

  • @mattmexor2882
    @mattmexor2882 ปีที่แล้ว

    From what I remember, that hypothetical MCM research chip from Nvidia was faster because it used more die area. Monolithic is always better for performance and energy efficiency, at least for the scale of what fits on one monolithic die. Since Nvidia GPUs are scaled up much larger than what can fit on a single interposer - for the vast majority of their revenue they lash 8 reticle-limit GPUs together with NVLink to make a single node and then lash many nodes together with NVLink and/or Infiniband to make pods - any advantage chiplets give for larger-sized packages mostly gets washed out during that further scaling.
    I believe Nvidia would like to skip the excessive use of modules as much as they can and instead rely on their serdes expertise and in-package optical I/O. Of course they likely will eventually need to use tiles to some extent, and in-package optical I/O itself will rely on chiplets, but I think they would like to limit tile/chiplet use to where it is most economically advantageous and tackle scaling and bandwidth issues with optics rather than with advanced packaging.

  • @jjdizz1l
    @jjdizz1l ปีที่แล้ว

    Interesting take. I would have to agree that standing still is not the best course of action.

  • @ZackSNetwork
    @ZackSNetwork ปีที่แล้ว +1

    I don’t see Nvidia going multi-chip until the RTX 60 series, exclusive to the RTX 6090 in 2027. Multi-chip should then be seen on the 90 and 80 class GPUs in the 70 series in 2029. Unlike AMD, Nvidia will only do multi-chip when they need to.

  • @bigcazza5260
    @bigcazza5260 ปีที่แล้ว +1

    stuck lol nvidia has the best mcm and is just waiting to need it

  • @Lu5ck
    @Lu5ck ปีที่แล้ว

    Chiplets on gaming GPUs are just too difficult unless there is a breakthrough in how to send huge amounts of data. AMD will have an advantage in chiplet design as AMD does both general-purpose CPUs and GPUs, so they have more ways to gain knowledge and experiment.

    • @DetectiveAMPM
      @DetectiveAMPM ปีที่แล้ว

      Just too difficult until the PS6 or PS7 uses a chiplet-based design from AMD

  • @Anonymous______________
    @Anonymous______________ ปีที่แล้ว

    Ummm ignoring latency for the sake of throughput/bandwidth will inevitably come back to screw you. This is especially true for wiring and connections at the nm scale.

  • @Timberjac
    @Timberjac 11 หลายเดือนก่อน

    Since Nvidia is testing manufacturing processes at Intel's Angstrom nodes, I don't think they'll have much trouble adapting.

  • @przemekbundy
    @przemekbundy 7 หลายเดือนก่อน

    I always wonder how "they" do it all. My point is that they won't get it all wrong. that they won't get lost in these millions of transistors. not to mention every reconstruction of every structure. the way I look at it. it's like looking at a sky full of stars.

  • @MacA60230
    @MacA60230 ปีที่แล้ว +3

    Yeah, Nvidia is moving to chiplets sooner rather than later. I also think they'll do so in an impressive way; out of the trio of AMD, Intel and Nvidia they're the absolute best when it comes to executing. It's one of the reasons Nvidia is so dominant: they just don't mess up.
    As such I don't expect some timid first try for Hopper Next, but a full fledged cutting edge chiplet design.

  • @shanent5793
    @shanent5793 ปีที่แล้ว

    Rendering computer graphics for interactive computer games is the easiest thing to adapt to multithreading, ie. an embarrassingly parallel workload. Gamers are sensitive to latency measured in milliseconds, while nanoseconds can bottleneck an HPC or ML job. Graphics calculations are mostly independent and processed as streams, completely hiding any latency. All that matters for interactive graphics is that all the pixels get drawn in time, and there are very few dependencies that don't fit in cache. So I think you have the latency sensitivities of games vs. AI/HPC completely backwards.
    AMD GPUs currently only use memory-cache chiplets because it's the first generation and the least risk with the highest reward potential, and not because of any limitations in scaling graphics applications to modular GPUs

  • @pandoorapirat8644
    @pandoorapirat8644 ปีที่แล้ว +1

    Blackwell will use a chiplet design.

  • @venzoah
    @venzoah ปีที่แล้ว

    An even better question is, how long can Apple stay monolithic? M1 and M2 are huge.

  • @chriskaradimos9394
    @chriskaradimos9394 ปีที่แล้ว

    great video

  • @stellabckw2033
    @stellabckw2033 ปีที่แล้ว +3

    why call a *new* technology "ponte vecchio" if it means *old* bridge in italian? lol

    • @RobBCactive
      @RobBCactive ปีที่แล้ว

      Same reason they chose Crater & Cougar Lake as codenames ~snigger~

  • @grospoulpe951
    @grospoulpe951 ปีที่แล้ว +1

    AI chips? I guess they will go chiplet.
    GPU chips? Well, the rumors say that RDNA 4 will not have a high-end chip (i.e. a chiplet design like Navi 31 and Navi 32), perhaps because of latency problems as you mentioned, and will focus on Navi 43 and Navi 44 (probably monolithic). Nvidia's Ada Next will probably be monolithic too, even on the high-end GPU (xx102), using, I guess, TSMC N3 or better and some architectural improvements.
    So maybe in 2026+ AMD will come back with RDNA 5 in the high end using chiplets (Navi 51?) to compete with Nvidia, also on chiplets (using TSMC / Samsung 2nm or so).
    2026+ is still a long way off...

    • @lunascomments3024
      @lunascomments3024 ปีที่แล้ว

      it's because the prices are not sustainable for AMD to produce high end products. going to newer nodes not only increases the price but also the design complexity.

    • @grospoulpe951
      @grospoulpe951 ปีที่แล้ว

      @@lunascomments3024 True. AMD has at least two choices: increase prices (as Nvidia did) or sell more units to compensate for those costs...

    • @grospoulpe951
      @grospoulpe951 ปีที่แล้ว

      and, of course, (really) increase performance, especially in the "mid-range" GPUs (aka Navi 42/52/...) (Navi 21/31/51... are high end for me...)

  • @mannyc19
    @mannyc19 ปีที่แล้ว

    At 9 min 6 seconds, you are forgetting about 3D V-Cache. Nvidia can stack in 3D; same with reticle limits, stack upward. How long for massive dies? Honestly? Several years to come. So said Jim Keller a few months ago when asked. He would know with his insider knowledge. I can think of at least two more who know for sure. Jensen Huang is #2, but there are more as well versed as Jim, etc.

  • @hishnash
    @hishnash ปีที่แล้ว

    They might be, but more like Apple's Ultra chips, with a massive die-to-die bandwidth bridge.

  • @oscarcharliezulu
    @oscarcharliezulu ปีที่แล้ว

    I’m sure when Nvidia brings out a chiplet or tile design it will blow us away.

  • @aacasd
    @aacasd ปีที่แล้ว

    Considering the GH200 specs, NVDA still has an edge over AMD and INTC. And their software stack is more than a decade ahead, so even if AMD wins on chiplets, they will not see wider adoption due to poor software support. This gives NVDA enough time to spend on chiplet R&D and still stay ahead of AMD. INTC is far behind AMD, so it's not fair to compare them.

  • @tringuyen7519
    @tringuyen7519 ปีที่แล้ว

    Nope, Blackwell will be monolithic on TSMC’s 3nm node. Blackwell will hit TSMC’s reticle limit on 3nm.

  • @bobbyboygaming2157
    @bobbyboygaming2157 10 หลายเดือนก่อน

    Isn't monolithic "Better" anyway? Chiplet seems like you just create more problems to solve. It is just a production cost thing more than anything else, however since all the costs get passed to the consumer, I guess you could say it is better for us that they all start using chiplets.

  • @danburke6568
    @danburke6568 ปีที่แล้ว

    Nvidia's 5000 series is not a chiplet design, and AMD is having problems with RDNA 4.
    There's no way and no point in pushing themselves when they are the only ones with the crown.
    The problem may be the 6000/7000 series, when AMD will be putting out some solid hardware.
    Will Nvidia have an Intel moment and fail in development, letting AMD run away from them? Maybe, but Nvidia will have mindshare like Intel did and will have many years to come out on top.

  • @mylittlepimo736
    @mylittlepimo736 ปีที่แล้ว

    Why do you think Apple hasn’t adopted a chiplet design?

  • @juancarlospizarromendez3954
    @juancarlospizarromendez3954 ปีที่แล้ว

    together chips for saving golden wires

  • @chrisgarner5765
    @chrisgarner5765 ปีที่แล้ว

    They already have a faster, more stable interconnect than AMD, so they can do what they want at any time! Nvidia can connect GPUs together faster than AMD can connect chiplets, so all of it is kind of moot!

  • @semape292
    @semape292 ปีที่แล้ว +1

    i think nvidia will use chiplets with rtx 6000.

  • @TheEclecticDyslexic
    @TheEclecticDyslexic ปีที่แล้ว

    They will put it off as long as humanly possible. Because they are comfortable where they are and would prefer to do nothing if they can.

  • @Ludak021
    @Ludak021 ปีที่แล้ว

    Who told you that nVidia is in the chiplet race?

  • @mikebruzzone9570
    @mikebruzzone9570 ปีที่แล้ว +1

    Nvidia owns TSMC 4 and will simply ride 4 nm depreciated cost curve down to introduce BW return to desktop design generation in mass market volumes from Ada mobile design generation produced at a higher cost : price but good for 50 M units of AMD and Intel mobile H attach during H mobile producers ramp plus some HPC cards at 4 m risk production also more costly like x3 TSMC 5 nm cost but Nvidia is making money with 4 nm now and into the future. Pursuant SIP slowly but surely. mb

  • @baumstamp5989
    @baumstamp5989 ปีที่แล้ว

    nvidia have put so much energy and effort into their gaming gpu market share that they truly have lost the compute/datacenter development out of their sights.

  • @Sheerwinter
    @Sheerwinter ปีที่แล้ว +1

    @_@ nvidia apu would be amazing like a 7600x and a 3060 in just a single cpu. With dlss 4

  • @tek_soup
    @tek_soup ปีที่แล้ว

    Yeah, I agree. We gamers are screwed. I'm pissed because they did not put DisplayPort 2.1 on the 4090, so we will hopefully get a refresh of the 4090, but that's going to cost $$$ because the 5 series isn't coming till 2025. I'm sure they planned it this way, bastards.

  • @Raja995mh33
    @Raja995mh33 ปีที่แล้ว +1

    I mean Nvidia but also Apple don't use chiplets and so far they're doing great and beat the competition in many areas 😅

    • @skirata3144
      @skirata3144 ปีที่แล้ว +3

      Well, technically Apple is using chiplets with their Mx Ultra chips, which just stitch together two Max dies.

    • @aravindpallippara1577
      @aravindpallippara1577 ปีที่แล้ว

      @@skirata3144 And sadly the Ultra (2 connected Max chips) has lower gaming performance than the monolithic Max variant.
      It's amazing what RDNA3 achieved as such, but I have faith AMD will figure it out going ahead.
      Nvidia was always at the forefront of technology. I don't doubt they will also switch to multi-chip designs, but they'll probably follow the Intel/Apple model of expensive interposers as opposed to AMD's interconnects.

  • @lil----lil
    @lil----lil ปีที่แล้ว

    AMD *HAD* to try something different; it was do or die for them and it paid off Big Time.
    Intel was in "No Rush" to innovate and they paid a HUGE price for it. So much so that the company is on shaky ground now.
    And Nvidia? Nvidia lucked out. They saw what chiplets did for AMD, which DECIMATED Intel's CPU performance lead. With a hyper-aware engineer CEO, they won't be making that mistake, and you can count on it.

  • @roilevi2
    @roilevi2 9 หลายเดือนก่อน

    Blackwell is not monolithic ...

  • @AuroraLex
    @AuroraLex ปีที่แล้ว

    Nvidia could probably stay monolithic for another couple of generations if they wanted to.
    High NA is a resolution bump so it can probably rekindle SRAM scaling to some extent like EUV did, but with GDDR7 coming, a large SRAM cache won't be as important anymore, and for dies larger than 400 mm², dual masks + stitching is an option if Nvidia is willing to pay the price.

  • @przemekbundy
    @przemekbundy 7 หลายเดือนก่อน

    I don't know if I'm backward. am I the only one who is backward? but no normal person can understand this. especially this technology. what are you talking about. After downloading, it can be assumed that this is understandable. but who really understands it. and knows how to use this technology...
    or this rat race. it is a race for the very principle of being the best. I guess it's all about money... I guess there are no higher goals... does anyone know where this is all going... someone started the machine. but it all has no end. it's all a rush. I wonder when it will stop....

  • @wakannnai1
    @wakannnai1 ปีที่แล้ว

    Not so important for Nvidia. When you're selling GPUs for $30-40k a pop, and you still can't meet demand, chiplets are not important. Furthermore, these clients and their workloads work just fine with NVLink and multiple GPUs. There's literally no incentive for Nvidia to go to chiplets because they're selling these dies at such a premium, it's not worth the cost to move to chiplet architecture.

  • @samlebon9884
    @samlebon9884 ปีที่แล้ว

    A question to all those who are praising Nvidia:
    How far ahead of Nvidia is AMD in chiplet tech and homogeneous computing?
    Here is a hint: when the El Capitan supercomputer comes online, you'll have your answer.

  • @profounddamas
    @profounddamas ปีที่แล้ว

    "How long can Nvidia stay monolithic?" As if you know...

  • @DDD-xx4mg
    @DDD-xx4mg ปีที่แล้ว

    Chiplets are no good for gaming, not yet anyway. Maybe we’ll start to see them with the 6000/7000 series.

  • @7lllll
    @7lllll ปีที่แล้ว

    i hope the latency issue won't get gaming gpus stuck in the mud with monolithic dies and performance stagnation

  • @tofu_golem
    @tofu_golem ปีที่แล้ว +1

    Who cares? Graphics cards are too expensive, and it looks like that state of affairs is permanent. So I genuinely don't care if AMD beats Nvidia or not. I don't even game much anymore because the whole industry is so depressing.

  • @nivea878
    @nivea878 ปีที่แล้ว +2

    Dude, what are you talking about? AMD is nonexistent in the GPU market.

    • @Patrick73787
      @Patrick73787 ปีที่แล้ว

      AMD has 17.5% market share in the DIY GPU space as of Q2 2023.

  • @sudheeraggarwal570
    @sudheeraggarwal570 ปีที่แล้ว

    Dojo is much better..... isn't it?

    • @niyazzmoithu20
      @niyazzmoithu20 ปีที่แล้ว

      Isn't it specifically built for AI training and stuff?

  • @vensroofcat6415
    @vensroofcat6415 ปีที่แล้ว

    Funny how you consider chiplets being better than monolithic. It's against basic physics, but who cares. As for the future - AI will decide what's best for what.
    Monolithic die has performance and efficiency advantage. Chiplets have production costs advantage for large packages. While for small ones overhead could surpass savings.
    As long as Nvidia can balance production costs and selling price, they will stick to top performance solution, which is monolithic. They can do chiplets any time. Tools and production platforms are available to everyone in the industry. Next gen is probably the most intriguing one in decades. It's a design and production where AI will kick in for real.
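The cost trade-off described here (chiplets win for large packages, overhead can dominate for small ones) can be sketched with a simple Poisson yield model, where yield is exp(-area × defect density). Everything numeric below is an assumption chosen for illustration: the defect density, the flat packaging overhead, and the helper names `good_silicon_cost` and `chiplet_cost` are not from any vendor data.

```python
import math


def good_silicon_cost(area_mm2, defects_per_cm2):
    """Relative cost of one good die: area divided by Poisson yield."""
    yield_fraction = math.exp(-(area_mm2 / 100.0) * defects_per_cm2)
    return area_mm2 / yield_fraction


def chiplet_cost(total_area_mm2, n_chiplets, defects_per_cm2, packaging_overhead):
    """Cost of n equal chiplets plus a flat packaging overhead,
    expressed in the same mm²-equivalent units as the silicon cost."""
    per_chiplet = good_silicon_cost(total_area_mm2 / n_chiplets, defects_per_cm2)
    return n_chiplets * per_chiplet + packaging_overhead


D0 = 0.1         # assumed defect density, defects per cm²
OVERHEAD = 60.0  # assumed packaging/interconnect cost, mm²-equivalents

for total in (600, 100):
    mono = good_silicon_cost(total, D0)
    split = chiplet_cost(total, 4, D0, OVERHEAD)
    print(f"{total} mm² total: monolithic {mono:.0f}, 4 chiplets {split:.0f}")
```

Under these assumptions the four-chiplet split is cheaper for the 600 mm² design and more expensive for the 100 mm² one. The model ignores the extra die area that chiplet interfaces need and says nothing about performance or power.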

    • @shanent5793
      @shanent5793 ปีที่แล้ว +2

      Nvidia's own research says that chiplets outperform any possible monolithic design; it's not a matter of manufacturing margins at the high end. Even just moving the memory controllers and cache off the die like RDNA3 is worthwhile because it doesn't introduce extra intra-chip latency and recovers that space for more compute logic. Nvidia isn't going to wait ten years for someone to develop a larger reticle just so that they can continue making monolithic designs

    • @vensroofcat6415
      @vensroofcat6415 ปีที่แล้ว +1

      @@shanent5793 Chiplets introduce extra distances and interfaces. They can't be faster because of that. Stop the bs.

    • @shanent5793
      @shanent5793 ปีที่แล้ว +1

      Lol a playstation must also be faster than a desktop PC because small 😂

    • @lemontree5986
      @lemontree5986 ปีที่แล้ว

      Monolithic is better in every possible case; chiplets are just cheaper to produce (even if Nvidia does chiplets, they're still going to charge gamers two grand anyway), and people would buy it.

    • @vensroofcat6415
      @vensroofcat6415 ปีที่แล้ว

      @@shanent5793 You are probably lost between the lines of some AMD presentation. Or compare apples to oranges.
      If you have similar logic chips and one is monolithic while other cut in chiplets, monolithic will always be better. Because physics. Also often more expensive.
      If you compare chiplets with RAM moved over there to a classical system with removable RAM further apart, the result will be different just like the system! But once you do the same with monolithic setup, monolithic die will win again. Because physics. Apples to apples, ok?

  • @WilliamTaylor-h4r
    @WilliamTaylor-h4r ปีที่แล้ว

    long as the money pig keeps shaking it;s rear end, ohh budy, now thats some serious generocity, just keep them on their bellies. My methodolidy would be to grab the sow and make photonic wells matrix convolution, then chipleps need pick and place hardware, so these are cell phone optical to electric converters. We can squaze the piggy bak, I mean if you have a very pyramidal investmement, ooh budy, oh yah.