Next-Gen CPUs/GPUs have a HUGE problem!

Comments • 899

  • @flioink 2 years ago +911

    Nowadays CPUs have more cache memory than my
    first PC had RAM.
    It's amazing how far we've come in terms of processing power.

    • @uvuvwevwevweossaswithglasses 2 years ago +16

      486 :D

    • @TremereTT 2 years ago +17

      I think once we get a better approach than computing parts of the program ahead of time, in parallel, for all possible outcomes, and then throwing away all of the cached results but one because of a result further along the pipeline, we will need way less cache.

    • @soylentgreenb 2 years ago +8

      But the L1 is still small and shrinking it no longer makes it faster.

    • @bricaaron3978 2 years ago +4

      @@uvuvwevwevweossaswithglasses How much RAM did a gaming 486 have, and about how much was a megabyte of RAM?

    • @kellyshea92 2 years ago +2

      I just built my first PC the other day and got it to POST on the first try. It literally takes 1 second for it to boot up. I didn't think the new i9 was so strong.

  • @mkatakm 2 years ago +488

    That's why AMD is starting to use 3D V-Cache, which is basically stacking multiple layers of cache RAM vertically in the same space. As with the Ryzen 7 5800X3D, the same technology is coming to the 7000 series AMD CPUs soon as well.

    • @ClaimClam 2 years ago +28

      techno gobbledygook

    • @baldwindomestic2267 2 years ago +95

      @@ClaimClam more cache, but stack like burger patty, more stack, more cache/burger

    • @ClaimClam 2 years ago +40

      @@baldwindomestic2267 understand

    • @robojimtv 2 years ago +19

      Wouldn't be surprised if the GPUs get v cache one day too. I think it could solve a number of issues with the RDNA3 chips

    • @guytech7310 2 years ago +19

      The issue is dealing with heat when stacking dies vertically. I don't know how much heat SRAM produces, but I suspect it will be a problem. Maybe they can get by with a double stack, but I suspect any additional layers are not going to have the means to dissipate the heat.

  • @Chillst0rm 2 years ago +399

    This is why MCM (multi-chip modules) combined with 3D V-Cache will be so important moving forward. L4 cache will probably make a return too, as something much farther from the die than L1 to L3.

    • @GewelReal 2 years ago +31

      If L4 could work as RAM, that would be a revolution. A few GB of L4 would make buying RAM for light use obsolete. And even with extra RAM it would be a massive performance benefit.

    • @Runefrag 2 years ago +37

      You say "important", but there is legitimately no excuse for more powerful consumer hardware outside of extreme VR / 4K. Graphical fidelity peaked years ago with the currently mainstream use of polygons. If anyone bought any sort of remotely mid-range computer within the last 1-2 years and they experience performance issues in games, it is 100% optimization / functionality related.

    • @Technicellie 2 years ago +11

      @@Runefrag I agree with you from where we stand now, but I wouldn't set it in stone just yet.
      I don't see what can be improved in graphical fidelity.
      But just because we don't see it doesn't mean there is nothing.

    • @dkis8730 2 years ago +21

      @@Runefrag completely path traced ray traced games are the future though. And you need the most powerful hardware today to run 4k/144fps which gives you optimal smoothness with visibly much better graphics.

    • @ThylineTheGay 2 years ago +9

      @@Runefrag companies should definitely be aiming for efficiency, but they don't, and probably won't, because "this won't destroy the planet" doesn't market as well as "oooooh, shiiiiny"
      Classic capitalism 🙃

  • @damienlobb85 2 years ago +97

    AMD definitely doesn't get enough credit for their forward thinking in this regard. And as highly regarded as Jim Keller and his work on Zen have been, there was an engineer (Sam Naffziger) who was responsible for persuading the senior execs to use chiplets on Zen and future AMD products.

    • @ledoynier3694 2 years ago +4

      ...maybe because they did not invent the wheel? Every foundry has had MCM designs and chip-stacking technologies in the works for the past 10-15 years. We're only just starting to see them hit the market.

    • @BruceCarbonLakeriver 2 years ago +20

      @@ledoynier3694 and yet Intel was saying "we're not gluing our chips together..." (although they have been doing it for Xeon for a while...)

    • @CommanderRiker0 2 years ago +5

      Didn't Intel do this years and years ago with the "Crystal Well" 128 MB cache chip?

    • @HighYield 2 years ago +6

      Broadwell i7-5775C

    • @1000area 2 years ago +6

      @@HighYield But that's an L4 cache, a known solution for adding cache next to the chip, not stacked cache like what AMD and TSMC are working on right now.

  • @DigBipper188 2 years ago +83

    AMD had poor cache scaling as one of a few reasons they decided to split dies. Cache and some interfaces, such as the memory controller, don't scale well on their chips when going down a node, which is why their later EPYC and R7000 parts have the IO and some cache levels split from the cores. That way they can maintain the diminutive size of the actual cores themselves, and anything that doesn't scale well (e.g. memory controller, L3 cache and so forth) can be produced on another die at a cheaper, lower-resolution process node (say, 5nm for the CCDs and then 16nm or even 20nm for the MCD / IO dies). This is also why the Memory Cache Die (MCD) of RDNA3 is a thing: it doesn't scale well on the current 5nm node, so AMD has opted to use a larger node for these parts to reduce cost, and reserves the 5nm node for the GCD itself, where they can still see density benefits from the increased resolution of that lithography node.

    • @Jaker788 2 years ago +8

      Well, they don't quite wanna go as far back as 16-20nm; they've been progressing their non-logic dies. For Ryzen 7000 it's 6nm for IO (and IO + L3 for RDNA3), a high-yielding, efficient node that's cheaper than 7nm. It's basically a refined 7nm that's faster to manufacture because multiple layers use EUV. Seems like they'll stay there for a while on IO (and cache for RDNA) and keep shrinking logic to new cutting-edge nodes.
      While density doesn't scale anymore with IO, and now memory, supply, tooling, and energy efficiency still factor in. 20nm planar silicon wouldn't be as efficient for L3 or IO.

    • @rocket2739 2 years ago +6

      "Reduce cost", yeah, for them. Because on the consumer end, we have yet to see the prices go down...

    • @Jaker788 2 years ago +7

      @@rocket2739 Technically we saw RX7000 prices drop a bit below the previous generation RX6000.
      But really, reduced cost means it won't increase as much as any competition that isn't doing the same thing. If this pays off for AMD, and Nvidia takes years to get their own implementation then they'll be at a cost disadvantage.

    • @josephsteffen2378 2 years ago +3

      @@Jaker788 Nvidia enjoyed its day in the sun. I remember when the Titanium series (or whatever it was) was released... It was just by chance that I read an article (in some online computer magazine)... I recognized the jump in technology/speed/value... Nvidia shot ahead of the pack. Not by a few feet or seconds, more like they "lapped" the competition... The stock had just moved from $17/share to $20/share. Somehow I got it all together and told everyone I knew, "BUY NVIDIA!" It had just reached $27. I guessed that it could go, maybe, up to about $129. I figured that was as far as my skill could guess. I don't know jack about the stock market or trading... It was the only time in my life that I predicted a stock profit... or suggested a purchase... NAILED IT!

    • @peceed 1 year ago

      @@josephsteffen2378 The same with AMD. Unfortunately, I didn't have money to invest.
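
The cost argument earlier in this thread (keep the logic die small on the newest node, move poorly scaling cache and IO to a cheaper node) can be sketched with the textbook Poisson yield model, Y = exp(-A * D0). This is only a rough illustration: the die areas, the defect density, and the relative cost per mm² are made-up placeholders, not AMD or TSMC figures, and packaging cost is ignored entirely.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson yield model."""
    return math.exp(-area_mm2 * defects_per_mm2)


def cost_per_good_die(area_mm2: float, cost_per_mm2: float, defects_per_mm2: float) -> float:
    """Silicon cost of one working die: raw area cost divided by yield."""
    return area_mm2 * cost_per_mm2 / poisson_yield(area_mm2, defects_per_mm2)


# All numbers below are hypothetical placeholders.
D0 = 0.001           # defects per mm^2, assumed identical on both nodes
LEADING_COST = 1.0   # relative cost per mm^2 on the newest node
OLDER_COST = 0.5     # relative cost per mm^2 on an older node

# Option A: one 500 mm^2 monolithic die on the newest node.
monolithic = cost_per_good_die(500, LEADING_COST, D0)

# Option B: a 300 mm^2 logic die on the newest node plus
# five 40 mm^2 cache/IO dies on the older node.
split = cost_per_good_die(300, LEADING_COST, D0) + 5 * cost_per_good_die(40, OLDER_COST, D0)

print(f"monolithic: {monolithic:.0f}  split: {split:.0f}")
```

With these placeholder inputs the split design comes out cheaper for two reasons: small dies yield better, and the parts that no longer shrink are not paying leading-edge prices. Whether that survives real packaging costs is exactly what the replies are arguing about.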

  • @coladict 2 years ago +42

    Engineering is always a balancing act. Improving one aspect comes with drawbacks in another. There may be ways to mitigate those drawbacks, but eventually when using the same principle of a technology you will hit its physical boundaries.

  • @jabadahut50 2 years ago +22

    Magnetoresistive memory (MRAM) is nearly as fast as SRAM, and there are methods out there for using it in an analog mode, allowing a single cell to hold 8 bits. Would be interesting to see if this tech gets adopted in the future.

    • @diegorosario2040 2 years ago +1

      Wouldn't it require on-chip error correction to store 8 bits?

    • @jabadahut50 1 year ago

      @@diegorosario2040 Depends on the design, but it might. I'm not 100% sure how it works, but to my understanding it's a sort of magnetic potentiometer with a sort of ADC that is hardwired to the 256 possible outputs.

    • @diegorosario2040 1 year ago

      @@jabadahut50 The deal with non-binary encoding is that it worsens the signal-to-noise ratio. Error-correcting code would be needed to mitigate that problem.

    • @jabadahut50 1 year ago

      @@diegorosario2040 Likely, and I'm sure that might trade off some speed, but ECC memory is already usually denser and slower than non-ECC memory anyway, so I don't think it'd be a huge trade-off for 8x capacity per chip.

    • @diegorosario2040 1 year ago

      @@jabadahut50 It will work storage-wise, but I am curious whether it could compromise bandwidth.
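
The signal-to-noise concern in this thread can be made concrete with a little arithmetic. Storing n bits in one cell needs 2^n distinguishable states, so the gap between neighbouring states shrinks quickly. The sketch below assumes the states are spread evenly over a normalized range of 1.0; that even spacing is an assumption made for illustration, not a description of how any real MRAM cell is built.

```python
def levels_per_cell(bits: int) -> int:
    """Number of distinct analog states needed to store `bits` bits in one cell."""
    return 2 ** bits


def level_spacing(bits: int, full_range: float = 1.0) -> float:
    """Gap between adjacent states when they are spread evenly over `full_range`."""
    return full_range / (levels_per_cell(bits) - 1)


for bits in (1, 2, 4, 8):
    print(f"{bits} bit(s): {levels_per_cell(bits)} levels, spacing {level_spacing(bits):.4f}")
```

Going from 1 bit to 8 bits per cell means 256 levels, the same 256 outputs mentioned above, and a gap between levels that is 255 times smaller. Any noise larger than about half that gap flips the stored value, which is why error correction comes up.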

  • @marsovac 2 years ago +120

    Nice video! But you didn't explain what "SRAM scaling" means in this context and why it is happening. I guess it means that the size of an SRAM cell does not get smaller as the process node gets smaller. But considering that the same applies to some other parts of the chip, like interconnects, this is nothing new.
    Currently TSMC 7nm and 5nm have almost the same feature sizes, but the density is increased in the smaller nodes. Logic circuits are not packed as close to each other as possible, and this is where they get scaling.
    SRAM does not have the possibility to be denser, since it is already as dense as it can be, in a perfect grid. At some point logic circuits in the chip will end up having the same problem.
    So the real problem is that the processes are getting fewer nm in their name while the transistor gate distance remains the same. They are decreasing the process numbers, but those are not nanometers anymore, and this is what is causing SRAM problems. Something that is as dense as it gets sees no benefit from increasing density, only from decreasing transistor size.
    Maybe you want to talk about this "cheating" that is occurring in the process names. The name of the process no longer correlates to the distance between transistor gates. Maybe a video about process shrinking and how it changed in the last 10 years would be informative.

    • @adityasalunkhe8156 2 years ago +16

      ^Exactly. He should have said that SRAM stopped scaling in density rather than just "scaling". Also remember that the register file and the microcode controller are implemented as SRAM in the execution pipeline. If there is more delay in accessing the register file, that would mean less IPC, and pairing faster ALUs with a slower register file or microcode controller makes no sense.

    • @larion2336 2 years ago +9

      Yeah, idk that this is as significant as he makes it sound. The entire reason AMD is going with chiplet designs in RDNA 3 is that things like IO, and to an extent memory, already don't scale as well with lower nm designs. So they make the core GPU chip lower nm and use higher nm for the other parts, where downscaling doesn't lead to any real performance benefit, which saves them money. Well, that and it means they can stitch chips together, but yeah.

    • @dex6316 2 years ago +8

      This video mentioned that other components of a processor are also suffering from scaling issues. However, this is especially problematic for SRAM cells. SRAM not scaling means that to boost performance one must use more silicon. That’s very bad for the high performance microprocessor industry, which is the premise of this video. Other components not scaling well isn’t as impactful on the final designs because processors aren’t dependent on massive growth of these components; look at the cache growth to see why SRAM not scaling is really bad. Also logic cells don’t get denser by optimizing how they are packed together. The cells are reconstructed using different materials to hit desired performance targets at smaller sizes. Logic transistors are in fact getting smaller.

    • @kotekzot 2 years ago +1

      If feature sizes remain almost the same, what is it about new processes that enables them to reduce wasted space to increase density?

    • @johndododoe1411 2 years ago +1

      @@dex6316 How do material changes allow smaller logic gates without allowing smaller SRAM cells?
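
What "SRAM stops scaling while logic keeps shrinking" does to a die can be sketched in a few lines. The starting areas and the per-node shrink factors below are hypothetical placeholders chosen only to show the trend; they are not measured figures for any TSMC node.

```python
def sram_share(logic_mm2: float, sram_mm2: float,
               logic_scale: float, sram_scale: float, shrinks: int) -> float:
    """Fraction of die area taken by SRAM after `shrinks` node transitions.

    `logic_scale` and `sram_scale` are the area multipliers applied per
    transition (1.0 means no shrink at all).
    """
    logic = logic_mm2 * logic_scale ** shrinks
    sram = sram_mm2 * sram_scale ** shrinks
    return sram / (logic + sram)


# Hypothetical die: 70 mm^2 of logic and 30 mm^2 of SRAM.
# Assumed shrink per node: logic to 60% of its area, SRAM to 95%.
for n in range(4):
    print(f"after {n} shrink(s): SRAM share = {sram_share(70, 30, 0.6, 0.95, n):.0%}")
```

Under these assumptions the same cache capacity grows from under a third of the die to more than half of it within two transitions, even though nothing about the cache changed. That is the sense in which non-scaling SRAM hurts more than non-scaling IO: designs keep wanting more of it.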

  • @6SoulHunter9 2 years ago +128

    The information quality of this channel is astounding, I cannot believe it has only 3.4k subscribers.
    The presentation quality is also very good, and it's improving :)

    • @marsovac 2 years ago +7

      You would be astonished by how many US people will not watch these videos simply because of the accent. I've seen people who don't want to watch videos by Aussie or British English creators because of the accent, and those are much closer to American English.

    • @6SoulHunter9 2 years ago +4

      @@marsovac I know. The accent was always alright for me, but after seeing some harsh criticism I have started to pay attention, and I think this channel is improving in that regard; the accent used to be thicker.
      And while I don't mind the accent, I know there are some channels that I sometimes watch without being very interested, because the voice is smooth and mesmerizing. I am sure it would help this channel take off.
      Me? I don't mind, my English accent isn't the best either.

    • @RM-el3gw 2 years ago

      Yes, it's crazy underrated. The YouTube algorithm is what sometimes fails to bring quality content like this to the front, where it belongs.

    • @padnomnidprenon9672 2 years ago

      Lol yes. I just realized he has 4k subs. I thought it was 90k at least.

    • @stevewiley3832 2 years ago

      For me it is the usage of sensationalist wording. He used the words "...approaching death", which implies that SRAM has a functionality problem even though the issue is a scaling problem.

  • @miweneia 2 years ago +25

    This channel is criminally underrated, presenting so much data and such key points in such a digestible and short manner is commendable!
    That aside, it's actually crazy to think about how humanity has existed for thousands of years, but in only the past 50 years we've gone from creating the first CPU to hitting the actual physics limitations of its cache module, and in another 15 or so years we'll probably hit the physics limitations of the actual CPU's transistor size. Really makes me wonder what technology and chips will look like 50 years from now… Hopefully I'll find out firsthand!

  • @aylim3088 2 years ago +41

    I'd really wanna see what a more mature chiplet GPU with 3d cache could do. Bit of a shame that rx 7900 was a bad launch but definitely hopeful for the future; besides, I would have been suspicious if the first-ever chiplet GPU didn't launch with teething problems. Shame its issues can't really be called 'just' teething problems, but I'll keep on the waiting game.

    • @TheCustomFHD 2 years ago +3

      It seems the hotspot temperature on the AMD GPUs is relatively easy to reduce. Mounting the card vertically seems to fix it, and so does more thermal paste. Look at der8auer's video.

    • @JJAB91 2 years ago +2

      The hotspot issue only seems to affect AMD's own cards; partner cards don't have such issues.

  • @pacifi5t 2 years ago +4

    Thank you for breaking down this issue. I thought I knew a lot about hardware, but it seems I've only seen the top of this iceberg.

  • @K11... 2 years ago +173

    Your channel will grow through the roof soon. You have amazing content.

    • @Col_Panic 2 years ago +9

      I know, it's great to see so many people interacting and "liking" so fast. The number has grown steadily for some time now, which is great! He deserves it for sure!

    • @HighYield 2 years ago +27

      It also makes the whole video creation process a lot more fun if I know ppl are actually gonna watch it!

    • @nutzeeer 2 years ago +2

      Just got a front page recommendation and i will sub

    • @nutzeeer 2 years ago +1

      3841st sub :)

    • @Hunter_Bidens_Crackpipe_ 2 years ago +2

      Nah

  • @tqrules01 2 years ago +47

    I don't think it will be an issue for AMD. They are using 3D caching. The 5800X3D is still a beast. Oh nvm, you already mentioned it. I think in the future they will be able to start stacking with faster and faster interconnects, i.e. next-gen Infinity Fabric.

    • @Yuriel1981 2 years ago +7

      Was going to say pretty much the same thing. 3D cache will increase the amount of SRAM that a chip will be able to hold. It doesn't fix the scaling problem. But it does solve some of the size issues which is why the AMD Chiplet tech is most likely the next step.

    • @kotekzot 2 years ago +1

      Pretty sure Infinity Fabric is slower than the vias used in 3D V-cache.

    • @daxconnell7661 2 years ago +1

      Even when early computers were being developed, some people discovered you could double the amount of memory in a computer by stacking RAM chips: the 4464 RAM chip, Commodore 64/Apple era.

    • @spamcheck9431 2 years ago +2

      THIS right here.
      I think AMD and Nvidia are going to separate here in terms of utility.
      Nvidia is gonna have to focus on CUDA cores, while AMD focuses on parallel processing.
      The only thing that might save Intel is if they somehow went along with Apple's chip methodology, where they target specific use cases, such as having a portion of the CPU hardwired for specific tasks instead of relying on transistor gates.

    • @kotekzot 2 years ago +1

      @@spamcheck9431 would you explain what hardware features Apple integrates that Intel doesn't? AFAIK Intel and AMD include a lot of extra instruction sets and some accelerators (e.g. for encryption).

  • @bananaboy482 2 years ago +54

    The amount of attention this video has is criminal. Best video I've watched all day! Entertaining, informative in an easy to understand way, and well made!

  • @b130610 2 years ago +29

    AMD certainly seems to have an advantage in the chiplet space because of their past successes with Zen, but I have to wonder how much longer that advantage will last. It would be pretty ironic if Nvidia integrates chiplets into their cards before AMD can leverage that advantage for a clear win at the high end. It seems like they really had a golden opportunity with RDNA3, but it obviously hasn't really worked out that well so far.

    • @ag687 2 years ago +6

      It's not a chiplet, but Nvidia is already leveraging entire datacenters of cards to work together as though it's one supersized GPU. Which means they probably already have the tech they need to do chiplets without too much of an issue.

    • @b130610 2 years ago +5

      @@ag687 afaik, the chiplet tech AMD is using has at least a couple of orders of magnitude higher bandwidth than Nvidia's data center networking solutions (although they are impressive in their own right). The chiplet interconnects are developed in coordination with TSMC though, so it's not inconceivable that Nvidia could use similar tech to AMD as long as they stay in TSMC's good graces.

    • @sudeshryan8707 2 years ago +2

      I think AMD has already patented most practical approaches to chiplet design, which will leave others very little space for innovation. Intel struggling for years with their tile design shows it's much harder for others to be competitive.

    • @b130610 2 years ago +2

      @@sudeshryan8707 I'm inclined to agree with you there, but I'm not ready to rule out something new built on TSMC's packaging technologies for high-speed interconnects. Last year I thought no other chip design firms were even close to AMD on mass-market chiplet designs, but then we saw the M1 Ultra from Apple with very impressive performance scaling over a whole new fabric. I wouldn't count Nvidia out, but I'm certainly no expert on the matter, just an armchair critic.

    • @aravindpallippara1577 2 years ago +4

      @@b130610 While Apple's M1 Ultra is very impressive, it has less bandwidth per silicon usage, and the interposer itself is an extremely expensive tech compared to AMD's Infinity Fabric based inter-die communication.
      AMD might go patent troll on other companies going forward; not a fan of that happening.

  • @JosephArata 2 years ago +5

    Die stacking will get rid of this problem: they can use a larger process node for the SRAM while the GPU/CPU cores use the smallest node possible. They'll also likely start using HBM once they go to a full PC-on-a-single-chip design.

  • @towb0at 2 years ago +10

    Super interesting topic. Seems like whoever comes up with the best successor to SRAM will take the cake once chiplet scaling is fully utilized.

  • @dascandy 2 years ago +4

    This finally explains why the CPU core is made on a smaller process than the memory chips, when it used to be that memory chips were the first to shrink (because of much simpler design).

    • @alwanexus 1 year ago

      You may be thinking of DRAM, which requires different process features.

    • @JoeLion55 1 year ago

      DRAM has always been on an older process than logic, because 1) DRAM cost control is much more critical than for logic, so it can't afford to use bleeding-edge fab processes, and 2) the DRAM array has features (like wordlines and bitlines) that use entirely different fab processes and aren't able to scale at the same rate as logic transistor processes.
      But, historically, SRAM was used as the test vehicle for new processes, because SRAM uses (or can use) “normal” logic transistors.

  • @kiri101 2 years ago +20

    I already knew about the topic but this was such a well organised video it was still worth watching. Your pacing, delivery of speech and the information density in the video are very well balanced. Thank you.

  • @scaryhobbit211 2 years ago +18

    Eh... they'll find a way around the SRAM bottleneck, like they always do.
    There's the Chiplet designs like you mentioned, but I'm also interested to see what IBM's light-based CPU leads to.

    • @soylentgreenb 2 years ago +10

      Single core scaling ended when Dennard scaling died. Multicore scaling isn't really working that well, as real-time consumer applications like games cannot take good advantage of it without increasing latency (hence why 144 FPS today doesn't feel better than 72 FPS in the '90s; more pipelined engine). Moore's law scaling is not holding up that well either; it is about cost per transistor, but wafer prices are almost keeping pace with density scaling.
      Light is a piss-poor medium for density of storage and density of logic. Light is very large. A blue photon is 350 nm big, and when you approach that sort of scale you get weird effects like surface plasmon resonance and quantum tunneling. So you either incorporate the weirdness and do something with plasmons, or you make a bus with micron-sized waveguides, a lithography size that hasn't been in vogue since the 1980s.

    • @amineabdz 2 years ago +1

      @@soylentgreenb So the absolute best photonics can do is non-ionizing radiation, which is very near the ultraviolet range? Either that, or find some way to mitigate the material degradation from using an ionizing wavelength (which afaik is impossible, or else even nuclear shielding at nuclear power plants would not be a concern anymore).

    • @davidmckean955 2 years ago +2

      Considering we're quickly reaching the physical limits of what's possible for scaling all parts of the CPU, we have much bigger problems to worry about medium term.

    • @amentco8445 2 years ago

      @@soylentgreenb And what would be the big issue in utilizing UV for this?
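
The trade-off behind the UV question can be put into numbers with the standard relation E = h * c / wavelength: shrinking the wavelength to make light "smaller" raises the energy each photon carries. The snippet below uses the exact SI constants. The wavelengths are just sample points (450 nm as a typical blue, the 350 nm figure quoted above, and the 193 nm and 13.5 nm used by DUV and EUV lithography); it does not model any actual photonic device.

```python
PLANCK_J_S = 6.62607015e-34         # Planck constant in J*s (exact SI value)
LIGHT_SPEED_M_S = 299_792_458       # speed of light in m/s (exact SI value)
ELECTRON_VOLT_J = 1.602176634e-19   # one electronvolt in joules (exact SI value)


def photon_energy_ev(wavelength_nm: float) -> float:
    """Energy of a single photon, in electronvolts, for a wavelength in nanometres."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m / ELECTRON_VOLT_J


for nm in (450, 350, 193, 13.5):
    print(f"{nm:>6} nm -> {photon_energy_ev(nm):6.2f} eV per photon")
```

Energy is inversely proportional to wavelength, so a waveguide a tenth the size needs photons carrying ten times the energy. That is the material-damage worry raised above.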

  • @youcrew 2 years ago +10

    I think this is why chiplet/tile designs are essential. We will start seeing SoC packaging get larger.

    • @BruceCarbonLakeriver 2 years ago

      It is only a matter of time until the whole von Neumann architecture lives within a chiplet design. The motherboard will just hold RAM and peripherals connected to the SoC.

  • @anepicotter4595 2 years ago +6

    Fortunately we can get a lot more SRAM with AMD's 3D cache method, and it'll definitely work well in chiplet designs even as the core chiplets continue to scale down.

  • @zonemyparkour 2 years ago +1

    When your channel becomes famous, I want to leave this here as proof I was here from the beginning.
    Great content. Loved your graphic explanations.

  • @mnomadvfx 2 years ago +14

    This has been known for a while, and ARM has been looking at using some variant of MRAM to replace SRAM for CPU caches.
    While this is difficult in a monolithic die, it becomes easier with chiplet stacking, as AMD has already demonstrated with X3D.
    Not only will MRAM offer non-volatility/persistence for potentially higher power efficiency, it will also offer dramatically better area scaling than SRAM for larger caches.

  • @BlenderRookie 2 years ago +1

    Bigger dies are inevitable, along with wider memory buses. Transistors and D latches (or whatever they are called these days) can only get so small, and transistors can only switch so fast. The eventual step is wider word processing and wider memory word access. But hey, I am old, and when I was into the nitty-gritty of this stuff, CPUs were running typical TTL voltages of about 5 volts. So yeah, I'm expired.

  • @TheDoomerBlox 2 years ago +4

    7:14 - Probably was worth noting that '6nm', in spite of being "adjacent" to '5nm' in its name, is actually a refined-refined version of the older n7 TSMC node seen on Zen2 chiplets.

    • @HighYield 2 years ago +1

      You are correct, 6nm is based on 7nm just like 4nm is based on 5nm.

  • @davidgunther8428 2 years ago +7

    I think 2.5D chiplets will stay at the L3 cache level, not the L2 level. There's so much data transfer, and the latency needs to be so low, that L2 on a chiplet would need to be closer/stacked to perform well.

  • @johnsavard7583 2 years ago +9

    At about 5:55 in your video, you finally mentioned chiplet design - if you can't scale static RAM, just put it off the chip. Of course, that involves some additional delays, so you still need L1 cache on the die with the logic, but it helps a lot.

    • @xeridea 2 years ago +1

      Yeah L1 and L2 probably still best on the same chip since latency is critical, but L3 is a great candidate.

    • @Tigerfox_ 2 years ago +3

      I feel like we're back in days of Pentium II and III.

    • @BenjaminCronce 2 years ago

      @@Tigerfox_ Except for many work loads, the P2/P3 with smaller on-chip L2 cache was faster than the larger off-chip cache. Celeron with 128KiB of on-chip L2 cache was faster than the 512KiB off-chip cache Pentium. In this case, I think the off-chip ran at half frequency. Much faster than DRAM, but a few factors lower bandwidth and higher latency than on-chip. Going off of memory from 2 decades ago. Take it with a grain of salt.

    • @Tigerfox_ 2 years ago

      @@BenjaminCronce I know all that, but I don't understand what you're trying to say. Of course, for some workloads more cache is better than faster cache, and for some it's the other way around. I haven't seen an in-depth analysis yet of which applications profit most from Raptor Lake's increased cache, but I know that, for example, only some games profit greatly from the 5800X3D's 3D cache, just as some games run faster on the Broadwell i7-5775C with eDRAM L4 cache than on an i7-7700K.
      They'll have to find a compromise. AMD reduced the size of Infinity Cache slightly on RDNA3, but vastly increased its speed.
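
The "more cache versus faster cache" trade-off discussed in this thread is usually reasoned about with the textbook average memory access time formula, AMAT = hit time + miss rate * miss penalty. The cycle counts and miss rates below are invented for illustration; they are not measurements of a Celeron, a Pentium II/III, or any modern part.

```python
def amat(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average memory access time in cycles for a single cache level."""
    return hit_cycles + miss_rate * miss_penalty_cycles


MEMORY_PENALTY = 60  # hypothetical cost of going out to DRAM, in cycles

# Hypothetical workloads: (miss rate with a small fast cache, miss rate with a large slow cache)
workloads = {
    "fits mostly in the small cache": (0.10, 0.06),
    "large working set": (0.25, 0.08),
}

for name, (small_miss, large_miss) in workloads.items():
    small_fast = amat(3, small_miss, MEMORY_PENALTY)   # small on-die cache, 3-cycle hit
    large_slow = amat(6, large_miss, MEMORY_PENALTY)   # larger off-die cache, 6-cycle hit
    winner = "small/fast" if small_fast < large_slow else "large/slow"
    print(f"{name}: small/fast = {small_fast:.1f}, large/slow = {large_slow:.1f} -> {winner}")
```

With these made-up inputs the small fast cache wins the first workload and the large slow cache wins the second, which is the "depends on the workload" point made above.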

  • @omegaprime223 2 years ago +12

    My only thought is: "Oh no, application developers will have to learn how to optimize again... the horror."
    Companies have been offloading optimization work because technology could just brute-force things for so long. Now that we're starting to see limitations that might stick around for more than one chip generation, corporations will have to optimize existing features if they want to cram even more features in.

    • @zthemythz 2 years ago

      We're probably just going to see stagnation.

    • @macicoinc9363 3 months ago

      I’m looking forward to this happening. Despite what people assume, I think it will be a net gain to everyone.

  • @samghost13 1 year ago

    There was a Big Light switching ON in my Head. Thank you very much Sir!

  • @jazzochannel 2 years ago

    5:40 "isn't there anything that can be done? great question, so glad you asked" smoothest transition of the year.

  • @ytviewer267 2 years ago +2

    Apple already has a CPU using chiplet tech: the M1 Ultra, introduced back in March, which stitches together two M1 Max chips into a single package. They aren't currently using it to split off SRAM, but the M1 Max is an extremely large die by comparison.

    • @HighYield 2 years ago +1

      That's true, but since it's "just" two of the same M1 Max fused together, I am separating it from chiplet designs like the ones AMD is using, with chiplets of different sizes.

  • @SpencerHHO 2 years ago +11

    I thought scaling had pretty much died around the 28nm nodes. It seems AMD has already solved this issue with chiplets and 3D V-Cache. All the RDNA3 variants released so far by AMD have the L3 cache and memory controllers (which also don't scale much anymore) on separate chiplets on a cheaper, older node than the main compute die. We will see larger packages with AMD, and costs will continue to rise, but their chiplet designs give them an advantage, and Intel is already trying to implement its own version. A lot of the tech AMD uses is co-developed with TSMC and isn't that different from the tech Apple is using with its M2 chips. I suspect this will only accelerate the transition to multi-die SoCs and 3D stacking. Cache is a lot less energy-hungry than logic, so it makes sense that this is what's seeing 3D-stacked silicon first.

  • @runeoveras3966
    @runeoveras3966 2 years ago

    Great video! Thank you. Hope you enjoy the holidays.

  • @SupraSav
    @SupraSav 2 years ago

    Solid video. Hope your channel blows up brotha

  • @horusfalcon
    @horusfalcon 2 years ago +2

    An interesting presentation! I wondered when something like this would happen. Now, whoever develops a more scalable SRAM will wind up being the performance leader unless other techniques prove much more cost-effective.

  • @joehorecny7835
    @joehorecny7835 2 years ago +9

    Amazing Content and Analysis! Hopefully they are working on the bandwidth of the chiplets, sounds like that might be the next bottleneck.

  • @Eskoxo
    @Eskoxo 2 years ago +3

    I think this could probably have many possible solutions. How the IBM Telum CPU handles different caches in a cluster of CPUs comes to mind, or perhaps having a different chip with a slightly slower L4 cache, etc.

  • @MarianRambo1
    @MarianRambo1 2 years ago +2

    4:10 You forgot to mention the Ryzen 5800X3D, which has 96 MB of L3 cache.

  • @electronash
    @electronash 2 years ago +1

    This is weird. I just bought a Ryzen 9 5900X, to upgrade a 3200G in my second PC.
    When I was comparing it to chips like the 5800X3D, I noticed the difference in L3 cache sizes, and wondered how much area cache must be taking up on the chip.
    I figured that a BIG part of the cost of the chip is the cache, since even 32MB will take up quite a large area of the silicon.
    I didn't realize there was a problem with SRAM cell size on the smaller nodes, though. Interesting vid.
    If only SRAM was somehow smaller and simpler to produce, we would likely never have needed to use DRAM at all.
    I've often wondered how fast a PC would be if its main RAM could use SRAM instead of DRAM.
    (Modern DDR SDRAM is FAST, but the latency is still high compared to what I would think SRAM could do.)

  • @46three
    @46three 2 years ago +2

    Gamers Nexus has an interview with one of AMD's lead engineers, Sam Naffziger, who explains this exact issue as one of the key concerns that chiplet design (and 3d V-cache) aims to mitigate. Interesting chat for sure.

    • @46three
      @46three 2 years ago

      th-cam.com/video/8XBFpjM6EIY/w-d-xo.html

  • @HazzyDevil
    @HazzyDevil 2 years ago

    Love the way you present these videos, about time I subscribe :)

  • @RM-el3gw
    @RM-el3gw 2 years ago +4

    Very informative as always. I believe there are multiple physics aspects of semiconductor tech that are being pushed to their limits right now. Cheers

  • @Raven-lg7td
    @Raven-lg7td 2 years ago +1

    omg i never heard about this before and I subbed to MLID, AdoredTV, Coreteks....you're a real hidden gem plz keep up! this is so interesting

  • @nezbrun872
    @nezbrun872 2 years ago +1

    Good video, but I would have liked to have the "physics problem" explained, and why it specifically affects SRAM cache. I can understand the analog limitation, as "lumped" parts like resistors, capacitors and inductors need chip area, but why is SRAM cache special? It's digital logic, just like the CPU? An SRAM cache bit is typically six or eight transistors: what is the "physics problem" why can this not scale?

    • @HighYield
      @HighYield  2 years ago

      If you are looking for more in-depth info, here's an article from 2015 that clearly talks about the problems with SRAM scaling in detail: semiengineering.com/moore-memory-problems/
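
As a rough back-of-the-envelope check on the "six or eight transistors per bit" figure mentioned above, the sketch below counts the transistors needed for the bit cells of a cache alone. The function name and the 6T default are illustrative assumptions, and tag arrays, ECC bits, decoders and sense amplifiers are deliberately ignored, so real caches need more.

```python
def sram_cell_transistors(capacity_mib: int, transistors_per_bit: int = 6) -> int:
    """Transistors in the bit cells only (no tags, ECC, decoders or sense amps)."""
    bits = capacity_mib * 1024 * 1024 * 8  # MiB -> bytes -> bits
    return bits * transistors_per_bit


if __name__ == "__main__":
    # 32 MiB and 96 MiB are the L3 sizes discussed elsewhere in this thread.
    for size_mib in (32, 96):
        count = sram_cell_transistors(size_mib)
        print(f"{size_mib} MiB with 6T cells: {count / 1e9:.2f} billion transistors")
```

Even under these simplified assumptions, a few tens of megabytes of cache account for billions of transistors, which is why a cell that stops shrinking matters so much for die area.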

  • @paulsim7589
    @paulsim7589 2 years ago

    I knew this from other hardware videos. But I watched this anyway as it's quite relaxing and easy to listen to. Your format for explanation is very good. Thank you.

  • @growthmonger4341
    @growthmonger4341 2 years ago

    Great information and no BS, will definitely drop by again.

  • @7rich79
    @7rich79 2 years ago +2

    One of the typical advantages of process node shrinks that is advertised is increased performance, increased power efficiency, or a combination of both. Does this mean then that if you cannot continue to shrink the process, SRAM performance will be the bottleneck for newer architectures? What are the alternatives to SRAM?

  • @rahcxyoutube
    @rahcxyoutube 2 years ago +1

    I absolutely love your videos, keep it up!

  • @shyamdevadas6099
    @shyamdevadas6099 2 years ago

    Very fascinating video. Well done!

  • @TheEVEInspiration
    @TheEVEInspiration 2 years ago +5

    I think some caches will become near obsolete to make room for the more essential caches.
    Think of the separate cache for code that is indirectly fed from a data-cache.
    By changing them to just storing pre-decoded meta-data (like instruction boundaries on x64, or other decoding hints) and fetching the actual code from the data-cache instead when needed.
    There are more such tradeoffs to make for sure, like cache-complexity versus cache size.
    If cache size is under pressure by this scaling development, expect more complex/smarter caching systems that until now did not make economic sense.

    • @stevetodd7383
      @stevetodd7383 2 years ago +2

      There’s a very good reason for split I and D caches - they allow simultaneous fetching of instructions and data. A pure Von-Neumann design (shared instruction and data memory) can only execute one instruction every other clock cycle (one instruction fetch followed by a data access relating to that instruction). Modern cores are all modified Harvard designs, that allow simultaneous fetching of instructions and data access via the two different caches. They are also quite small compared to later caches in the scheme, so unifying them will save little space.
      The better solution to the problem is 3D stacking and using simpler/cheaper process nodes to create cache layers. This actually gets the cache closer to the point of use while letting you increase sizes.

    • @TheEVEInspiration
      @TheEVEInspiration 2 years ago

      @@stevetodd7383 I understand those points and I think it's an argument that has been losing validity for some time now.
      Ever since the introduction of the level-0 uOp cache, the effect of large level-1 instruction caches has been going down. And those level-0 caches are getting bigger every generation!
      There is a saving to be had there for sure. By making them smaller, but smarter. For example by increasing set associativity or as I suggested by storing only meta-info/tagging relevant cache-lines in L2 as being used for code.
      As both L1 caches are fed from L2, there already is concurrent fetch capability at that level. I1 Cache is virtually all about lowering latency for non-decoded instructions! A smaller cache that speeds up the decoding would give the same benefit as today's caches. Putting some of the cache area towards a bigger uOp cache will see more benefit I think (at least that is the trend right now).
      As for Die stacking, that is all about level 2, not level 1 caches. This also speaks in favor of the idea of a smaller L1 instruction cache as the code will be in that extra large L2 anyway.
      Level 1 instruction is simply between a rock and a hard place (the much faster already decoded uOp cache and the much larger and extendable L2 I+D cache).
      And there is another trend looming, sharing massive level 2 caches between cores! That can be a huge transistor count saving architecture feature.

    • @stevetodd7383
      @stevetodd7383 2 years ago

      @@TheEVEInspiration a cache only accesses the next higher level in the case of a miss. At this point there is typically a burst of activity while a cache line is written or read. Because of this I and D caches don’t typically access the L2 concurrently. Each level of cache has a progressively higher miss cost, and then adding multi-port access adds more. The I and D caches are deliberately small and fast. L2 is larger and slower, L3 larger and slower again. The job of the I cache is to keep the instruction decode pipe fed as much as possible. That pipe results in L0 uOps, but there’s a higher penalty if L0 misses and you have to go all the way to L2. The job of the D cache is to keep the data needs of the uOps fed as much as possible while avoiding the need to go to L2 again.
      There’s a reason that we don’t just have a single layer of cache. Big and complicated caches are slow. Cache models are a trade-off between the need to maximise hits and the time to return cached data.
      Oh, and to add to that, L0 cache is in the form of VLIW instructions that are far from compact. You’ll not get efficient use of space if you try for a large boost to the L0 to make up for no I cache.
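
The trade-off described above (small, fast caches in front of larger, slower ones) is often summarized with the textbook average memory access time (AMAT) recurrence. The sketch below is only an illustration: the latencies and miss rates are made-up numbers, not measurements of any real CPU, and the function name is an assumption.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_time_cycles, local_miss_rate) ordered from L1 outward.
    memory_latency: cycles to reach main memory after the last level misses.
    """
    penalty = memory_latency
    # Work from the outermost cache inward: each level's miss penalty is the
    # average access time of everything behind it.
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty


if __name__ == "__main__":
    # Hypothetical three-level hierarchy: small/fast L1, larger/slower L2 and L3.
    hierarchy = [(4, 0.10), (14, 0.40), (50, 0.50)]
    # Hypothetical single big cache: better hit rate, but every access is slow.
    single_big = [(30, 0.03)]
    print("three levels:", amat(hierarchy, 300))
    print("one big cache:", amat(single_big, 300))
```

With these assumed numbers the layered hierarchy comes out well ahead of the single large cache, even though the large cache misses far less often, because every hit in the large cache pays its higher latency.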

  • @frankg7786
    @frankg7786 2 years ago

    This was very interesting and well explained, thank you!!

  • @IgoByaGo
    @IgoByaGo 2 years ago

    I have no idea why I have never seen your channel, but I totally subscribed. Great content.

  • @sharktooh76
    @sharktooh76 2 years ago +3

    nvidia 4000 series is made on 5nm not on 4.
    4N is Nvidia customized node based on TSMC N5 5nm node.
    TSMC N4 is 4nm .
    4N is *NOT* N4.

  • @tjtjmich16p
    @tjtjmich16p 2 years ago

    Dude, your channel will explode with subscribers and viewers. It's already happening: TH-cam's algorithm is showing many recommendations from your channel to me and many more tech nerds out there, so expect huge growth. You will reach 100 thousand subs before you know it,
    And awesome content by the way,
    Really well edited and well thought out videos,
    And I really like your accent it makes you sound like a tech company owner.

    • @HighYield
      @HighYield  2 years ago

      It’s a bit overwhelming right now to be honest, but I’ll manage. Thanks for the kind words!

  • @gstormcz
    @gstormcz 2 years ago

    Skull is lovely. Content is great. Narration no waste of time.
    Merry Christmas.

  • @kenohara4574
    @kenohara4574 2 years ago

    This channel has 5.16k subscribers on Dec 27. I am writing this so it will be proof of how good and informative this channel is and how fast it will grow. This channel will hit 1 million in a year, mark my words :)

  • @Kevin-jb2pv
    @Kevin-jb2pv 2 years ago +2

    Unless we have some sort of new paradigm shift in computer hardware, these limitations are why I think we're probably going to head into an era of off-loading CPU functions to dedicated co-processors. We already did it with GPU's, and bitmining did prove that certain functions are better handled by dedicated hardware and can be done cost-effectively. Plus, NVidia has been selling dedicated, specialized GPU hardware for AI for years, now. I think we're going to start seeing more processing handled by specialized units as demands grow. Exactly which functions? I can't say. For gaming, physics is the first thing that comes to mind, but PhysX was already a thing that failed and then got absorbed back into GPU hardware. Perhaps we'll see a return of discrete physics units? We also have dedicated AI chips out there, and I believe one of the things they get used extensively for is in processing image data in some phones. It was heavily marketed a few years ago by several major players, but I don't know if that's a thing that's still being done on current-gen phones.
    Point is, manufacturers have already done it and are at least trying to find other applications to offload to dedicated silicon. So far, the physical limits of semiconductors have not, yet, hit that brick wall that we've been getting warned about for years. It's slowed, but so far manufacturers have been able to use other tricks to get generational improvements in computing power, so the wider industry and enthusiast community hasn't had to feel the pain quite yet. Who knows, maybe manufacturers will be able to keep squeezing more cycles out of what we have right now for many years just because they will actually have to start doing real work on architecture re-working now that they can't just fall back on shrinking their transistors (and this, for the most part, is what we have been seeing, it's just a matter of how long they can keep doing that).
    But I think that when backs are really pushed up against the wall, we'll start seeing more radical solutions start being brought to market. I think that the fact that Moore's law is just about done with will likely mean that we're about to see a _boom_ in innovative and creative new solutions because the "safe" path is no longer a viable one and corporate leadership will start being forced to try new things to stay competitive.

  • @kotekzot
    @kotekzot 2 years ago +1

    I wonder if Zen 5 is going to have any L2/3 cache on the die or are they going to stack it all on top of the die.

  • @jabezhane
    @jabezhane 2 years ago

    I remember back in the mid 90's "the issue with going lower than XXnm" and then in the early 2000's "the near impossible task of going past XXnm"... and so on. We keep going somehow.

  • @SB-pf5rc
    @SB-pf5rc 2 years ago +1

    As someone who follows computer channels and bike channels, the thumbnail for this video was very alarming.
    SRAM is like the biggest brand in the mountain bike space.

    • @HighYield
      @HighYield  2 years ago +1

      Oh, didn’t know that. Hope there are no “clickbait” views who are mad when I don’t talk about bikes 😬

    • @SB-pf5rc
      @SB-pf5rc 2 years ago

      @@HighYield no problem friend! i thought it was funny once i realized. 'sram' is a weird combination of letters, what are the odds?
      discovered your channel recently and love what you're doing. thank you.

  • @vinylSummer
    @vinylSummer 2 years ago

    Awesome video! Subbed, going to watch your other videos

  • @NoneofyourBusiness-ii1ps
    @NoneofyourBusiness-ii1ps 2 years ago

    Well, there is also a physical limit on how densely you can store information, which is given by counting the number of Planck-area squares on the surface of a black hole. Basically, if you pack too much information into a given space, it will collapse into a black hole, literally...
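
For a sense of scale, the limit mentioned above is usually stated as the holographic (Bekenstein-Hawking) bound: the maximum information in a region is proportional to its surface area measured in Planck areas, roughly A / (4 · l_p² · ln 2) bits. The sketch below simply evaluates that formula for a sphere; the function name and the 1 cm example radius are illustrative assumptions.

```python
import math

PLANCK_LENGTH_M = 1.616255e-35  # CODATA value of the Planck length, in metres


def holographic_bound_bits(radius_m: float) -> float:
    """Upper bound on the information (in bits) inside a sphere of the given radius."""
    area = 4.0 * math.pi * radius_m ** 2  # surface area of the sphere
    return area / (4.0 * PLANCK_LENGTH_M ** 2 * math.log(2))


if __name__ == "__main__":
    # A sphere with a 1 cm radius, roughly the scale of a chip package.
    print(f"{holographic_bound_bits(0.01):.3e} bits")
```

The result is dozens of orders of magnitude beyond any cache discussed here, so this limit is of theoretical interest only for chip design.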

  • @chibby0ne
    @chibby0ne 8 months ago

    This answers why chiplet design is becoming so popular lately. Thanks a lot for the well conveyed and duly researched video.

  • @DivusMagus
    @DivusMagus 2 years ago +3

    This could mean AMD will have a big advantage, as they have already done a lot of the research on chiplet design and manufacturing, so they are already ahead of the curve. But with both Intel's and Nvidia's insanely deep pockets, they can just throw a ton of money at the problem and get it done quickly.

    • @necromax13
      @necromax13 2 years ago

      AMD's chiplet design is co-developed with TSMC, so anyone that has their silicon produced by TSMC will directly and indirectly benefit...

  • @yujaeha
    @yujaeha 1 year ago

    Amazing info. Thanks 🙏

  • @lou7139
    @lou7139 2 years ago

    Chiplet future makes sense. The tiled and stacked package design on Meteor Lake is interesting but looks complicated to manufacture. Gone are the days of the simple-to-build and test monolithic die...so nostalgic.

  • @tomtomkowski7653
    @tomtomkowski7653 2 years ago +9

    Let's wait and see how well this 1nm non-silicon process TSMC and MIT are working on will perform.
    And yes, chiplets are the way to go, and the question is how well different companies will develop this idea with their different approaches.

  • @zxuiji
    @zxuiji 2 years ago

    Well, they can potentially create ERAM. Using electrowetting and light rays, it is possible to create a fast RW byte with minimal power usage; using just the position of light caught, one can determine 0 or 1. One could also try storing an entire unsigned integer/float with the strength of light caught.

  • @Razor2048
    @Razor2048 2 years ago +1

    What are your thoughts on CPU makers moving to add HBM to the CPUs, where it effectively becomes a massive level 4 type cache?

    • @tyaty
      @tyaty 2 years ago

      Intel is already planning to launch them in the near future (Xeon Max).

  • @cyber_robot889
    @cyber_robot889 2 years ago +1

    Wow, I've been into PC hardware since about 2003, and never ever heard about SRAM. Thank you for the new and interesting information. Like a real reveal under my nose, lmao

  • @deusexaethera
    @deusexaethera 2 years ago +2

    SRAM not being able to scale-down anymore doesn't mean it's dead, it means it's fully optimized. Those are almost, but not quite, exact opposites.

  • @ricardorapture
    @ricardorapture 2 years ago

    finally a well explained tech channel

  • @nagi603
    @nagi603 2 years ago

    Wonder how difficult it would be to have the 3D V-cache but with different node sources.

    • @HighYield
      @HighYield  2 years ago

      Actually, that's already being worked on by TSMC and AMD. I'm not sure if the 3D V-Cache on the new Zen 4 X3D CPUs is also produced in 5nm; it could be a 6/7nm node. I'm trying to figure that out right now.

  • @635574
    @635574 2 years ago +1

    Maybe even more impressive are neuromorphic chips where the compute and the memory are in the same place on the chip, and they are processing asynchronously.

  • @NTeKLullaby
    @NTeKLullaby 2 years ago

    Great and concise video. Thanks.

  • @Analisede_Tudo
    @Analisede_Tudo 1 year ago

    We are researching this, with storage class memory and new emerging memories. To replace SRAM, maybe SOT-MRAM is a solution.

    • @HighYield
      @HighYield  1 year ago +1

      Super interesting stuff. If you invent the future, please tell me!

    • @Analisede_Tudo
      @Analisede_Tudo 1 year ago

      @@HighYield I am more in the area of surveys, like systematic reviews. These memories already exist; we are more discussing how viable they are, how to use them, and which emerging memories are best. For example, Rimac cars already use MRAM.

  • @pirojfmifhghek566
    @pirojfmifhghek566 2 years ago +2

    I'm actually looking forward to the day when we finally reach the limits of what a manufacturing node can accomplish in terms of nanometer node size. At that point the costs and methods for making the most cutting-edge chips will really start to proliferate. The only thing that can improve a chip from that point forward will be the architecture of the silicon itself, which is where we sorely need the most improvements. It'll also be a good time for Windows and chip designers to come together and finally pare down the old x86 operating system to a _standardized_ reduced instruction set format. We are also going to need more purpose-built components in our computers soon, to give our computers more integrated utility rather than speed. I'm most interested in what companies like Mythic have been doing with analog chips, because they've managed to use older process nodes to create insanely efficient chips that do very complex tasks in AI computing. We've been leaning too heavily on CPUs and GPUs to do these compute tasks and a lot of that work could be offloaded to newer, purpose-built components.

    • @ilyarepin7750
      @ilyarepin7750 2 years ago

      Or they could stop wasting money on diminishing returns from miniaturizing silicon when it's already close to its theoretical limits, and instead just work on commercializing a new approach to computing like photonic chips or carbon-based chips.

    • @pirojfmifhghek566
      @pirojfmifhghek566 2 years ago

      @@ilyarepin7750 This would be a welcome change. I dunno how far along in research we are with those technologies though. It may be that we hit the limits of silicon miniaturization long before carbon or photonic chips make their way into consumer devices. I just hope that hitting the manufacturing wall means cheaper silicon for a while.

  • @RayanMADAO
    @RayanMADAO 2 years ago

    How much slower is having the cache on a different die than having it all on one die

    • @HighYield
      @HighYield  2 years ago +4

      In the case of Zen 3D it isn't really any slower at all. If it's done right, especially with 3D stacking, there is no performance loss.

  • @sumeetwadile5590
    @sumeetwadile5590 2 years ago

    Very well explained!

  • @rayraycthree5784
    @rayraycthree5784 2 years ago

    Why can't the same transistors used in the ALUs, LUTs and controller be used to build memory flip flop cache?

  • @nathanwest2304
    @nathanwest2304 2 years ago +1

    Looks like AMD was ahead of their time. It may have been just by a few years, but they were the first to use the chiplet design and to use 3D cache.

  • @NootNoot.
    @NootNoot. 2 years ago +3

    As for chiplets and specifically future RDNA designs I wonder if moving from a N6 to N5/N3E MCD would even be worth it? And although it seems TSMC has hit a dead end with SRAM scaling, I wonder how well other foundries are doing. Like for example as you say, Intel is using some TSMC manufacturing for Meteor lake, and I wonder if Intel has a more efficient SRAM scaling.
    This also raises questions about Nvidia's Blackwell. They've benefited a lot from moving from Samsung's 8nm node to a custom TSMC N4 node. While I don't doubt Nvidia will take the performance crown again, I feel like the 4000 series has benefited a lot from the silicon. Will they also have a disaggregated design, or will they pull some black magic with further increased power draw?
    Btw I think the thumbnail is great lol

    • @dra6o0n
      @dra6o0n 2 years ago +3

      Nvidia hasn't got as much CPU experience to do proper chiplet designs as AMD or Intel does, and Apple just brute-forces its engineering with lots and lots of money in R&D to poach talent for that.
      Otherwise Nvidia would have pushed for chiplets sooner instead of showing a proof of concept one time and then forgetting about it later.

  • @RyanOwensWorldofTyros
    @RyanOwensWorldofTyros 2 years ago

    What do you think about the idea of large tech companies starting a project to build new operating systems based on RISC-V with less code and neural engines in mind? We could see gains from less work.

    • @karlogrimaldi6787
      @karlogrimaldi6787 2 years ago

      And throw our old programs out the window

  • @Ken_1971
    @Ken_1971 2 years ago

    How will IBM's light-based CPU handle this? Maybe you could make a video about this upcoming technology?

  • @SureshSharma-eq1vz
    @SureshSharma-eq1vz 2 months ago

    If transistor size is scaling down, then why not SRAM, which is made of transistors?

  • @intetx
    @intetx 2 years ago

    3D stacking might never use another node. The problem is two different nodes bend differently, which I could imagine could cause issues with bonding them directly.

  • @ubacow7109
    @ubacow7109 2 years ago

    When will we move on to GaN and Carbon based computing

  • @anomalousresult
    @anomalousresult 2 years ago +1

    The slowdown in SRAM size scaling has been observed since FinFET was adopted.

    • @HighYield
      @HighYield  2 years ago

      You think it might change with GAA?

  • @milesyounghamilton
    @milesyounghamilton 27 days ago +1

    I would be interested in an update on this now that we have heard TSMC have been able to increase N2 SRAM density by 11%, widening the gap to Intel.

    • @HighYield
      @HighYield  27 days ago

      We have to see how dense SRAM actually will be. But 11%, after N3E didn't offer any improvement at all, is still a very small improvement. Especially since it's N5 vs N2, two full nodes.
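
To put the 11% figure above into perspective, the short sketch below converts a density gain into the area saved for a fixed amount of cache and into an equivalent per-node gain. It is plain arithmetic on the number quoted in this thread; the function names are illustrative assumptions, and spreading the gain evenly over two node steps is a simplification.

```python
def area_after_density_gain(gain: float) -> float:
    """Relative area of a fixed-capacity SRAM block after a fractional density gain."""
    return 1.0 / (1.0 + gain)


def per_step_gain(total_gain: float, steps: int) -> float:
    """Equivalent compounded gain per step if total_gain were spread evenly."""
    return (1.0 + total_gain) ** (1.0 / steps) - 1.0


if __name__ == "__main__":
    gain = 0.11  # SRAM density improvement quoted in the comment above
    print(f"same cache needs {area_after_density_gain(gain):.1%} of the old area")
    print(f"equivalent gain per node over two nodes: {per_step_gain(gain, 2):.1%}")
```

In other words, the same cache would still occupy roughly nine tenths of its previous area after two full node steps.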

  • @BiggySeth
    @BiggySeth 2 years ago

    Need to start using nvme drives directly to the gpus to help compensate?

  • @mihaicraciun8678
    @mihaicraciun8678 2 years ago +1

    love your channel, learned something new today! by the way, how long do you think transistors will be able to scale? or is an atom's width the limit if we can focus our lasers that small?

    • @Jabjabs
      @Jabjabs 2 years ago

      The issue is not so much how small we can physically make the transistors; it is how much tolerance to errors and accuracy can be managed given electron leakage. It is something that is becoming a real major issue in chip design. We can make small transistors, but they are so small that electrons can just tunnel through these switches (more like have the energy to overcome magnetic/electric resistance) and thus negate the binary nature of the transistor. A little can be tolerated. This is called Boltzmann's Tyranny. This is a real-world example of the Boltzmann distribution in action. en.wikipedia.org/wiki/Boltzmann_distribution
      This is complicated, but I will try to make it as easy to understand as possible.
      Transistors still have analog properties, what determines something as being an on and an off state is not as absolute as we would like to think. It is a case of tolerances. If enough electronic flow gets through then the gate is considered on. The switch in a transistor isn't actually a physical switch but an electric field that when active prevents the majority of the electrical flow from getting through. There will always be electrons that get through, it is a case of how many that do and the tolerance of the output from this.
      We now have transistors that have only a few silicon atoms separating them, making electron tunneling much more likely. Because of this fuzzy nature, it is getting more likely that we will have transistors that, while physically small enough to be functional, are made useless by quantum tunneling.
      There are two ways to combat this. Either we pump more energy into the transistor to increase the switch resistance but this will increase the temperature output. This is not a great solution as we have already been pushing the upper limits of thermal capacity for a good 20 years now. The other is lower the tolerance on accuracy. Meaning we could make smaller chips but there is a greater risk of them operating in odd ways. Could we build these things? Yes, will they work as we want? No. The base physics of the universe will have the last laugh here. We are toying with the fundamental laws of physics itself and it will not bend to us.
      To answer your main question, I think we will make it to the 1nm mark but it will be a long slow push up to it. I suspect that over the next decade we are going to see a few major features of future processors.
      Further ASIC design. Things like we are seeing today with the Apple M series chips, where there are a lot of highly specific cores to accelerate functions where possible. We are going back to the design principles of the Amiga!
      More chiplet design to complement this. In order to keep chip binning to a minimum, we are going to see the number of chiplets bundled go through the roof!
      Increased electrical power demands, as these chiplet designs allow for more compute power to be packed into a machine, but this will be the last phase of just desperately trying to get performance out of silicon computing systems.
      These are the actions of desperation to get a few more percent out of computers as we finally peak out in the early 2030s.
      After that, I feel the real push to optimize software will be the main game to get further performance. How do you sell hardware that is no better than the previous year's systems? The way Microsoft has with Windows 11: have processor-specific security requirements. ;) This is why I envision that in the next decade the amount of vendor lock-in is only going to get worse. I hope I am wrong.
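
The "Boltzmann tyranny" named in the comment above is commonly expressed as a floor on the subthreshold swing of a conventional MOSFET: at temperature T, the gate voltage must change by at least ln(10)·kT/q to change the off-state current by a factor of ten. The sketch below evaluates that textbook limit; the function names and the 0.3 V example swing are illustrative assumptions, and real devices are worse than the ideal.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23       # Boltzmann constant k
ELEMENTARY_CHARGE_C = 1.602176634e-19  # electron charge q


def min_subthreshold_swing_mv(temperature_k: float) -> float:
    """Ideal (thermionic) subthreshold swing limit in mV per decade of current."""
    thermal_voltage = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return math.log(10) * thermal_voltage * 1000.0


def max_on_off_ratio(gate_swing_v: float, temperature_k: float) -> float:
    """Best-case on/off current ratio for a given gate voltage swing."""
    decades = gate_swing_v * 1000.0 / min_subthreshold_swing_mv(temperature_k)
    return 10.0 ** decades


if __name__ == "__main__":
    print(f"limit at 300 K: {min_subthreshold_swing_mv(300):.1f} mV/decade")
    print(f"best on/off ratio with a 0.3 V swing: {max_on_off_ratio(0.3, 300):.2e}")
```

Because this floor depends only on temperature, lowering the supply voltage directly cuts into the on/off ratio, which is one reason leakage grows as transistors and voltages shrink.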

    • @mihaicraciun8678
      @mihaicraciun8678 2 years ago

      @@Jabjabs wow, thanks for this, very interesting and informative! Kinda sad since it really feels like we're just scrambling for workarounds to a physically impossible problem. I wonder if even beyond the next 10-20 years, how will things evolve? will we move on from silicon chips and into fundamentally different designs? I'm thinking that since we're currently using electrons to interact with the transistors, perhaps using smaller particles like photons (I don't think these guys even have a size since they're massless) could be a solution since they have energy so maybe that could be used as a toggle.

    • @Jabjabs
      @Jabjabs 2 years ago +1

      @@mihaicraciun8678 Photon computation has been something that has been proposed for a few decades now. I don't see any major theoretical problems with it in terms of physics, but it is clearly a major engineering problem as it has yet to materialize. Great in theory, terrible in practice - for now.
      This is why I feel software optimization will be the last big step. A lot of the software we use nowadays is astoundingly sluggish considering what is possible. Modern hardware is just so amazingly fast, it just plows through the inefficiencies. I remember building a sorting algorithm in assembly on a 33 MHz 486 back in the mid 90's. It could sort a data set of about a million variables in about 5 seconds. Excel running on my 4.2 GHz i7 will do the same in about the same time... despite there being 6 cores each running at 100 times the clock rate. So yeah there is some room to maneuver here.
      Yes, we are scrambling for work around technologies nowadays. Every new transistor design and technique we are coming up with is buying us less time until we need an all new design again. I remember Intel in the mid 90's saying we had about 25-30 years until we would hit the end of the silicon road. They were not far off the mark. The problem is we don't even have anything viable on the horizon. This is a similar issue to things like batteries and power generation. We have come a really long way but the next major step is a long way off and we don't know where it will come from. But we will try anything and just maybe one of these experiments will hit the jackpot.
      And don't even get me started on Quantum computers, they are neat physics experiments but they will not do work in a fashion anything like what we use today. There are also fundamental limitations they have that additional complexity can decrease their reliability.

    • @HighYield
      @HighYield  2 years ago +1

      I think logic transistors will continue to scale for a couple of years, but with diminishing returns. There are things like graphene as a material to succeed silicon, there are photonic chips and then we have EUV-lithography improvements like High-NA.

  • @GreggRoberts
    @GreggRoberts 2 years ago

    My old Packard Bell used sram (simm ram). I remember it because the cyrix cpu required it to be installed in pairs like rambus did years later.

  • @jtjames79
    @jtjames79 2 years ago +1

    Good. Necessity is the mother of invention.
    It's actually a problem that substrates only change when you absolutely have to.

  • @sheilaolfieway1885
    @sheilaolfieway1885 2 years ago

    last I recall didn't the vic-20 use SRAM?

  • @Themisterdee
    @Themisterdee 2 years ago +2

    Very interesting.. thank you.
    Dumb thought I know but ..
    Won't that mean that rectangular chips are soon to be obsolete? For there must be a finite limit to SRAM gates/wires per nm along an edge.
    As in, if the shrunken dies get smaller, it would mean more ports per nm of, for example, logic cell density, thus more 'wires' to the SRAM edges.
    I'm assuming you would quickly run out of room.

  • @ChiquitaSpeaks
    @ChiquitaSpeaks 2 years ago +1

    I’d like to know if there’s a difference in the implications of the importance of cache/SRAM in an SOC but I guess Apple’s decision making offers some insight in on that somewhat?

  • @KertaDrake
    @KertaDrake 2 years ago +3

    I think we've reached the point where the obsession with improving processors needs to take a back seat to actually bothering to optimize code. We don't need increasingly small tech that will overheat at the drop of a hat because of how hard we are working it. We need increasingly efficient tech that can do more with less!

    • @Spido68_the_spectator
      @Spido68_the_spectator 2 years ago

      And more competent software makers, who work to optimise for potatoes and scale to take advantage of powerful hardware. Right now they just make it work, creating performance issues all around. No wonder CSGO runs so well on anything you throw at it; it was made when devs cared about it running on a potato.

    • @Patrick73787
      @Patrick73787 2 years ago

      I totally agree. The software side of things has so much catching up to do.

  • @mjdevlog
    @mjdevlog 2 ปีที่แล้ว +3

    Great video! I really appreciated the thorough analysis of the potential problems with next-gen CPUs and GPUs. It's important to consider these issues and have a critical eye towards new technology. Keep up the excellent work!