I've had my components for more than a month and I'm waiting for the 3D V-Cache CPU, despite the trade-offs. Even if it runs slower than the normal X variants, a CPU from 2022 or 2023 is already blazing fast.
Sounds like they need to entirely rethink thermal transport within the stacked layers instead of just relying on a massive top-mounted sink to take care of it all by itself. Microscopic on-die heat pipes? Electric nano-scale heat pumps? Evolve the form factor for dual-sided active cooling? Less sexy, but probably easier in the short term. The future will be interesting.
Best part about the X3D architecture is that it gets way more performance out of the same CPU and memory speed. The 5800X3D with 2666MHz DDR4 performs close to the 5900X using 3933MHz DDR4. Even if the 7800X3D performs midway between the 7700X and 7900X on 4800MHz DDR5 that's still amazing. You could in theory SAVE money over getting a 7700X since you don't need to worry about much more expensive RAM.
It's even better if you already had slow memory because you wanted to save money earlier. This happened to me by chance when I built a system with the Athlon 3000G and 3000MHz DDR4 three years ago. I just needed to get the 5800X3D, and the much larger cache took care of the rest. If only GPUs were affordable I'd have gotten one of those as well, but I can wait another couple of years.
Processor manufacturers would rather choke themselves than give consumers a platform with 4 channels of DDR5 memory, even though almost any board physically accepts 4 memory sticks. No, they'd rather play with caches and introduce "new" generations of processors, selling us the old solution again and again.
If you want 4 or 8 channels of memory, you can pick it up whenever you want. It's readily available in HEDT, workstation and server parts; you'll just pay for it. But I'd offer an alternative view: it wouldn't be as worthwhile as you think unless you have a use case that is that memory intensive.

Both options are very expensive. On-die cache is the most expensive memory there is. It takes up a lot of space on a die, and the larger the die, the fewer dies per wafer and the higher the odds of a defect hitting a die. AMD pays TSMC per wafer; the yield is the yield. A DDR memory port also takes a lot of die space, in this case probably on AMD's IO die. Adding 4 channels might increase bandwidth, but what about the people who just buy two DIMMs? Then you have 2 memory channels entirely unpopulated. Additionally, the tracing to all 4 has to be identical.

I think a major thing you're missing is HOW much faster cache is vs. system memory. It's not even in the same ballpark. To a modern CPU, which can perform a dozen operations in the time it takes light to travel from your desk lamp to the desk, going out to main memory is a literal eternity. Say you do one math problem a second, but sometimes you have to look up a value. If you hit in L1, you wait about 5 seconds for the answer. If you miss in L1 and L2 and hit in L3, you wait about 30 seconds; that's a long time, it just cost you 30 other problems while you were waiting. If you have to go out to main memory, it's like sending someone from your desk across town to get the answer and drive back: more like 2 hours during which the processor is stalled until the answer comes back. If you have to read something from disk, it's like waiting 2 years. The larger the cache, the less often you burn 2 hours going to main memory.

That's actually what hyperthreading helps with: it lets the processor switch to another process while it waits for the first one to continue, so at least it stays busy, swapping back and forth whenever one gets stuck. But getting data from main memory is super, amazingly slow.

4 channels of memory don't have much point until you have a lot of threads and a lot of memory utilization. The limiting factor is often not having a free RAM request line open; it's the time it takes to make the request in the first place. Servers may have thousands of threads doing memory-intensive work, and there 4 or 8 channels make more sense. In the consumer space it adds a ton of cost, takes up a ton of board real estate, and raises the price for consumers, who now have to buy twice the memory and pay more for a more expensive motherboard. If you want 4 channels, you can get 4 channels. No problem, you don't need special permission. You'll just have to pay for a platform with 4 channels, and you're probably looking at 4x-10x the cost.
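To put rough numbers on that analogy, below is a minimal pointer-chasing sketch in C. Each load depends on the previous one, so the measured time approximates raw access latency; the ring sizes, iteration count and shuffle are illustrative choices, and the absolute numbers will vary by machine.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static volatile size_t sink; /* keeps the chase from being optimized out */

static double chase_ns(size_t n, size_t iters) {
    size_t *ring = malloc(n * sizeof *ring);
    if (!ring) return -1.0;

    /* Sattolo's algorithm: a random permutation that is one big cycle,
     * so every load depends on the previous one and the prefetcher
     * can't guess the next address. */
    for (size_t i = 0; i < n; i++) ring[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = ring[i]; ring[i] = ring[j]; ring[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t idx = 0;
    for (size_t i = 0; i < iters; i++)
        idx = ring[idx];               /* serialized, latency-bound loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = idx;

    free(ring);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;         /* average latency per load */
}

int main(void) {
    size_t iters = (size_t)1 << 25;    /* ~33M dependent loads per run */
    printf("32 KiB ring (fits in L1/L2): %.1f ns/load\n",
           chase_ns((32u << 10) / sizeof(size_t), iters));
    printf("512 MiB ring (mostly DRAM):  %.1f ns/load\n",
           chase_ns((512u << 20) / sizeof(size_t), iters));
    return 0;
}
```

On a typical desktop the small ring lands in single-digit nanoseconds per load while the large ring is an order of magnitude or more slower, which is the "across town" trip in the analogy.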
I could see the thermal issue be solved if there were some process changes that introduced the equivalent of heat spreaders in the bulk silicon, so there are less issues with hot spots.
Why did you leave out the 7900X3D and 7950X3D in the clock-speed part? Those two have the same boost clock as their non-X3D counterparts, so the conclusion that clock speed is reduced doesn't hold for them.
I've been a computer tech for well over 30 years and I have worked in engineering (lithography) for decades as well. Core frequency is only *ONE* of several factors in efficiency. So when I hear people treat frequency as the deal maker or breaker, it frustrates me deeply. Cache and bus speed (communication between the processor and the RAM) are insanely critical. With 2 chiplets sharing the total core count, like the 16-core AMD procs, the Infinity Fabric is immensely crucial. But losing 400 or 500 MHz of boost or even base frequency will make almost no difference, especially when it's only a ~10% loss compared to other models. RAM configuration also matters: dual channel ONLY for gaming or you lose performance, quad channel for production software (where it's huge for performance), and the lowest CAS latency at the higher end of frequency. I'm buying the 7800X3D, and because its maximum and most optimal dual-channel support is 5,200 MHz, I went with G.Skill DDR5-5200 CL28-34-34-83 (83 being the lowest cycle count to complete a check of any DDR5 RAM kit). The number of cycles to successfully check the data is the most significant number in RAM.
A faster CPU isn't my gaming issue right now. Try internet speed and packet/data loss: Australia needs a big fix and it isn't coming any time soon, so my 7700X is the sweet spot and any further development from AMD is just a pipe dream. Low latency first, then higher FPS, is what's needed. But I'll be watching X3D closely over the next few years. Love the breakdown, thank you.
As a software developer I would take more cache any day, with the trade-off of some clock speed. My reason: writing cache-friendly high-performance software is very costly and mentally demanding at the same time. Data structure size, memory layout, cache-line padding, etc. are just too much for most devs, to the point where they don't bother at all.
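As an illustration of the kind of layout work being described, here's a small C sketch contrasting array-of-structs with struct-of-arrays; the Entity type and its fields are made up for the example.

```c
#include <stddef.h>
#include <stdio.h>

#define N 8

/* Array-of-structs: 32 bytes per entity, fields interleaved in memory.
 * A loop that only reads 'health' still drags whole cache lines in. */
struct Entity { float x, y, z, health; int id; char flags[12]; };

static float sum_health_aos(const struct Entity *e, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += e[i].health;   /* 4 useful bytes out of every 32 fetched */
    return s;
}

/* Struct-of-arrays: same data, split per field, so the hot loop
 * streams a dense float array that the prefetcher loves. */
struct Entities { float *x, *y, *z, *health; int *id; };

static float sum_health_soa(const struct Entities *e, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += e->health[i];  /* every byte of every cache line is used */
    return s;
}

int main(void) {
    static struct Entity aos[N];
    static float hp[N];
    for (size_t i = 0; i < N; i++) { aos[i].health = 1.0f; hp[i] = 1.0f; }
    struct Entities soa = { NULL, NULL, NULL, hp, NULL };
    printf("AoS: %.0f, SoA: %.0f\n",
           sum_health_aos(aos, N), sum_health_soa(&soa, N));
    return 0;
}
```

A big L3 like V-Cache papers over exactly this kind of unoptimized layout, which is why software that wasn't hand-tuned for cache tends to gain the most.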
The top of the lineup, the 7950X3D, will have a 5.7GHz boost. Still, it's a few percent either way. But the vanilla 7950X has the same boost clock. Also, you have to consider that not all CCDs will have 3D V-Cache.
I'm really looking forward to 7950x3D because when you run 128GB of DDR5 you have to run them at lower speeds, so I'm hoping the added cache compensates for this in games.
Excellent, concise overview of the current and incoming state of AMD's CPU hardware. It really is excellent that they can provide an option for such a massive hardware change that improves various workloads and gaming tasks. One additional thing that I think should have been specifically addressed in this video is how the increased IHS thickness from Zen 3 to Zen 4, done for cooler-compatibility purposes, further exacerbates the heat dissipation issues. For future considerations it's important to note that TSMC's ability to manufacture parts on multiple nodes will depend on the availability of ASML's machines for older nodes. If specific nodes do turn out to be optimal, we should expect to see more expansions of current foundries as opposed to new facilities being built out for single new nodes.
Another great video! I really like the Rumour Mill Roundup segment. I was surprised to learn that there's only one stacked cache die on the 7900X3D/7950X3D, but in the end it makes sense. While I think it's N5 on N5, apart from the packaging design I wouldn't be surprised if it's something like N7/N6 on N5, knowing how much those nodes have improved. Thermal limitations haven't really changed, and the stacked cache still forces the CCD to downclock. Given how AMD defines "TDP", the 3D cache clearly still needs a lot of thermal dissipation, and the Tjmax, just like on the 5800X3D, hints at how much power it'll be pulling compared to the non-stacked die. While I'm still curious how much this second generation of 3D stacked cache has improved, the benefits will still please gamers. The only Achilles' heel for AMD is once again software support; I believe that's why they were still holding back some frequency information. Although Intel has proved this can be done, I find it interesting that AMD will kind of be doing the opposite: the "P-core" for gaming clocking lower but having the cache, and the "E-core" clocking higher without stacked cache.
Very interested to see the tests, especially of the dual-chiplet variants. My suspicion is that the leaked performance may have been exceptional, partly due to a pre-production memory controller and partly due to apps known to benefit greatly from cache. Still, I'll err on the optimistic side: Zen 4 has had work done for more front-end throughput, and I expect the asymmetric CCD choice was a way to have fast single-threaded performance alongside a pool of threads able to share a large cache for normally-threaded applications. I suspect asymmetric dual chiplets were tested on Zen 3, before Zen 4 was finalized, after they found dual V-Cache wasn't delivering in consumer review benchmarks. V-Cache doesn't scale with process shrink, and they chose 6nm for the Radeon MCDs, so my guess is they bond a 6nm V-Cache die to the 5nm CCD's L3 vias. What's the point of using a more expensive node when the density and power efficiency won't improve?
Could also be just cost. Maybe they need to use better chips for this to work out and they didn't want to eat into their server sales, so they decided one was good enough for the consumer market.
IIRC, Lisa Su showed a "5900X3D" in Summer of 2021 (at Computex?), but then AMD only released the 5800X3D. I'm sure you are right, they must have tested it with Zen 3.
@@NaumRusomarov possibly but if dual V-cache was a big benchmark & efficiency win the higher cost would be fully justified. Server sales being cannibalised is more an argument against Threadripper than desktop.
I have seen concepts that provides microchannels within the chip designs to allow more adequate cooling of these transistors. As the technology moves towards multi layer 3D stacking, I imagine that will become a necessity to allow the chips to be well cooled.
A modern chip can only run at 20-30% of its full capability due purely to thermal constraints. Rock doesn't conduct heat very well. Increasingly large parts of a die are already dedicated to structures that support better heat transfer; modern architectures include a ton of these heat channels and more.
You can't really draw conclusions about heat from sensor data unless you're comparing CPUs of the same generation. Sensors can be in different places, the die could possibly take more heat, and a host of other issues. So while what you said is a decent idea about what's happening with TDP and Tjmax, it's not fact. We'll have to wait for what AMD says to know what they were dealing with in the lab.

Also, I'm going to disagree about the power distribution you showed for single- vs. dual-chiplet CPUs. With Zen 3, both single- and dual-chiplet CPUs had the same power cap. That's not true anymore: the dual-chiplet parts have a 170W TDP, so each core can run hotter than previously, and dual-chiplet parts can use a bit more power in all-core loads. What you showed is basically true when NOT running all-core loads; in fact the CPU has to be at roughly 60% utilization or less for it to hold. In GAMES what you showed is basically true, because a 16-core part won't reach 50% CPU utilization and the scheduler will throw threads at both chiplets, so the heat is distributed better. But this is a case where you can't compare Zen 3 to Zen 4, because the Zen 4 dual-chiplet parts are allowed to consume a lot more power.
You're absolutely right. I love my 5800X3D, but there are definitely tradeoffs. My 5800X3D is watercooled, yes, full custom loop with an EK block, to keep it at the maximum boost clocks I can get, and I do get great boost clocks of 4.4GHz on all cores at once. It still hits 4.5GHz briefly on single cores, but it stays a bit longer than on air cooling. It's not overclocking, but the extra cooling gets the best out of it. I would never go back to a chip with a smaller cache. My games run so much better.
Have you undervolted yours yet? I have my 5800X3D running great at a flat 1.0V, and in an Aida64 test it never goes over 62°C on a 360 AIO at 4.45GHz all-core. Most of the time, in games like Satisfactory, I never see over 46-48°C. They really do love undervolting!!
@@TdrSld Yes, I instituted a -0.10v undervolt in the bios, which puts it at 1.2v max and about 1.05v at idle. Makes a heck of a difference. Under 35C idle, usually around 32C, max 66C under heavy load, 45C while gaming, and constant 4.4-4.5GHz clocks. It really helps. I use the full custom loop more because of the GPU, and add on the CPU just for the convenience.
@@dangingerich2559 It's great how well the 5800X3D responded to undervolting. Right now, sitting in a 75°F room playing Satisfactory, the CPU is sitting at 37°C. I believe the 7000-series units will do better.
@@TdrSld Yeah, the current 7000 series would probably do better in many ways, but the lack of cache would hurt gaming performance in some games, notably the games I play. Plus, there's the massive cost involved in replacing the CPU, MB, RAM, and waterblock, that would exceed $1000 for my rig. The minor increase in performance simply wouldn't be worth it. Gonna keep my 5800X3D for a good while longer.
This explains why in AMD's announcement address, they talked about using a 280mm AIO to cool the X3D chips. I wondered why they didn't mention air cooling, and only talked about water cooling. Perhaps the new X3D chips may actually require liquid cooling? Time will tell.
Excellent analysis as always, really liked the intro where you went over the rumors and whether they were accurate now that we have the official specs. As for the thermal issues posed by any die-stacking method, I would think that since AMD makes sure the stacked cache only covers part of the die, they could use something more thermally conductive than silicon above the cores themselves, to compensate for the loss from the IHS no longer contacting the die directly.
@@kecimalah I wondered about this. It seems 2 small pieces of copper would be far superior to pieces of silicon, but if the pieces are bonded to the silicon below and not just "sitting" this makes sense.
I know it's stupid, but I had issues with my dual CCD 3800X in the past (stuttering and single thread performance reduction) so I'm not getting anything dual CCD until it's proven mature for at least 2 generations.
Imagine if, instead of making the silicon flat on the horizontal axis with the additional 3D layers stacked horizontally, we could build it vertically, with an IHS that covers the left and the right side. This hypothetical arrangement wouldn't decrease thermal performance at all with the first stack, since the two layers would face opposite ways and could be cooled independently. With two layers per side, at the trade-off of slightly reduced thermal performance, we could get 100% more silicon for a total of 4 layers.

It gets more complicated with a cubic 3D system where three sides have full cooling capability, with a caveat: you could have two vertical dies and one horizontal, but if all three have the same area, that creates a pocket in the middle with the worst thermals. I suppose it could be used for something like an in-chip IO die that needs far less cooling, but it's a lot of empty space, especially as the top and side dies grow, since surface area increases with the square while the "unusable" volume grows with the cube.

The best solution for 3D silicon I can come up with uses the first idea, vertical silicon with layers facing opposite ways, AND multiple of them on the same CPU/chip, like chiplets: instead of an array of horizontal "plates", an array of silicon towers all over the substrate. If we could create a thermal solution, possibly part of the IHS, that runs from the base substrate to the top of the towers and fills the gaps, and if it had enough thermal conductivity, it would cool all the silicon dies/layers equally well, or with only minor differences between the parts close to the substrate and the parts close to the cooler on top. With this system I imagine we could have cube-looking CPUs or SoCs with hundreds if not thousands of layers, all cooled pretty well.

Maybe the easiest way to visualize it: take a CPU die, grow it vertically as much as it extends horizontally, and slice it (like one of those tools that slices a fruit in a single motion). As for moving data down to the substrate, between two opposing layers of silicon we could run very thin copper wiring.
Another route to the best 3D chips would be cubes of silicon designed with a system of microscopic pipes for on-die water cooling. On-die water cooling is actively being researched nowadays; I don't think it was a tube/pipeline system, I think it was more like surface structures on the silicon to increase the surface area exposed to the water, but that could evolve into the pass-through system I just mentioned.
On AMD's other multi-CCD designs, a CPU core will actually check the L3 cache on the other CCDs before going to system memory, meaning it can sort of act as a pseudo-L4 cache for single-threaded applications. Theoretically this means the higher-clocked cores in the new dual-CCD X3D CPUs could still benefit from the 3D V-Cache on the other CCD with only a slight latency increase (still a lot faster than going to system memory).
Great video, thank you! The problem of power density is one of the two most important problems in CPU design. The other is that the price per transistor has started rising with new technological nodes. As you said, these are physics limitations. Looking at power density (W/cm²) data for CPUs since 2000, you'll see that no CPU has beaten ~120 W/cm². Today's processors are the densest in that sense, and the runners-up are... the Intel Pentium 4. The Pentium 4 microarchitecture was designed for high clock speed, and therefore high power, reaching about 100 W/cm². This also explains why M1/M2 are so powerful and compete even with desktop chips: Apple designed their chips around performance per watt from the beginning because of the limited power supply. But it was inevitable that desktops would become cooling-limited. So nowadays ~100 W/cm² is all you get, at least while we wait for GAA transistors or some other "miracle material" to control leakage.
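For a feel of the numbers, here's the power-density arithmetic with rough, assumed figures for a Zen 4 CCD; both the die area and the per-CCD power are illustrative, not official specs.

```c
#include <stdio.h>

int main(void) {
    double ccd_area_mm2 = 70.0;   /* approx. Zen 4 CCD area, assumed  */
    double ccd_power_w  = 85.0;   /* assumed per-CCD power under load */

    double density = ccd_power_w / (ccd_area_mm2 / 100.0); /* W/cm^2 */
    printf("~%.0f W/cm^2\n", density);
    /* 85 W over 0.70 cm^2 is ~121 W/cm^2 - right at the ceiling the
     * comment describes, and hotspots inside the cores run denser
     * still. Stacking a cache die on top makes that heat harder to
     * extract, which is the whole thermal trade-off of V-Cache. */
    return 0;
}
```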
I assume the 'no clock speed regressions' was back then because leakers heard that the max boost was the same on 7950X and 7950X3D (considering same power being used for both). But that was because one of the chiplets didn't have V cache, which no-one thought about at the time
Heat transfer isn't really affected much; it's the target temperature the cores are designed for/set to throttle at that is lowered, and that limits how high the 3D-cached die can boost.
As for the cache chiplet, I'd strongly assume N6. As long as you respect the maximum TSV pin density per area that N6 allows when designing the base die, stacking is not an issue. This may also explain the 3D chiplet boost clocks (besides core voltage limits and thermal considerations), since AFAIK the L3 also runs at core clock. As seen with Zen 3/3+, the max possible speed for AMD's L3 implementation on an N7/N6 process is around 5GHz, which just happens to be the 7800X3D's max clock speed.
For me being on Zen 2, not wanting to upgrade to AM5 just yet and seeing the 5900X and 5800X3D at the same/similar price point, it's hard to decide between them. I see greater performance on the 5900X for the multithreaded workloads I have (video encoding, blender, code compilation) but the 5800X3D would give a greater boost to gaming. On balance, I probably care about gaming performance more than shaving a few seconds off those other tasks but maybe 7950X3D/7900X3D will be the best of both!
I've been using a Ryzen 5 3600 up until October, when I upgraded to the 5800X3D (which I bought at launch but never installed... silly me). TBH, the 3600 was still more than capable of everything, even editing my YT videos in DaVinci Resolve. If you need more performance, I'd go with the 5800X3D, simply because I'm a sucker for 3D stacking, but with the current direction of CPU pricing, maybe wait for a good AM5 deal?
@@HighYield The downside to AM5 is needing more than just CPU (motherboard, RAM, potentially a case and PSU). I play a lot of strategy and sim games like Factorio, Civ6, and the extra cache seems to help those more than others so I think the 5800X3D will be best for my case. Good job on the videos, BTW. It's nice to have some analysis instead of constantly pumping rumours and speculation as on some channels. :)
Wonder if they'll ever be able to stack more layers and add another level of cache. Not sure if that's possible to cool, though, as 3D V-Cache is already problematic to cool.
Well, RDNA 3 has 3 levels of Infinity Cache stacking in the labs, so it's possible. Like you said, it's the cooling that would be an issue. But for Genoa-X I could absolutely see it being used, because of the lower TDP per chiplet.
VERY GOOD. Too many people think you throw on more cache and everything gets better. Well, no: it's very dependent on the program AND the data, how it's accessed, whether you store it, whether it's a data stream or a huge set of data that keeps changing, like in games or in engineering modeling where the working data sets can be very large.

The first thing I said to myself when I saw the Zen 4 specs is that L2 is doubling. I have to think that's going to reduce the benefit of L3, because you'll have fewer misses in L2. To me that's a no-brainer. And the games that didn't benefit from the 5800X3D won't benefit on Zen 4 either. It's once again all about the data, not the CPU: the data sets aren't changing, so a CPU that got no benefit from added L3 in one generation won't get a benefit in a newer CPU either, ESPECIALLY when L2 has doubled. You've already gotten all the benefit you can. That said, many newer games will have larger data sets, with more assets a user can interact with, and more data about those assets needs to be kept as close to the cores as possible when you're near those objects. And then there's the downside of adding cache, which usually adds latency to accessing that layer.

And no, a faster CPU doesn't make you GAME any faster; it just boosts fps IF the GPU isn't a bottleneck or already at 100%. The data you're working with at any point in a game isn't going to grow just because the CPU is faster. It's all about the game data, and a faster CPU doesn't change that unless you're moving through the game faster, and you won't be.
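The L2-vs-L3 interaction can be sketched with the classic average-memory-access-time model; all latencies and hit rates below are invented for illustration, not measured AMD figures.

```c
#include <stdio.h>

/* AMAT: each deeper level is only visited on a miss above it. */
static double amat(double l1, double h1, double l2, double h2,
                   double l3, double h3, double mem) {
    return l1 + (1 - h1) * (l2 + (1 - h2) * (l3 + (1 - h3) * mem));
}

int main(void) {
    /* assumed cycles: L1=4, L2=14, L3=50, DRAM=400 */
    double base   = amat(4, 0.90, 14, 0.80, 50, 0.70, 400);
    /* a doubled L2 absorbs more accesses: assumed hit rate 0.80 -> 0.88 */
    double big_l2 = amat(4, 0.90, 14, 0.88, 50, 0.70, 400);
    printf("baseline AMAT:   %.2f cycles\n", base);   /* 8.80 */
    printf("doubled-L2 AMAT: %.2f cycles\n", big_l2); /* 7.44 */
    /* With fewer misses reaching L3 at all, raising the L3 hit rate
     * (e.g. via V-Cache) has less total latency left to remove -
     * the point the comment above is making. */
    return 0;
}
```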
What a relief that at least you pointed out that the dual-CCD variants' boost clocks are a bit of an eyewash! And you're right that they're the first hybrid big-little architecture variants!
Vain hope: someday, we'll get to the point where amd can use 3d v-cache the other way around: put the cache die under the cpu die for better heat management. That probably requires many more TSVs though, and other changes.
If only we would be living somewhere in space, far away from Earth's gravitational attraction...We would not survive for much time but what a happy, short life would that be, filled with 3D V-cache surrounding us on all sides!
The 7900X3D is what I'm most excited for, because I think it will gain the most from the mixed-chiplet design they went with. And how this works out will determine my excitement for when AMD decides to put a high-density "c" die and a normal compute die onto one package.
A very important argument against the 3D V-Cache helping a lot in the 7900X3D/7950X3D is that the extra cache is all on one CCD and none of it is on the other. The 7950X already loses to the 7700X in gaming because of the bottleneck imposed by Infinity Fabric, and having a lopsided cache is going to make this worse. Now the Windows scheduler has to figure out whether a game process prefers the 3D-cache CCD or the clock-speed CCD; you don't get both at the same time.
The reason the dual-chiplet CPUs have higher boost clocks than the single-CCD 3D V-Cache part is that the dual-chiplet V-Cache models only have V-Cache on one chiplet; the other is just standard. So the V-Cache chiplet will run a lower boost speed than the non-V-Cache one (we might see only 5.2GHz on the V-Cache chiplet and 5.7GHz on the non-V-Cache one, according to sources). This is further cemented by AMD working with Microsoft to update the Windows scheduler to prioritize between the two different CCDs. Gaming applications that benefit from the extra cache will run primarily on the V-Cache chiplet, and tasks that run better with higher clock speeds (video encoding and such) will be prioritized onto the non-V-Cache chiplet. This will be similar to how Windows handles the P-core and E-core arrangement on Alder Lake and Raptor Lake.
Sadly, AMD's TDP values are barely related to power and can't be used to deduce power consumption. They use the formula TDP (Watts) = (tCase°C - tAmbient°C) / (HSF θca), i.e. a temperature difference (that doesn't even include Tjmax) and a heat-flow coefficient. They make up the numbers to fit whatever they need them to be. Gamers Nexus has a deep dive about it.
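As a quick sanity check of how flexible that formula is, here's the arithmetic with placeholder inputs; none of these values are AMD's actual parameters.

```c
#include <stdio.h>

int main(void) {
    double t_case    = 61.0;  /* assumed max case temperature, degC  */
    double t_ambient = 42.0;  /* assumed ambient temperature, degC   */
    double theta_ca  = 0.112; /* assumed heatsink resistance, degC/W */

    double tdp = (t_case - t_ambient) / theta_ca;
    printf("TDP = (%.1f - %.1f) / %.3f = %.0f W\n",
           t_case, t_ambient, theta_ca, tdp);
    /* (61 - 42) / 0.112 ~= 170 W: pick the three inputs and you can
     * land on almost any marketing number, which is the criticism. */
    return 0;
}
```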
The only reason Zen 4 supports faster RAM with lower latency is that 12nm IO die on Zen 2/3 (notice that Zen+ and Zen 2 had similar memory limitations, because the memory controllers are both on GloFo 12nm). If you use a 4000G or 5000G you can generally still beat all DDR5 kits supported by Zen 4, even on competitive OC boards for Raptor Lake, because the memory controller is both monolithic on-die and on TSMC 7nm.

I've got a 5700G with 75GB/s bandwidth and 49.1ns of latency; that latency is competitive with a 13900KS running DDR5-8000 C36 (I'm running 4933 C17-17-17-28 in a 1:1:1 timing ratio). This just isn't possible on Zen 4. But because the Zen 2 and 3 APUs have half the cache of their CPU counterparts, the performance gets weird: in games where the 5800X3D excels, you're probably better off with a 5600X than with the 5700G and 4933 C17 RAM, but where the 5800X3D loses to the non-3D 5800X, the 5700G with fast RAM will generally be even further ahead.

I'll be interested to see how fast we can get RAM on Zen 4 APUs if they're ever brought to desktop; a 5nm IMC seems like it might support up to 7600 C30 in 1:1, as long as you're using a 2-slot ITX motherboard and good RAM.
I imagine that, in the not too distant future, we'll see CPUs with the die(s) accessible from both the top and bottom of the package, and motherboards that support this. It would let us cool both sides of the CPU; currently we're using only half the surface area. So if AMD or Intel worked out a die design where the pin-outs run along the edges of the die(s) instead of the base connecting to pins, we could nearly double our cooling performance. The dies might have to get larger, but I don't see any inherent issues. This would make current computer cases obsolete, but is that a BAD thing? It would give case makers room to come up with new designs, and the same goes for PC cooling companies. The first few years would be super interesting and fun.
I'm not a memory expert, but I've heard that DDR4 has lower latency than DDR5. "Compared to DDR4, DDR5 RAM has a higher base speed, supports higher-capacity DIMM modules (also called RAM sticks), and consumes less power for the same performance specs as DDR4. However, DDR4 still holds some key advantages, like overall lower latency and better stability." So that would mean the 7000-series X3D would profit even more from the extra cache than the 5000-series X3D.
At base specs yes, but DDR5 is built differently internally (related to how the memory banks are accessed), so the effective latency should be better overall. But tbh, I need to dive deeper into this tech at some point.
Latency means nothing without context. Take your typical 60 fps game, or heck, let's say every game runs at 240fps. Naively speaking, you'd have roughly 4ms to render every frame. Now suppose DDR5 had two times worse latency, taking 100ns more for a single read. That would consume roughly 1/40,000 of your frame time for a SINGLE read. This is why memory is typically marketed by bandwidth per second: single-read latency doesn't matter (unless you have some crazy use case where nanosecond latency matters, which is surely not 99.9% of users).
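The arithmetic in that comment checks out; here it is written down (the 240 fps and 100 ns figures are the commenter's hypotheticals, not benchmarks).

```c
#include <stdio.h>

int main(void) {
    double fps      = 240.0;
    double frame_ms = 1000.0 / fps;        /* ~4.17 ms per frame      */
    double extra_ns = 100.0;               /* extra latency per read  */
    double fraction = (extra_ns * 1e-6) / frame_ms; /* ns -> ms ratio */

    printf("frame budget: %.2f ms\n", frame_ms);
    printf("one 100 ns read is %.7f of a frame (~1/%.0f)\n",
           fraction, 1.0 / fraction);      /* ~1/41667 */
    /* A single read is noise; it's the aggregate of millions of
     * accesses per frame that matters, which is why bandwidth and
     * cache hit rates dominate over any one access's latency. */
    return 0;
}
```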
As a gaming CPU the asymmetrical CCD design seems just about perfect; the only way to make it better would be shrinking half of the cores into efficiency cores for improved overclocking on your main cores.
They could reposition the cache to trade increased latency for lower temperatures, but that would defeat their overall design goal. Overall, when it comes to gaming, the temperatures aren't much of an issue as long as the GPU is the final bottleneck in a given situation. There are outliers that really tax the CPU, like Monster Hunter did at release, but they are few and far between.
My money is on AMD coming out with a much improved version of 3D V-cache in the long run. I'd bet my money on a mixed silicon and high-conductivity material for its next generation chips. Graphene could be it. I just don't know how the layer deposition would be done. But I'm sure clever engineers already have ideas on how to mass-produce such hybrid chips.
They should put the blank, transistor-free silicon on the bottom and raise the good silicon, if possible. I know there are physical design issues, but maybe they can do it.
I wish they'd use an in-silicon vapor chamber... There are techniques from steam engines that let you pump water into a pressurized container using nothing but steam and no other moving parts, powered by the pressure in that very container. Phase change should do a good job of removing excess heat during peak loads, and a moving fluid/gas probably transfers heat faster than a solid metal would. You could essentially make a "refrigerator" powered by the CPU's own heat, driven by the temperature difference against another part of the chip, and wedge it into the silicon between the 3D V-Cache and the CPU itself, I think.
Good video. As an engineer, I totally agree with every point you made. I don't believe it will be as big a performance jump as last gen. The main reasons IMO are 1) the doubled L2 cache and 2) DDR5. With cache (and RAM in general), depending on the app, it can be latency-sensitive or bandwidth-sensitive.
You touched on it, but I wish you'd gone into more detail about the power sweet spot. As GN has shown, the X SKUs pump far more power to hit those high numbers; in performance per watt, "Eco Mode" or the non-X variants can actually beat the faster chips. The X SKUs are basically pre-overclocked for marketing reasons, so running at a lower Tjmax and closer to that sweet spot just brings them down to reasonable levels. The other thing you may have forgotten to discuss is the sensitivity to higher voltages, which is another reason overclocking is disabled.
My 5800X3D runs below Tjmax, boosting to ~4.25GHz at 105W in Cinebench. With PBO Curve Optimizer (via a third-party tool) and -30 on all cores, I can get below 80°C with all cores running at 4.45GHz at 105W. I use a silent 280mm AIO for cooling. I had a 5950X before that: with all 16 cores at 4.2GHz it sat at 65°C and 142W, or ~80-85°C at 4.5GHz all-core and >200W overclocked. That said, the hottest it got was on single-core boost, when it boosted up to 5.05GHz at 1.5V (stock settings) and hit 85-95°C with just 80-85W of power. Technically neither CPU ever hit Tjmax in my case; it was always the voltage limit or power limit first. Ryzen 7000 is different, since those hit Tjmax first above all.
12:55 If it were an N7 node for the V-Cache, they wouldn't have needed to mix the CCDs like that (with + without cache). I say both will be N5, because they're saving cost or manufacturing capacity.
It could be possible, but then you have to pass all the power connections meant for the CPU through the chiplet, another design problem. It's basically finding the least problematic implementation.
Around 3:35, "DDR5 has higher bandwidth and better latency" is not really true. Typical latency is actually worse at the moment: DDR4-3600 CL16 has lower latency than DDR5-6000 CL36. The situation is improving, but we aren't there yet. Correct me if I'm wrong here.
The 7800X3D is overall a better gaming CPU than all 13th-gen CPUs except, of course, the 13900K; the 7900X3D is the true competitor of the 13900K, and the 7950X3D of the 13900KS. There's even a video for 5950X and 7950X users about a tool that pins the first 10 to 16 threads to games and the rest to another application; this makes fps more stable, especially while streaming (it's tested).
What they probably could do (at a higher cost) is put the cache under the CPU. It would mean the silicon "padding" would need all the wiring, and probably a few electrical issues, but they could do it. The hottest part would then be closer to the cooling, which would reduce the heat issue a bit. But probably not enough to warrant the extra cost...

What I don't get is why they don't place the additional memory on the IO die, as an L4. Sure, it wouldn't give as much of a perf increase as on the CCD, but it should solve most of the heat issue and probably let the clocks stay on par with the counterparts without L4. The latency to fetch from L4 would be greater than from the extended L3, but still faster than system memory, and the CPU would run as fast as it can, so it would probably be faster in some scenarios and slower in others (less slowdown from clock reduction, less gain from cache...). Add in a few DMA engines and it could be used as a unified cache, and could probably even be used as a "cheat" to keep some data from going through the CPU when transferring from PCIe card to PCIe card, where one of the cards' own DMA can't already do it. Add a few more features and you get a competitor to Intel's DSA...

If they ever do another G-series part, an L4 cache would make an even bigger difference I think, as it could buffer everything that moves between the CPU and iGPU at high frequency (compute buffers and the like). Intel had a line (Iris Pro) with a bit of eDRAM as an L4 (128 MB), and it worked wonders in some scenarios, with great perf increases especially when sharing data between the iGPU and the CPU. It was eDRAM, not SRAM cache, so higher latency, slower, and cheaper.

I'm sure that if AMD removed 16MB of L3 from the CPU die and added 32MB on the IO die by default, the CPU would be close in perf. However, it would mean putting cache-controller logic on the IO die, so more cost... and without full DMA engines and maybe DSA-like features (with an API the OS could use), I don't think that change would be worth the additional cost. At least not until they start adding more co-processors to the CPU (hence why we see that "system cache"/"last-level cache"/L4 more often on ARM processors).
This is why we wait for independent reviews to come out, particularly for games and applications we as customers will be playing/using. I'm looking forward to seeing what the 7950X3D can do, and whether I should buy one.
I really, really wonder why they don't stack the CCD on top of a cache/SRAM die. I don't know if that's because they'd need way too many TSVs running from the CCD on top through the cache die to the substrate, or what, but the thermal issues could be fixed if they managed it. Even if there need to be dead zones for TSVs, cost aside, I imagine that if thermals from an SRAM chip stacked underneath aren't a problem, it would be the superior solution. No more worrying about suboptimal thermals, and you could have a larger SRAM cache underneath, or dedicate the edges of the SRAM die to blocks for the core on top and keep the centre for extending L3. Also, if they stacked the cache die underneath and made it standard, they could actually shrink the overall footprint (ignoring Z-height) of the chip by moving the caches onto a separate SRAM "layer", and then start having dedicated-function layers, with the most thermally constrained ones at the top of the stack touching the IHS.
Excellent analysis as always! I am looking forward to 3D V-Cache as another Zen 4 option we'll have this gen, alongside the X and non-X variants. Glad you explained that trade-offs don't necessarily have to be a bad thing. Everything in life comes with trade-offs; it's choices and context that make them bearable.
@hawky2k215 Yeah, but the 3D V-Cache is way better than I had even imagined a year ago.
It's among the best things to happen in the CPU space in a while.
I think this was answered in another video: 3D stacking cuts the energy per bit moved from about 2 picojoules down to 0.3, so the stacked connection uses 6-7 times less energy.
I enjoy how you communicate these ideas. It's very clear, concise, and informationally dense without being cluttered. Very interesting stuff! Nicely done.
Exactly, facts and concepts without hype and unfounded expectation. Great job!
Because he is German/Austrian. I can relate ;-)
@@elijahjohn4482 Perhaps, but I think there's more to it. I think he's just a good communicator. As someone who also speaks German and French (English is my native language), just being German doesn't automatically mean clearer communication. This is more a credit to the person himself. :)
3:20 I am greatly reminded of the alleged quote that "640k RAM ought to be enough for anyone".
Your analysis makes perfect sense, and for gamers the trade-off has been well worth it, especially in very CPU-bound multiplayer titles. If AMD is able to keep the price-to-performance value in the same range as the 5800X3D, it will further consolidate its position as the best-value part for the segment.
Really good analysis. A lot of tech channels keep talking about how AMD fixed Zen3D with Zen 4, but never stop and think about the details. I'm sure the CPUs are gonna be great for gaming, but I fear the hype and misunderstanding of the new 3D Chips are going to "disappoint" some people.
I would like to reiterate previous comments and emphasise how much I appreciate your explanation of x3d chip structure and the differences between generations.
I have yet to see this information covered elsewhere. It seems you have more than earned our subscriptions.
The one reason I'm buying the X3D chips is that in some games the 4090 is bottlenecked by almost 10-20%, even by a fully OC'd 13900K.
Only the supercharged cache X3D Chips can hope to remove that issue.
And the dual CCD design is really cool.
Because I’m actually really interested in testing stuff.
Not at 4k ultrawide it's not.
It's not 4K hahahaha; Intel is the only one, anyone can use the world's fastest CPU for 4K, 8K and also 1080p.
@@maegnificant Yah, but 4K is still too demanding for most games, and 1440p at 270Hz is super smooth and the best of both worlds.
You have a 4090 to play at 1080? LOL
@@MicaelAzevedo my brother in Christ I literally said 1440p 270hz XD. Like bro, please read for 2 seconds.
As always, this was a great analysis and I completely agree with your conclusions. There are trade-offs in engineering, and cost/power draw aren't the only possible disadvantages. That's why it's important to understand your workloads before choosing hardware. Zen 3D isn't a premium Zen part (it costs more, so it is in that sense), but it's not automatically better. Having trade-offs doesn't make it a failure, and it's not all doom and gloom.
7:37 Not necessarily. Tjmax is the maximum temperature the CPU allows the on-die sensor to observe before throttling/safety measures kick in, right? But it's a heuristic based on the placement of the sensor. If the sensor is moved, or if the die layout is different relative to the sensor, it will read a different value than another sensor at the same actual die temperature. The engineers offset this value so it can still be used as that heuristic to safely throttle and protect the die. In other words, if a sensor moves to a spot that reads slightly lower at the same actual die temperature, they have to lower the Tjmax value as well so it still represents the actual hotspot temperature, or the safe total temperature. So a lower Tjmax doesn't necessarily mean earlier throttling relative to heat generated/heat dissipated.
The TJmax difference could also be from the public reaction to the first chips running at 95C. They've already said that they lowered the power target to 120W because of public perception of the 7950X efficiency with the 170W target.
89C looks to me suspiciously like "just under 90C". The 7950X actually has a hard temperature limit of 115C I think. You can set this in BIOS if you want to.
89C looks a lot like the second digit is the determining digit, so precisely 89C, whereas 90C could be "around 90", as the 9 would be the determining digit.
I feel like vomiting when I read those numbers. But I guess it's all still good and "cool" as it's not 150ºC yet.
@@GholaTleilaxu CPU temp really doesn't even matter; it has nothing to do with heat output and everything to do with thermal transfer. The 13900KS runs up to 105°C and uses 350+W under load. The silicon can handle it just fine; the difference is high-temperature packaging, the substrate and underfill glue. High-temp packaging costs more, so it normally wasn't used in desktop CPUs, only laptop CPUs. These new-generation desktop CPUs use the high-temp packaging, so it really shouldn't matter. As for heat output into your room/case, that's purely down to how many watts the chip consumes; watts are watts, regardless of the temperature the cores run at.
@@PineyJustice The second law of thermodynamics matters to me, especially during hot summer days.
@@GholaTleilaxu The second law of shit you clearly don't understand. There is a large variable you're missing, thermal conductivity through the IHS. If the IHS was made of wood the cpu could run at 200c while putting less heat into your room than a ryzen 5800 running at 40c.
Planning to buy the 7800X3D if reviews show a good boost over the 7700X. I am concerned about the dual-CCD models, but who knows, maybe they'll be good! I bet you could undervolt the 7800X3D to get stunning performance/watt figures. Can't wait for reviews!
Then you don't need an X3D chip if you need to see reviews. People who want that chip will buy it knowing it will boost their sim/emulation workloads by 40%+, not to play Far Cry at 7% more fps.
@@Kevin-fl7mj Wait... so you're saying it's a 100% day-1 buy regardless of overall performance, performance/$, or cost? ... ... Or that I shouldn't buy it because I don't play sim games? Or I should start playing sim games and then buy it?
@@paulw7738 What I'm saying is: for average AAA gaming you don't need a 7800X3D unless the upgrade somehow costs you $50, since it doesn't provide anything you can't already get from a 7700X; going from 165fps to 180fps on a 4090 at 1440p won't change your experience. Whereas with games like DCS/Star Citizen, where you get regular drops below 60fps on top-end hardware, V-Cache is amazing, with gains over 40-50% in some cases.
@@Kevin-fl7mj Sure, that makes sense. But for the games I play that are severely CPU-bottlenecked, the question is: are the performance gains of the 3D part over the 7700X worth the additional cost when I go AM5? I'm also curious how much performance is lost in non-gaming workloads due to the lower clocks. What will MSRP be? How about temps? OC support? Lots of questions that will be answered when reviews come out.
@@Kevin-fl7mj True. I'm a big Hearts of Iron 4 and Europa Universalis 4 player, and I'm buying the 7800X3D to help in the "late game", where the CPU has to handle thousands of units in real time. It needs cache like a mothertrucker.
Would be neat if they could try and place the 3D V Cache under the CPU so that the hotter silicon is closer to the top instead
Yes, idk why they aren't. You'd think it would make the most sense.
Holy moly, AMD did exactly what you said with the release of the 9800X3D.
One point about the TDP value that I’m very iffy about.
GamersNexus has mentioned that the TDP value AMD provides is based on some arbitrary formula, which they can tweak to land on a specific TDP value, and isn’t important in the real world.
The formula is:
TDP = (tCase°C - tAmbient°C) / (HSF θca)
Based on this, they can define what the case and ambient temperatures are in order to land exactly on a number they would like to market. Maybe something you would like to look into further? Would like to hear your thoughts on this!
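To see how tunable that is, here's a quick plug-in of the formula in Python, with illustrative values (all three inputs below are assumptions for demonstration, not AMD-published figures):

# AMD-style TDP formula with assumed, illustrative inputs
t_case = 61.8     # defined "case" temperature in C (assumed)
t_ambient = 42.0  # defined ambient temperature in C (assumed)
theta_ca = 0.189  # heatsink-to-ambient thermal resistance in C/W (assumed)

tdp = (t_case - t_ambient) / theta_ca
print(round(tdp))  # -> 105 (W)

Choose different tCase/tAmbient/θca definitions and you can land on almost any marketing number you like.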
Very nice and concise way of explaining these new CPUs. Subbed!
I hope 7950X3D will be a good one, as I am considering replacing my 5950X to get some extra gains out of my RTX 4090.
But if it doesn't live up to my expectations, I will hold off on upgrading. The productivity performance of the 5950X is still good, and DDR5 and AM5 boards are still quite expensive.
Tbh I think I'll be holding onto my 5950X for at least another generation or two. The 7950X3D just isn't quite the upgrade I was looking for yet (the poor heat transfer on both the normal models and the X3D is especially concerning).
There is no "poor heat transfer"; this is a decent heat transfer method considering the architecture of the chip.
@@juicyhdx782 The entire point of this video is that AMD has to meaningfully drop the clocks on these chips in order to keep them cool. Not only that, but on the normal models the IHS is extra thick so that both models could have the same z-height; removing that IHS or milling it down has been shown to rather dramatically improve the performance of the 7950X, because it would love to draw more power but the IHS keeps it thermally constrained.
It all depends on which gen you are coming from. Usually the X% uplift over the immediately previous gen isn't what you'd upgrade for; most people want more than that.
One thing that could solve the thermal issues for X3D chips entirely is if they could manage to put the additional cache under the existing CCD rather than on top of it. I don't know how difficult this would be, but I see no reason why it would be outright impossible. It would probably require a lot of re-engineering if you start from an existing CCD design, but if the cache-under-CCD use case were considered during the design of the Zen 5 CCDs, it might be feasible.
LOL, I just watched a Doug DeMuro video right before this. I'm sure he would be thrilled to learn that "Quirks and Features" is now part of German tech reporting. :)
Something you've not touched on, though, is that the faster "3D" L3 cache with its lower latencies will counter the larger clock speed reduction on the 7800X3D models, and offer more of an increase where the clock speed reductions are smaller (vs Zen 3 X3D).
One thing not mentioned between the two generations is price...
Is it possible that the X3D will be less expensive?
At the time I made the video the price wasn't known, but in the meantime AMD has announced it.
I made a community post about it: th-cam.com/users/postUgkxv1O_RcZIvii-E0maCyu1iF7mYGOnvH57
I recall hearing about a chip cooling tech involving "pillars" going through the chip that would help with 3D layouts, maybe something like that can be the solution for X3D's heat dissipation woes?
I sort of wondered why they didn't install copper or other metal pillars through or athwart the chip to dissipate heat...but I'm not a chip engineer and figured I was just dumb.
@@BronzeDragon133 Yeah, the TSVs are probably copper? They should be much more thermally conductive than silicon. I wonder if they could put a bunch of extra TSVs through the cores to pull more heat out.
Another great video, High Yield. In answer to your question about what the future of 3D stacking looks like and how we will get around the issue of stacking heat sources: the answer is on-die microchannel liquid cooling. To properly realise 3D chip designs, TSMC has been testing direct water cooling, where the water channels are etched directly into a silicon layer on top of the CPU. In their tests, channels were etched into a silicon layer with a silicon-oxide thermal interface material between the microfluidic system and the actual silicon of the TTV. In a third option, the silicon-oxide TIM was replaced with a liquid-metal TIM. The results were really impressive and could enable 2000W+ chips that are still properly cooled, if we ever need those levels.
I've had my components for more than a month and I'm waiting for the 3D V-Cache CPU, despite the trade-off.
Even if it runs slower than the normal X variants, the CPUs of 2022 and 2023 are already blazing fast.
So you are out of the 14-days or 30-days return window in case one of those components fails to work or works unacceptably.
@@GholaTleilaxu That's true, but at least the companies behind the components usually have very long warranties. So I'll just make use of that.
It's odd we still have no release date or price so far. I'm really looking forward to the reviews and to whether it's worth upgrading to AM5.
Sounds like they need to entirely rethink the thermal transport within the stacked layers instead of just relying on a massive top-mounted sink to take care of it all by itself. Microscopic on-die heat pipes? Electric nano-scale heat pumps? Evolving the form factor for dual-sided active cooling? Less sexy, but probably easier in the short term. The future will be interesting.
Best part about the X3D architecture is that it gets way more performance out of the same CPU and memory speed.
The 5800X3D with 2666MHz DDR4 performs close to the 5900X using 3933MHz DDR4.
Even if the 7800X3D performs midway between the 7700X and 7900X on 4800MHz DDR5 that's still amazing.
You could in theory SAVE money over getting a 7700X since you don't need to worry about much more expensive RAM.
It's even better if you already had slow memory because you wanted to save money earlier.
This happened to me by chance when I made a system with the Athlon 3000G and 3000MHz DDR4 three years ago.
I just needed to get the 5800X3D, and the much larger cache took care of the rest.
If only GPUs were affordable I'd have gotten one of those as well, but I can wait another couple of years.
Processor manufacturers would rather choke themselves than give consumers a platform with 4 channels of DDR5 memory, although physically 4 memory sticks can be connected on almost any board. No, they will play with caches and introduce "new" generations of processors, selling us the old solution again and again.
If you want 4 or 8 channels of memory, you can pick it up whenever you want. It’s readily available. You’ll just pay for it. It’s in the HEDT, workstation and server parts.
But I would offer you an alternative understanding that it would not be as worthwhile as you think it is, unless you have a use case that is that memory intensive.
Both are very expensive. On-die cache is the most expensive memory there is; it takes up a lot of space on a die. A larger die means fewer dies per wafer and higher odds of a defect hitting any given die. AMD pays TSMC per wafer; the yield is the yield.
A DDR memory port also takes a lot of die space; in AMD's case it's probably on the IO die. Adding 4 channels might increase bandwidth, but what about the people who just buy two DIMMs? Then you have 2 memory channels entirely unpopulated. Additionally, the traces to all 4 channels have to be matched.
I think a major thing you are missing is HOW much faster cache is vs system memory. It’s not even in the same ballpark. To a modern CPU which can perform a dozen operations in the time it takes for light to travel from your desk lamp to the desk, going out to main memory is a literal eternity.
Say you do one math problem a second, but sometimes you have to look up a value. If you hit in L1, you had to wait about 5 seconds to get the answer. If you miss in L1 and L2 and hit in L3, you probably had to wait about 30 seconds. That's a long time; it just cost you 30 other problems while you were waiting.
Now if you have to go out to main memory, it’s like sending someone from the desk you are working at, across town to get the answer and drive back. It’s more like 2 hours you can’t do anything. And the processor is stalled until it gets the answer back.
If you have to get it from disk to read something, it’s like waiting 2 years for the answer.
On die cache is so much faster it’s not even in the same ballpark. The larger that cache the less often you have to burn 2 hours going to main memory.
That's actually what hyperthreading does, though. It allows a processor to switch to another thread while it's waiting for the first one to continue; at least it stays busy. It will continue on that until it gets stuck, swapping back and forth.
But getting data from main memory is super, amazingly slow.
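To put very rough numbers on that analogy (ballpark, order-of-magnitude assumptions, not measured figures for any specific CPU), here's a small Python sketch:

# Ballpark memory-hierarchy latencies, converted to CPU cycles at ~5 GHz
cycle_ns = 0.2  # one clock cycle at 5 GHz
latencies_ns = {
    "L1 hit": 1,
    "L2 hit": 3,
    "L3 hit": 10,
    "main memory": 80,
    "NVMe SSD read": 100_000,
}
for level, ns in latencies_ns.items():
    print(f"{level}: ~{ns} ns = ~{ns / cycle_ns:,.0f} cycles stalled")

An L3 hit stalls you for tens of cycles, DRAM for hundreds, and a disk read for half a million; that's the gap between "across town" and "2 years".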
4 channels of memory doesn’t have much point until you have a lot of threads and a lot of memory utilization. The limiting factor is often not having a free ram request line open, it’s the time it takes to make the request in the first place.
Servers may have thousands of threads doing memory intensive stuff, then 4 or 8 channels makes more sense. In the consumer space it adds a ton of cost, takes up a ton of board real estate, and increases the consumer cost who now have to buy twice the memory and have to pay more for the more expensive motherboard.
If you want 4 channels you can get 4 channels. No problem. You don’t need special permission. You’ll just have to pay for a platform with 4 channels. And you’re probably looking at 4x-10x the cost.
I could see the thermal issue being solved if there were some process changes that introduced the equivalent of heat spreaders in the bulk silicon, so there are fewer issues with hot spots.
Why did you leave out the 7900X3D and the 7950X3D in the clock speed part? Those two have the same boost clock as their non-X3D counterparts, so your conclusion that the clock speed is reduced fails for these two.
Putting V-Cache on bottom could make it better, but it may make interfacing CPU to substrate a nightmare as it would need to go through the V-Cache
I've been a computer tech for well over 30 years and I have worked in engineering (lithography) for decades as well. Core frequency is only *ONE* of several factors in efficiency. So, when I hear people speak so heavily of frequency as a deal maker or breaker, it frustrates me deeply. Cache and bus speed (communication between the processor and the RAM) are insanely critical. With 2 chiplets sharing the total core count, like the 16-core AMD procs, the Infinity Fabric is immensely crucial.
But losing 400 or 500 MHz on boost or even base frequency will make almost no difference, especially when it is only a ~10% loss compared to other models. RAM configuration also matters: dual channel ONLY for gaming, or you lose performance; quad channel for production software, as it's huge for increasing production software performance.
Also, get the lowest CAS latency at the higher end of frequency. I am buying the 7800X3D, and because the maximum, most optimal dual-channel speed it supports is 5,200 MHz, I went with G.Skill
DDR5-5200 CL28-34-34-83 (83 being the lowest such cycle count out of all DDR5 RAM kits). The number of cycles to successfully access the data is the most significant number for RAM.
Who is talking about frequency as a deal maker or breaker?
A faster CPU isn't my gaming issue right now; try internet speed and packet/data loss. Australia needs a big fix and it isn't coming any time soon, so my 7700X is the sweet spot and any further development from AMD is just a pipe dream. Low latency first, then higher FPS, is what's needed. But I'll be watching X3D closely over the next few years. Love the breakdown, thank you.
As a software developer I would take more cache any day, with the trade-off of some clock speed. My reason: writing cache-friendly high-perf software is very costly and at the same time mentally demanding. Data structure size, memory layout, cache line padding etc. is just too much for most devs, to the point where they don't bother at all.
The top of the lineup, the 7950X3D, will have a 5.7GHz boost. Still, it's a few percent either way. But the vanilla 7950X has the same boost clock. Also, you have to consider that not all CCDs will have 3D V-Cache.
I'm really looking forward to the 7950X3D, because when you run 128GB of DDR5 you have to run it at lower speeds, so I'm hoping the added cache compensates for this in games.
If you need 128GB of RAM you should use Threadripper.
@@hristobotev9726 Naw, that more than doubles the price, and I want to game with it too.
I liked your comment because it's clever, but I would still recommend running a two-DIMM setup; imagine what kind of latency you would get then.
I just hope they will remember that the boiling point of water is 100ºC, not the 212º of that other, obsolete scale.
Excellent, concise overview of the current and incoming state of CPU hardware from AMD. It really is excellent that they can provide an option with such a massive change in hardware for improvements in various workloads/gaming tasks.
One additional thing that I think should have been specifically addressed in this video is how the increased IHS thickness from Zen 3 to Zen 4, done for cooler-compatibility purposes, further exacerbates the heat dissipation issues.
For future consideration, it's important to note that TSMC's ability to manufacture parts using multiple nodes will depend on the availability of ASML's machines for older nodes. If specific nodes are indeed found to be optimal, we should expect to see more expansions of current foundries as opposed to new facilities being built out for singular new nodes.
Another great video! I really like the Rumour Mill Roundup segment. I was surprised to learn that there is only one stacked cache die on the 7900X3D/7950X3D, but in the end it makes sense. While I think it is N5 on N5, apart from the packaging design I wouldn't be surprised if it's something like N7/N6 on N5, knowing how much those nodes have improved. Thermal limitations haven't really changed, and the stacked cache still forces the CCD to downclock. With how AMD defines "TDP", it shows the 3D cache still needs a lot of thermal dissipation, and the Tjmax also suggests, just like with the 5800X3D, how much power it'll be pulling compared to the non-stacked-cache parts.
While it still concerns me how little this second generation of 3D stacked cache has improved, the benefits will still please gamers. The only Achilles' heel for AMD is once again software support; I believe that is why they were still holding back some frequency information. Although Intel has proved this can be done, I find it interesting how AMD will kind of be doing the opposite: the "P-core" for gaming clocking lower and carrying the cache, and the "E-core" clocking higher without stacked cache.
Very interested to see the tests, especially of the dual-chiplet variants.
My suspicion is that the leaked performance might have been exceptional due, firstly, to a pre-production memory controller and, secondly, to apps known to benefit greatly from cache.
Still, I'll err on the optimistic side. Zen 4 has had work done for more front-end throughput, and I expect the asymmetric CCD choice was a way to have fast ST performance alongside a pool of threads able to share a large cache for normally-threaded applications.
I suspect asymmetric dual chiplets were tested on Zen3 before Zen4 was finalised after they found dual V-cache wasn't delivering in consumer review benchmarks.
V-Cache doesn't scale with process shrink; they chose 6nm for the Radeon MCDs, so my guess is they bond a 6nm V-Cache die to the 5nm L3's TSVs.
What's the point of using a more expensive node when the density and power efficiency won't be improved?
Could also just be costs. Maybe they need to use better chiplets for this to work out and they didn't want to eat into their server sales, so they decided that one was good enough for the consumer market.
IIRC, Lisa Su showed a "5900X3D" in Summer of 2021 (at Computex?), but then AMD only released the 5800X3D. I'm sure you are right, they must have tested it with Zen 3.
@@NaumRusomarov Possibly, but if dual V-Cache were a big benchmark & efficiency win, the higher cost would be fully justified.
Server sales being cannibalised is more an argument against Threadripper than desktop.
I am so excited to get my hands on these. Finally upgrading from a 7600K; my first move to team red! My CPU has been the bottleneck in many games now.
I have seen concepts that provide microchannels within the chip design to allow more adequate cooling of the transistors. As the technology moves towards multi-layer 3D stacking, I imagine that will become a necessity to keep the chips well cooled.
A modern chip can only run at 20-30% of its full capability due purely to thermal constraints. Rock doesn't conduct heat very well. Increasingly large parts of a die are already dedicated to structures that support better heat transfer; modern architectures include a ton of these heat channels and more.
You can't really draw conclusions about heat from sensor data unless you're comparing CPUs of the same generation. Sensors can be in different places, the die could possibly take more heat, and there are a host of other issues. So while what you said is a decent idea of what's happening with TDP and Tjmax, it's not fact. We will have to wait for what AMD says to know what they were dealing with in the lab.
Also, I'm going to disagree about the power distribution you showed for single- vs dual-chiplet CPUs. With Zen 3, both single- and dual-chiplet CPUs had the same power cap. That's not true anymore: the dual-chiplet parts have a 170W TDP, so each core can run hotter than previously, and dual-chiplet parts can use a bit more power when running all-core loads. What you showed is basically true if NOT running all-core loads, and in fact you'd have to have the CPU at roughly 60% utilization or less for it to hold. In GAMES what you showed is basically true, because a 16-core part is not going to reach 50% CPU utilization, and the scheduler will throw threads at both chiplets, so in that case the heat is distributed better.
So this is a case where you can't compare Zen 3 to Zen 4, because with Zen 4 the dual-chiplet parts are allowed to consume a lot more power.
Just look at the thermals of X vs non-X. The heat is why they don't want them overclocked. Great video! Very informative.
You're absolutely right. I love my 5800X3D, but there are definitely tradeoffs. My 5800X3D is watercooled, yes, full custom loop with an EK block, to keep it at the maximum boost clocks I can get, and I do get great boost clocks of 4.4GHz on all cores at once. It still hits 4.5GHz briefly on single cores, but it stays a bit longer than on air cooling. It's not overclocking, but the extra cooling gets the best out of it. I would never go back to a chip with a smaller cache. My games run so much better.
Have you compared it to an overclocked 5.4GHz all-core 7700X with 6000C30 memory and tightly tuned subtimings? It's rather fast.
Have you undervolted yours yet? I have my 5800X3D running great at a flat 1.0V, and under an Aida64 test it never goes over 62°C on a 360 AIO at 4.45GHz all-core. Most of the time, in games like Satisfactory, I never see over 46-48°C. They really do love undervolting!!
@@TdrSld Yes, I instituted a -0.10v undervolt in the bios, which puts it at 1.2v max and about 1.05v at idle. Makes a heck of a difference. Under 35C idle, usually around 32C, max 66C under heavy load, 45C while gaming, and constant 4.4-4.5GHz clocks. It really helps. I use the full custom loop more because of the GPU, and add on the CPU just for the convenience.
@@dangingerich2559 It's great how well the 5800X3D responds to undervolting. Right now, sitting in a 75°F room playing Satisfactory, the CPU is sitting at 37°C. I believe the 7000 units will do better.
@@TdrSld Yeah, the current 7000 series would probably do better in many ways, but the lack of cache would hurt gaming performance in some games, notably the games I play. Plus, there's the massive cost involved in replacing the CPU, MB, RAM, and waterblock, that would exceed $1000 for my rig. The minor increase in performance simply wouldn't be worth it. Gonna keep my 5800X3D for a good while longer.
I unlocked SAM for my 3990X, which has an L3 cache of 256MB! Loving that CPU! It keeps unlocking surprises!
Does water cooling help performance much? Not AIO but how are bespoke water cooling loops performing vs air cooling and continuous boost clocks?
This explains why in AMD's announcement address, they talked about using a 280mm AIO to cool the X3D chips. I wondered why they didn't mention air cooling, and only talked about water cooling. Perhaps the new X3D chips may actually require liquid cooling? Time will tell.
Excellent analysis as always; I really liked the intro where you went over the rumors and whether they were accurate now that we have the official specs. As for the thermal issues posed by any die-stacking method, I would think that since AMD is making sure the stacked cache only covers part of the die, they could use something more thermally conductive than silicon above the cores themselves to compensate for the loss of having the IHS contact the die directly.
I think they use silicon because it has the same thermal expansion; if they used metal it could cause cracking etc.
@@kecimalah I wondered about this. It seems 2 small pieces of copper would be far superior to pieces of silicon, but if the pieces are bonded to the silicon below and not just "sitting" this makes sense.
The 7900X3D and 7950X3D will benefit quite a bit from using Process Lasso to schedule the threads, especially the 7950X3D.
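For anyone who wants to experiment without Process Lasso, here's a minimal Python sketch using psutil to pin a game to the V-Cache CCD. It assumes the cache CCD shows up as the first 16 logical CPUs and uses a made-up game path; verify the CCD layout on your own system first:

# Pin a process to the first 16 logical CPUs (assumed V-Cache CCD).
# Requires: pip install psutil. The game path is a hypothetical example.
import psutil

game = psutil.Popen([r"C:\Games\MyGame\game.exe"])
game.cpu_affinity(list(range(16)))  # restrict scheduling to CPUs 0-15
print(game.cpu_affinity())          # confirm the new affinity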
I know it's stupid, but I had issues with my multi-CCX 3800X in the past (stuttering and single-thread performance reduction), so I'm not getting anything dual-CCD until it's proven mature for at least 2 generations.
Imagine if, instead of making the silicon flat on the horizontal axis (with any additional 3D layers stacked horizontally), we could build it vertically, with an IHS that covers both the left and the right side. This hypothetical arrangement would not decrease thermal performance at all for the first stack, as the two layers would face opposite ways and could be cooled independently. With two layers on each side, at the trade-off of slightly reduced thermal performance, we could achieve 100% more silicon with a total of 4 layers.
This could technically get even more complicated with a cubic 3D system, where three sides have full cooling capability, although with a caveat:
You could have 2 vertical dies and 1 horizontal one, but if you want all 3 to have the same area, that creates a pocket in the middle with the worst thermal capability. I suppose it could be used for something like the IO portion of the chip, which requires a lot less cooling, but it's a lot of empty space, especially if the top and side dies grow, since surface area scales with the power of 2 while the "unusable" internal volume grows with the power of 3.
The best solution for 3D silicon I can come up with would be the first idea (vertical silicon with layers facing opposite ways), but with multiple such stacks in the same CPU/chip, chiplet style: instead of an array of horizontal "plates", an array of silicon towers all over the substrate. If we could create a thermal solution, possibly part of the IHS, that runs from the base substrate to the top of the towers and fills the gaps, and if it had enough thermal conductivity, it would cool all the silicon dies/layers equally well, or with only minor differences between the parts close to the substrate and the parts close to the cooling element on top. With this system I imagine we could have cube-looking CPUs or SoCs with hundreds if not thousands of layers, all pretty well cooled.
Maybe the easiest way to visualize this is to imagine a CPU die, grow it vertically as much as horizontally, and slice it (like one of those tools that slices a fruit in a single move). As for moving data to the substrate, between two opposing layers of silicon you could have very thin copper (or similar) wiring.
Another path to the best 3D chips would be cubes of silicon designed with a system of microscopic pipes for on-die water cooling. On-die water cooling is actively being researched; I don't think it used a tube/pipeline system, I think it was more like surface structures on the silicon to increase the surface area exposed to the water, but it could evolve into the pass-through system I just mentioned.
On other multi-CCD designs from AMD, a CPU core will actually check the L3 cache on other CCDs before going to system memory, meaning it can sort of act as a pseudo-L4 cache for single-threaded applications. Theoretically this means the higher-clocked cores in the new dual-CCD X3D CPUs could still benefit from the 3D V-Cache on the other CCD with only a slight latency increase (still a lot faster than going to system memory).
This channel always gives a solid analysis. Great work.
Great video, thank you! The problem of power density is one of the 2 most important problems in CPU design. The other is that the price of a single transistor has started rising with new technological nodes. As you have said, these are physics limitations. Looking at the power density (W/cm²) of all the CPUs since the start of the 2000s, you'll see that no CPU has beaten ~120W/cm². Today's processors are the densest in that sense. And the second runners are... Intel Pentium 4: that microarchitecture was designed for high clock speed, and therefore high power, reaching about 100W/cm². This also explains why M1/M2 is so powerful and competes even with desktop chips: Apple designed their chips around performance per watt from the beginning because of the limited power supply. But it was inevitable that desktops would become cooling-limited. So nowadays ~100W/cm² is all you have, at least while we are waiting for GAA transistors or some other "miracle material" to control leakage.
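As a rough sanity check on those density figures, here's the arithmetic with assumed values (the die area is an approximation and the per-CCD power is a pure assumption, not an official figure):

# Power density = power / die area; inputs are illustrative assumptions
die_area_mm2 = 70.0  # approx. Zen 4 CCD area (assumption)
ccd_power_w = 85.0   # assumed per-CCD power under all-core load

density = ccd_power_w / (die_area_mm2 / 100.0)  # convert mm^2 to cm^2
print(f"{density:.0f} W/cm^2")  # ~121 W/cm^2, right at that historical ceiling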
I assume the 'no clock speed regressions' claim back then was because leakers heard that the max boost was the same on the 7950X and 7950X3D (assuming the same power for both). But that was because one of the chiplets doesn't have V-Cache, which no one thought about at the time.
Heat transfer isn't really affected much; it's the target temperature the cores are designed for/set to throttle at that is lowered, and that limits how high the 3D-cached die can boost.
As for the cache chiplet, I'd strongly assume N6. As long as you consider the maximum TSV pin density per area that N6 allows when designing the base die, stacking is not an issue. This may be an explanation for the 3D chiplet's boost clocks as well (besides the core voltage limit and thermal considerations), since AFAIK the L3 also runs at core clocks. As seen with Zen 3/3+, the max possible speed for AMD's L3 implementation on an N7/N6 process is around 5GHz, which just so happens to be the 7800X3D's max clock speed.
For me being on Zen 2, not wanting to upgrade to AM5 just yet and seeing the 5900X and 5800X3D at the same/similar price point, it's hard to decide between them. I see greater performance on the 5900X for the multithreaded workloads I have (video encoding, blender, code compilation) but the 5800X3D would give a greater boost to gaming. On balance, I probably care about gaming performance more than shaving a few seconds off those other tasks but maybe 7950X3D/7900X3D will be the best of both!
I've been using a Ryzen 5 3600 up until October, when I upgraded to the 5800X3D (which I bought at launch but never installed it... silly me)
TBH, the 3600 was still more than capable for everything, even editing my YT videos in DaVinci Resolve.
If you need more performance, I'd go with the 5800X3D, simply because I'm a sucker for 3D stacking, but with the current direction of CPU pricing, maybe wait for a good AM5 deal?
@@HighYield The downside to AM5 is needing more than just CPU (motherboard, RAM, potentially a case and PSU). I play a lot of strategy and sim games like Factorio, Civ6, and the extra cache seems to help those more than others so I think the 5800X3D will be best for my case.
Good job on the videos, BTW. It's nice to have some analysis instead of constantly pumping rumours and speculation as on some channels. :)
Keep up the great job on the videos!
I'm always looking forward to your analysis videos. They are really interesting.
Thanks for the compliment & thank YOU for watching! :)
Wonder if they will ever be able to stack more layers and add another level of cache. Not sure if that's possible to cool though, as 3D V-Cache is already problematic to cool.
Well, RDNA 3 has 3 levels of Infinity Cache stacking in the labs, so it's possible. Like you said, it's the cooling that would be an issue. But for Genoa-X I could absolutely see this being used, because of the lower TDP per chiplet.
AM5 is just getting started, with eco modes, 3D caches, and even more cores. Now we need the GPU space to get its act together.
I think an L4 type of cache, like SPR with HBM, could also be a great idea.
Maybe Zen 5 will introduce some kind of "L4" or "Last Level Cache", I'm looking at Apple's SLC design.
Glad to be subscribed while you are sub-10K. Keep up the good work!
VERY GOOD. There are too many people who think you can throw on more cache and everything gets better. Well, no: it's very dependent on the program AND the data, how it's accessed, whether you store it, whether it's a data stream or just a huge set of data that keeps changing like in games, or engineering modeling where the data sets you're actively working with can be very large.
The first thing I said to myself when I saw the specs for Zen 4 was that L2 is doubling. I have to think that's going to reduce the benefit of L3, because you'll have fewer misses in L2. To me that's a no-brainer. And for the games that didn't benefit from the 5800X3D, the same will be true for Zen 4. It's once again all about the data, not the CPU. The data sets aren't changing, so a CPU that saw no benefit from added L3 in one generation won't see a benefit on a newer CPU either, ESPECIALLY when L2 has doubled. You've already captured all the benefit you can.
Having said that, many newer games will have larger data sets, with more assets a user can interact with, and more data about those assets needs to be kept as close to the cores as possible when you're near those objects.
And then there's the downside of adding cache, which usually adds latency to accessing that layer.
And no, a faster CPU doesn't make you GAME any faster; it just boosts fps IF the GPU isn't a bottleneck, i.e. already at 100%. So no, the data you are working with at any point in a game isn't going to increase just because the CPU is faster. It's all about the game data, and a faster CPU doesn't change that unless you're moving through the game faster, and you won't be.
What a relief that you at least pointed out that the dual-CCD variants' boost clocks are somewhat of an eyewash!
And you are right about them being the first hybrid big.LITTLE-style architecture variants!
Pointed out? He said he assumes it.
They should put the cache on the bottom and then put something in the middle that draws the heat out.
Upgraded to a 7600X and got an fps boost at 1440p with my 5-year-old Vega 56; next I'll upgrade the GPU, then buy a 7800X3D later.
I'll be fine for 4 years or so then.
Honestly, no reason to upgrade to Zen 4 X3D with a 7600X, but Zen 5 X3D could be a nice upgrade ;)
@@HighYield No reason? But, but... it's better and faster, and that means a lot. Zen 5 may be a big deal, but it's also a year away.
The thick IHS is a challenge for current chips, and for 3D it will be a bigger one.
Vain hope: someday we'll get to the point where AMD can use 3D V-Cache the other way around: put the cache die under the CPU die for better heat management. That probably requires many more TSVs, though, and other changes.
If only we would be living somewhere in space, far away from Earth's gravitational attraction...We would not survive for much time but what a happy, short life would that be, filled with 3D V-cache surrounding us on all sides!
The 7900X3D is what I'm more excited for, because I think it will gain the most from the mixed-chiplet design they went with. And how this works out will determine my excitement for when AMD decides to put a high-density C die and a normal compute die onto one package.
A very important argument against the 3D V-Cache helping a lot in the 7900X3D/7950X3D is that the extra cache is all on one CCD and none of it is on the other. The 7950X already loses to the 7700X in gaming because of the bottleneck imposed by the Infinity Fabric, and having a lopsided cache is going to make this worse. The Windows scheduler now has to figure out whether a game process prefers the 3D-cache CCD or the clock-speed CCD; you don't get both at the same time.
TDP = Thermal Design Power
Like you said, this just tells you how strong your cooler must be
The reason the dual-chiplet CPUs have higher boost clocks than the single-CCD 3D V-Cache part is that the dual-chiplet V-Cache ones only have V-Cache on 1 chiplet; the other is just standard. So the V-Cache chiplet will run at a lower boost speed than the non-V-Cache one (we might see only 5.2GHz on the V-Cache chiplet and 5.7GHz on the non-V-Cache one, according to sources).
This is further cemented by AMD working with Microsoft to update the Windows scheduler to differentiate the 2 CCDs. Gaming applications that benefit from the extra cache will be run primarily on the V-Cache chiplet, and tasks that run better with higher clock speeds (video encoding and such) will be prioritized onto the non-V-Cache chiplet.
This will be similar to how Windows handles the P-core and E-core arrangement on Alder Lake and Raptor Lake.
Sadly, AMD's TDP values are barely related to power and can't be used to deduce power consumption. They use the formula TDP (Watts) = (tCase°C - tAmbient°C)/(HSF θca), i.e. a temperature difference (that doesn't even include Tjmax) and a heat-flow coefficient. They make up the numbers to fit whatever they need them to be. Gamers Nexus has a deep dive about it.
The only reason Zen4 supports faster RAM with lower latency is because of that 12nm IO die on Zen2/3 (if you notice, Zen+ and Zen2 had similar memory limitations because the memory controllers are both on GloFo 12nm)
If you use a 4000G or 5000G you can generally still beat all the DDR5 kits supported by Zen 4, even on competitive OC boards for Raptor Lake.
This is because the memory controller is both monolithic, on-die, and on TSMC 7nm.
I've got a 5700G with 75GB/s bandwidth and 49.1ns of latency; that latency is competitive with a 13900KS running DDR5-8000 C36.
(I'm running 4933C17-17-17-28 while in 1:1:1 timing ratio)
This is just not possible on Zen4, but because Zen2 and 3 APUs have 1/2 the cache of their CPU counterparts, this results in weird performance
In games where the 5800X3D excels, you're probably better off with a 5600X than with the 5700G with 4933C17 RAM.
But where the 5800X3D loses to the 5800X non 3D, the 5700G with fast RAM will generally be even farther ahead in performance.
I'll be interested to see how fast we can get RAM on Zen4 APUs if they're ever brought to desktop, a 5nm IMC seems like it might support up to 7600C30 in 1:1 as long as you're using a 2 slot ITX motherboard and good RAM
I imagine that in the not-too-distant future we'll see CPUs with the die(s) accessible from both the top and the bottom of the package, and motherboards that support this. It would allow us to cool both sides of the CPU; currently we are using only half the surface area to cool it. So if AMD or Intel worked out a die design that is accessible from both sides, we could nearly double our cooling performance. It would take a lot of work, but I don't see any inherent issues, except maybe having to make the die(s) larger, since the base could no longer connect to pins; the pin-outs would have to run along the edge of the die(s). This would make current computer cases obsolete, but is that a BAD thing? It would give case makers room to come up with new designs, and the same goes for PC cooling companies. The first few years would be super interesting and fun.
I'm not a memory expert, but I've heard that DDR4 has lower latency than DDR5. "Compared to DDR4, DDR5 RAM has a higher base speed, supports higher-capacity DIMM modules (also called RAM sticks), and consumes less power for the same performance specs as DDR4. However, DDR4 still holds some key advantages, like overall lower latency and better stability."
So that would mean the 7000-series X3D would profit even more from the extra cache than the 5000-series X3D.
At base specs, yes, but DDR5 is built differently internally (related to how the memory banks are accessed), and thus the effective latency should be lower overall. But tbh, I need to dive deeper into this tech at some point.
Latency means nothing without context. Let's take your typical 60 fps game, or heck, let's just say every game runs at 240fps. Then, naively speaking, you have roughly 4ms to render every frame. Now let's say DDR5 has twice the latency and takes 100ns more for a single read. That would eat up roughly 1/40,000 of your frame time for a SINGLE read. This is why memory is typically marketed by bandwidth per second: the latency of a single read does not matter (unless you have some crazy use case where ns-level latency matters, which is surely not the case for 99.9% of users).
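The arithmetic checks out; a quick sanity check in Python with the same assumed numbers (240 fps, 100 ns extra per read):

# One frame at 240 fps vs. 100 ns of extra latency on a single read
frame_time_s = 1 / 240    # ~4.17 ms per frame
extra_read_s = 100e-9     # assumed extra DDR5 latency per read
print(frame_time_s / extra_read_s)  # ~41,667 -> one read costs ~1/40,000 of a frame

Of course a frame issues far more than one memory access, which is why aggregate bandwidth and cache hit rate dominate, not any single read.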
As a gaming CPU, the asymmetrical CCD design seems just about perfect; the only way to make this better would be shrinking half of the cores into efficiency cores for improved overclocking headroom on your main cores.
They could re-position the cache, trading increased latency for lower temperatures, but that would defeat their overall design goal. Overall, when it comes to gaming, the temperatures are not much of an issue as long as the GPU is the final bottleneck in a given situation. There are outliers that really tax the CPU, like Monster Hunter did at release, but they are few and far between.
My money is on AMD coming out with a much improved version of 3D V-Cache in the long run: a mix of silicon and a high-conductivity material for its next-generation chips. Graphene could be it; I just don't know how the layer deposition would be done. But I'm sure clever engineers already have ideas on how to mass-produce such hybrid chips.
They should put the blank, transistor-free silicon on the bottom and raise the good silicon, if possible. I know there are physical design issues, but maybe they can do it.
I wish they'd use an in-silicon vapour chamber...
There are techniques used in steam engines that allow you to use steam, and no other moving parts, to pump water into a pressurized container, using the pressure in that container to power the pumping.
And phase change should do a good job of removing excess heat during peak loads.
A moving fluid/gas probably transfers heat faster than a solid metal would.
You could essentially make a "refrigerator" powered by the CPU's own heat and the temperature difference versus another part of the chip, and etch it into the silicon between the 3D V-Cache and the CPU itself, I think.
Bro, as a techie nerd myself I love your videos. Keep up the great work.
Why wouldn't they put the CPU cores above the extra silicon layer?
And could the layer under those cores be more cache?
Good video. As an engineer, I totally agree with every point you made. I don't believe it will be as good a performance jump as last gen. The main reasons IMO are 1) the doubled L2$ and 2) DDR5. With cache (and RAM in general), depending on the app, it can be latency-sensitive or bandwidth-sensitive.
You touched on it, but I wish you'd gone into more detail about the power sweet spot. As GN has shown, the X SKUs pump far more power to hit those high numbers. In performance/Watt "Eco Mode" or the non-X variants can actually beat the faster chips. The X SKUs are basically pre-overclocked for marketing reasons.
Running at a lower TJMax and closer to that sweet spot just brings them down to reasonable levels.
The other thing you may have forgotten to discuss is the sensitivity to higher voltages, which is another reason overclocking is disabled.
My 5800X3D runs below Tjmax, boosting to ~4.25GHz at 105W in Cinebench. With PBO Curve Optimizer (via a third-party tool) and -30 on all cores, I can get below 80°C with all cores running at 4.45GHz at 105W. I use a silent 280mm AIO for cooling.
I had a 5950X before that. With all 16 cores running at 4.2GHz, I had 65°C on all 16 cores at 142W, or ~80-85°C at 4.5GHz all-core and >200W overclocked.
That said, the hottest it got was on single-core boost, when it boosted up to 5.05GHz at 1.5V (stock settings). It got to 85-95°C with just 80-85W of power.
Technically, neither CPU ever hit Tjmax in my case; it was either the voltage limit or the power limit. With Ryzen 7000 it is different, since they hit Tjmax first, above all.
12:55 If it were an N7 node for the V-Cache, they wouldn't have needed to mix the CCDs like that (with + without cache). I say both will be N5, because they're saving cost or manufacturing capacity.
For someone who doesn't understand this you're saying the 7950X will have better performance than the 7950X3D???
I gotta ask this because I've wondered for a while:
With the V-Cache being connected by TSVs, why is it stacked on top? Why can't it be layered on the bottom?
It could be possible, but then you have to pass all the power connections meant for the CPU through the chiplet, another design problem. It's basically finding the least problematic implementation.
Well, it's not like the 7950X without 3D V-Cache was able to boost to its maximum boost clock on all cores simultaneously anyway.
Thank you again for this deep dive into future technology, really explaining at the die level where silicon is going in the next years! 👍
Around 3:35, "DDR5 has higher bandwidth and better latency" is not really true. The typical latency is actually worse at the moment: DDR4-3600 CL16 has lower latency than DDR5-6000 CL36.
The situation is improving, but we aren't there yet.
Correct me if I'm wrong here.
Well, glad I can do CL28 at 6200 on pretty much every Ryzen CPU lol.
@@BaBaNaNaBa That is still slightly worse latency than 3600 MHz CL16.
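For reference, the first-word latency both comments are comparing works out like this (standard CL-to-nanoseconds conversion; the kit numbers are the ones from this thread):

# First-word latency in ns = 2000 * CAS / transfer rate (MT/s)
def latency_ns(cas: int, mts: int) -> float:
    return 2000 * cas / mts

print(latency_ns(16, 3600))  # DDR4-3600 CL16 -> ~8.9 ns
print(latency_ns(36, 6000))  # DDR5-6000 CL36 -> 12.0 ns
print(latency_ns(28, 6200))  # DDR5-6200 CL28 -> ~9.0 ns

So DDR5-6200 CL28 gets close, but DDR4-3600 CL16 is indeed still slightly ahead.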
The 7800X3D is overall a better gaming CPU than all 13th-gen CPUs except, of course, the 13900K; the 7900X3D is the true competitor of the 13900K, and the 7950X3D of the 13900KS. There is even a video for 5950X and 7950X users about a tool that assigns the first 10 to 16 threads to games and the rest to another application; this makes fps more stable, especially while streaming (it's tested).
The thicker heat spreader compared to the 5000 series doesn't help with cooling either.
What's the original source of lecture from 2:35?
It's from AMD's YT channel: th-cam.com/video/dSCpVhKvmCY/w-d-xo.html
Extremely informative video. Thank you!!!!!
What they probably could do (at a higher cost) is put the cache under the CPU. It would mean that the silicon "padding" would need all the wiring, and probably bring a few electrical issues, but they could do it. And it would mean the hottest part would be closer to the cooling, which would reduce the heat issue a bit. But probably not enough to warrant the extra cost...
What I don't get is why they don't place the additional memory on the IO die, as an L4. Sure, it wouldn't give as much of a perf increase as cache on the CCD, but it should solve most of the heat issue and probably allow the clocks to stay on par with their counterparts without L4. The latency to fetch from that L4 would be greater than from the extended L3, but still faster than system memory, and the cores would run as fast as they can, so it would probably be faster in some scenarios and slower in others (less slowdown due to clock reduction, less gain due to cache...).
Add in a few DMA engines and it could be used as a unified cache... and it could probably even be used as a "cheat" to prevent some data going through the CPU when transferring from PCIe card to PCIe card, where they can't already use one of the cards' DMA engines to avoid that. Add some more features on top and you get a competitor to Intel DSA...
If they ever decide to do some G parts, the L4 cache would make a bigger difference I think, as it could be used to buffer everything that travels between CPU and GPU at high frequency (compute buffers and the like).
Intel had a line (Iris Pro) with just a little bit of eDRAM as an L4 (128MB), and it worked wonders in some scenarios: some great perf increases, especially when sharing data between the iGPU and the CPU...
It was eDRAM, not cache, so higher latency and slower (and cheaper).
I'm sure that if AMD removed 16MB of L3 from the CPU die and added 32MB to the IO die, by default the CPU would be close in perf.
However, it would mean having cache controller logic on the IO die, so more cost... And without full DMA engines and maybe DSA-like features (with an API that could be used by the OS), I don't think the change would be worth the additional cost. At least until they start adding more co-processors to the CPU (hence why we see that "system cache"/"last level cache"/L4 on ARM processors more often).
@El Cactuar Thanks for the correction!
This is why we wait for independent reviews to come out, particularly for games and applications we as customers will be playing/using. I'm looking forward to seeing what the 7950X3D can do, and whether I should buy one.
I really, really wonder why they don't stack the CCD on top of a cache/SRAM die. I don't know if that's because they'd need way too many TSVs running from the CCD on top through the cache die to the substrate, or what, but the thermal issues could be fixed if they managed to do this. Even if there need to be dead zones for the TSVs, cost aside, if thermals from an SRAM chip stacked underneath aren't a problem, it would be a superior solution: no more worrying about suboptimal thermals, and you could have a larger SRAM cache underneath, or dedicate the edge blocks of the SRAM die to the core on top and leave the centre for extending L3.
Also, if they stacked the cache die underneath and made it standard, they could actually shrink the overall footprint (ignoring z-height) of the chip by moving the caches onto a separate SRAM die "layer", and then start having dedicated function layers, with the most thermally constrained ones at the top of the stack, touching the IHS.