The genius part is making a sort of universal chiplet that can actually be used for more than one purpose. Whoever figured it out deserves to have someone go grab him a coffee.
There will even be a PCIe accelerator version that's basically 1/2 of a MI300X on a half size interposer & thus package! (2x base tiles + 4x GCD's & HBM3 stacks vs MI300X's 4x/8x, so 152 CDNA 3 CU's w/ 96GB HBM3 vs MI300X's 304 CU's/192GB). Think the new MI210 equivalent to MI300X's direct MI250X replacement.
Is the packaging option you mentioned, where the base chiplets are smaller than the silicon on top so that there can be a direct connection between the interposer and the silicon, similar to what Intel presented some time ago as Foveros Omni?
Imagine a (professional) consumer-grade APU with 24+ Compute Units of RDNA 3-4, with 128MB of 3D V-Cache or on-die HBM. Filling a PCIe slot with other expensive silicon could become optional in some cases.
@HighYield Great channel. If I may, please pay attention to how long you keep a slide on the screen, especially when text is flashed onto the slide at the last moment. Just a little thing.
@@HighYield Sure:
2:19 and 9:39, 2.5D/3D stacking difference: the stated diff is "both (MEM/compute) are actually active".
9:47: irrelevant fine print comes into view for 2s. Even though that text is unimportant, it creates the impression that I'm missing something.
9:50: "The 5nm MCD and CCD chiplets …". I couldn't find MCD in the chip diagram or in the "AMD MI300" text column at the right.
11:54: the slide phase-in effect made the text unreadable for 4s. The slide was on for 10s but disappeared once the motion of the text froze, and only once the text froze was I ready to read, so reading felt very rushed.
13:29: uses the same phase-in effect, but this slide stays longer. 14:19: same. 17:15: 4s.
The slide phase-in effect, where the start of every sentence is not visible at first and the text keeps moving but disappears once it stops moving, makes a slow reader like me feel rushed. (Secret: especially for someone like me who likes to listen at faster speeds but reads slowly. I understand it's not possible to cater to readable slides at faster speeds.)
The section on 2.5D vs 3D at 9:39 created confusion because the explanation "in 3D, both chips are active" made me investigate the diagram more, and I couldn't find MCD, so I was trying to find that or determine if you actually said GCD. The main point is that the explanation of both chips being active in 3D seemed off: the difference strikes me as more about how intense the transistor switching density (and heat) is for the stacked chips, how tolerant the circuitry is of heat (DRAM and SRAM are finicky), and heat dissipation for the stack. After all, in HBM, all dies in the stack can have DRAM banks active at the same time.
Please take my original comment as a bit nitpicky. I really like your channel and appreciate the time you put into such thorough content research and video production. I'd like to see your channel develop a larger viewership.
@@HighYield I am quite sensitive to repetitive noise, so I am probably not a good reference. I think if you change it up it should be fine, as I did not feel it building up in my mind in the shorter videos. Maybe use the chapters to change it up, or something like that? Still, the video was well worth it anyway =)
Great coverage and analysis, but could you perhaps remove the 'fizzy' distracting background music please? Or would it be possible to upload different version of the video that has no background music?
@@HighYield It would be the one used in this video in particular. It's the quick drum percussion synonymous with college/high school football bands, but also commonly used in modern hip hop. It tends to be effective in drawing one's attention to the beat, away from what you are trying to say. I hope that's clear enough. Cheers.
2:30 I think Unified Memory will really not be a thing. 32GB of the best-in-class DDR5 Hynix A-die is $80 at its cheapest, while the top-of-the-line CPU and GPUs combine to thousands of dollars. Unified Memory involves a lot of management for divvying up RAM between the two, which is likely to be slower than just giving the CPU and GPU their own separate RAM, especially given just how cheap RAM is. For data centers, Unified Memory for training AI models is important, since we're talking terabytes of RAM. But atm even 16GB vs 32GB shows virtually no difference for consumers.
UMA makes data access *faster* (lower latency) as well as more efficient. Managing caches and external memory access was a huge burden for traditional multi-processors.
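To make that concrete, here is a toy timing sketch of the explicit host-to-device copy that a discrete-memory design forces and that a unified address space would avoid. PyTorch is used purely as a convenient stand-in, and the 256MB buffer size is an arbitrary choice, not anything from the video.

```python
import time
import torch

# Toy illustration of the host<->device copies that a unified address space
# removes. On a discrete GPU, every tensor the GPU touches must be copied
# over PCIe first; on a true UMA design the CPU and GPU would simply
# dereference the same memory. Timings are only indicative.
if torch.cuda.is_available():
    host_buf = torch.randn(64_000_000)          # ~256 MB sitting in CPU RAM

    t0 = time.perf_counter()
    dev_buf = host_buf.to("cuda")               # explicit copy across the bus
    torch.cuda.synchronize()
    t1 = time.perf_counter()

    dev_buf *= 2.0                              # the actual GPU work is trivial
    torch.cuda.synchronize()
    t2 = time.perf_counter()

    print(f"copy to GPU : {t1 - t0:.3f} s")
    print(f"GPU compute : {t2 - t1:.3f} s")     # often far cheaper than the copy
else:
    print("No CUDA device found; the point is the copy itself, not the math.")
```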
Wonderful video! May I offer one correction? The units for die size are square millimetres, not millimetres squared: en.wikipedia.org/wiki/Square_metre
Thanks for pointing that out. Honestly, I never thought there would be a difference. In German (my native language) it's also "square millimeters", idk why I say it the other way around when I talk in English.
@@HighYield because English is a terrible language! Many native English speakers who speak no other languages make the same mistake, saying "well, that's how it's written". By the way - your English is perfect!
When I proposed to Moore's Law is Dead, back when their first chiplet CPUs were announced (2019), that AMD would make a mega APU in a few years, he laughed and called me stupid.
To be fair to MLID, it was a superchat, so I couldn't clarify the timeline or the market, so he may have been thinking of a Threadripper APU in 2019 based on Zen 2 and CDNA/RDNA1.
I say APUs will be the next wave, as they are able to perform equal to or better than a console for less money. Small form factor mid-range gaming PCs will be their replacement. I would like to see mobos using dual APUs and shared VRAM-style RAM for system and graphics. I really like where they are going. It's what I was hoping for, so I think the console prediction will end up close to right. Maybe one more "next gen" system from MS and Sony, then done... unless they start making pre-builts under the name, but they probably won't want to deal with that if there is no licensing revenue. Nintendo will probably keep making consoles though, I hope. Anyway, my same old ramblings. Thanks for the update!
Maybe we shouldn't enter the zettascale era. Maybe we should use what we already have and fix climate change (thank you people of the internetz for not responding to this comment if you disagree) Ps: great video.
It's not only that AMD has a better product than Nvidia; they are also going to use open-source software, which will be better and cheaper than Nvidia's software. It's a win-win for AMD. 🎉
The SoC design for HPC makes a lot of sense! I can see us normal people buying a PC or notebook and needing to upgrade its RAM because of greedy software. But HPC systems normally go right where they're needed and don't get many upgrades; if the whole system needs an upgrade, the platform usually gets changed. But OK, when will these chips show up on AliExpress in a sketchy motherboard at a low price? 2028? I can wait with my Zen 3 HAHAHAHA
CPUs do the serial computing, CDNA cores do the parallel part. Together it's called heterogeneous computing. AMD has a nice paper on heterogeneous computing, just Google "AMD HSA paper" or something like that.
GPUs are accelerators only used to perform a specific set of operations. CPUs are still needed to feed the GPUs and run the actual programs. For instance, if you're doing AI training, you still need the CPU to parse and provide the data to the GPU and then compile those results into a model. CPUs are really good at executing serial code, while GPUs are good at executing highly parallel code. You need both. And AMD is the only company that can provide best-of-breed CPU and GPU.
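As a rough sketch of that division of labor (the tiny "dataset", vocabulary, and embedding size below are made-up placeholders), the serial preparation runs on the CPU and only the parallel math gets shipped to the GPU:

```python
import torch

# Sketch of the CPU-feeds-GPU split described above: the CPU does the serial,
# branchy work (parsing/tokenizing), the GPU does the parallel math.
raw_lines = ["the quick brown fox", "jumps over the lazy dog"]
vocab = {w: i for i, w in enumerate(sorted({w for l in raw_lines for w in l.split()}))}

# Serial work on the CPU: tokenize and pad (crude zero padding, fine for a toy).
token_ids = [[vocab[w] for w in line.split()] for line in raw_lines]
max_len = max(len(t) for t in token_ids)
batch = torch.tensor([t + [0] * (max_len - len(t)) for t in token_ids])

# Parallel, throughput-heavy work: shipped to the GPU if one is present.
device = "cuda" if torch.cuda.is_available() else "cpu"
embeddings = torch.nn.Embedding(len(vocab), 64).to(device)
features = embeddings(batch.to(device))      # big parallel lookup/matmul-style op
pooled = features.mean(dim=1)                # still on the GPU

# Results come back to the CPU, which decides what happens next.
print(pooled.cpu().shape)  # torch.Size([2, 64])
```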
I think PHOTONICS are THE FUTURE... Why continue pushing electrons around a board and chips, where every bit of distance requires more power to get them there? That inefficiency, like you said, will require a nuclear power plant just to run a zettascale supercomputer and the building it's housed in... Photons (light) are so much better to do the work with (IMO), although we do not have a processor that can compute using photons instead of electrons (I think)... There has been some progress made in this field, but not very much, not enough to make a difference right now. We still use the traditional electronic methods with some photonic pieces in the system currently... We NEED to make a 100% photonic computer to blow people's minds enough to fully pursue it instead of using a mix of both systems... I'm hoping you've done a video on this already (I JUST came across your channel when this video was suggested to me, but this vid got me to sub... I really like the cut of your jib, sir! lol). If not, then I hope you can do one in the near future... there is just too much potential in photonics for it to continue being ignored :)
Photonics are definitely a path for the future, but we are quite a bit off in my opinion. No video on photonics yet, but it has been on my list for a while now!
Unless you mean to return to analog, application-specific optical Fourier transforms, I don't see why photons would be better to work with than electrons. Photons still have to interact with electrons to change their phase and direction, and the structures have to be much larger than a wavelength to overcome diffraction. Current wavelengths are in the tens or hundreds of nanometers, so the density would be lower.
2:00 "Apple has been a pioneer in this area..." No, no they have not. They licensed ARM just like thousands of companies before. If they had integrated RISC-V, maybe you would have a point. What about Chromebooks? Many of those run on SOCs and they have been around for many years. Is somebody who shows up at some point after all technologies exist a pioneer? Damn, I should do some pioneering.
The stacking of chips was started by Tesla and is used in their cars. This one technique increases processing power and reduces power consumption on its own. Distance matters.
I think AMD wants to either take over or share in the profits from all the main computer components (CPU, RAM, GPU); from a business point of view that would be the simple, logical reason. 3D stacking technology is beyond my imagination, but an expert explanation of how it works could make it easy. As an amateur, I can imagine that if a single chip layer adds some mass over the substrate, then routing conducting areas from the bottom transistor layer up into the vertical space would make it possible to stack another layer of transistors, functioning either as 2.5D or fully connected. Or maybe it's some way of placing another layer of substrate with single-transistor accuracy in the X and Y axes. The real way it's made needs an explanation, unless I missed the concept in this already quite scientific look at an upcoming product.
For me, the easiest way to imagine 3D stacking is with the use of TSVs, which are literally tiny copper wires drilled into the silicon that connect both chips. It's like a network of pipes.
eh... i don't want socs. that will just make everything more expensive to upgrade and less modular and will remove competition and/or options from some markets (like ram)
Are Nvidia and Apple the last big bastions of huge monolithic chips? Both of those companies seem to have the same attitude: throw money at the problem and let the consumer pay for it. Nvidia with their maxing out of chip size for each node; Apple's M3, with 3 different chip designs, reported to cost 1 billion dollars to make. Why did Apple not release a 4-chip M Ultra-Ultra chip as was rumored for the Mac Pro replacement? Pro machines with 192GB of memory are a no-go for many IT pros (and 256GB does not solve it either; 4 chips would at least bump it up to 512GB, compared to Intel's Mac Pro with 1.5TB).
Cerebras is banking on wafer-scale chips (the entire wafer is one chip), so big chips can work with the right design. You have to be able to fuse off all the bad silicon and/or have enough binning and market margin/demand to make it work. I read that on the average Nvidia GPU die, 10-20% of the silicon is defective. The chiplet strategy is more economical, especially with a simple 2D interposer as on Ryzen/EPYC. The advanced packaging is much less economical; it would seem easier to do than EUV, but if there is any flaw, you are throwing out a lot of chips (no fusing option at this stage). So it really depends on the packaging cost, capacity and reliability as far as how it shakes out in the end.
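A back-of-the-envelope way to see the chiplet economics is a simple Poisson yield model; the defect density and die areas below are assumed illustrative numbers, not published foundry figures:

```python
import math

# Rough Poisson yield sketch of why chiplets are more economical than one big
# die. All numbers are assumptions for illustration only.
DEFECT_DENSITY = 0.1  # killer defects per cm^2 (assumption)

def die_yield(area_mm2: float, d0: float = DEFECT_DENSITY) -> float:
    """Fraction of dies with zero killer defects, Poisson model: e^(-A*D0)."""
    return math.exp(-(area_mm2 / 100.0) * d0)  # convert mm^2 -> cm^2

mono_area = 800     # one monolithic die, mm^2 (assumed)
chiplet_area = 100  # one chiplet, mm^2 (assumed); 8 of these = same silicon

# Silicon thrown away per defective unit: the whole 800 mm^2 die, vs only the
# single bad 100 mm^2 chiplet (assuming dies are tested before packaging).
waste_mono = (1 - die_yield(mono_area)) * 100
waste_chiplet = (1 - die_yield(chiplet_area)) * 100

print(f"silicon scrapped, monolithic: {waste_mono:.0f}% of wafer area")
print(f"silicon scrapped, chiplets  : {waste_chiplet:.0f}% of wafer area")
# If a flaw only shows up *after* advanced packaging, you scrap all eight
# known-good chiplets at once, which is exactly the point above about
# packaging reliability mattering as much as packaging cost.
```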
Do you think all five technologies will "trickle down" into the consumer market? Will we buy a single "gaming SoC" at some point in the future?
We're almost there... I have a miniPC with a 6900HX and I can play EVERY game at 720p... When Strix Point comes out next year I think we'll be at 1080p60 using FSR3...
I haven't seen any benches with 7x40 APUs but they should be 50% faster than the 6900HX... Especially if they can get 30% higher clocks...
And then Strix should be 30-40% faster than that...
@@ChristianHowell Those system on a chip machines are really cool. I hope they make a bunch of improvements over the next 5 years.
@@ChristianHowell Those performance numbers are good for something like a portable system (Deck) or standalone VR headset
No: heat dissipation, leakage, memory bandwidth, and latency issues. Just look at modern GPUs from Nvidia and AMD, way too costly.
Physics isn't magic.
Probably Zen 6, in 2026: 8+8 cores + a 24-32 WGP GPU chiplet, on the bottom an IO die + 128MB cache and 2-channel DDR6 memory. All for $500-600.
One correction I would mention: the latest version of PyTorch, PyTorch 2.0, is moving away from CUDA. For ever-increasing AI models, CUDA on its own can't optimize for the scale. The frameworks themselves have to be optimization-aware. This is why ML frameworks are shifting from eager mode to graph mode, which sidesteps CUDA (cuDNN) and provides better performance. Instead of using CUDA, they will use tools like Triton (this is what OpenAI's ChatGPT uses), which interfaces directly with Nvidia's NVPTX and AMD's LLVM AMDGPU backends. So CUDA is on its way out, and with it Nvidia's software moat. MI300 will be a monster.
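For anyone curious what that shift looks like in code, here is a minimal sketch of PyTorch 2.0's graph-mode entry point, torch.compile, which hands the captured graph to TorchInductor and, on supported GPUs, can emit Triton kernels instead of dispatching eager-mode cuDNN ops one by one. The tiny model and sizes are just placeholders.

```python
import torch
import torch.nn as nn

# Minimal sketch of PyTorch 2.0 graph mode. torch.compile traces the model
# with TorchDynamo and passes the graph to a backend (TorchInductor by
# default), which can generate Triton kernels on supported GPUs.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

compiled = torch.compile(model)  # default backend is "inductor"

x = torch.randn(32, 512, device=device)
out = compiled(x)   # first call triggers compilation, later calls reuse it
print(out.shape)    # torch.Size([32, 10])
```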
I'm not up to date with all the AI/ML software. Maybe this is AMD's chance to catch up to Nvidia?
@@HighYield I believe so. There is a big push at AMD to improve their AI offerings. MI250 was kind of a test, with limited matrix multiplication units, and built more for pure scientific HPC applications (like the ones used by Oak Ridge supercomputers). MI300 will focus on AI. And I do believe it's going to be very competitive with Nvidia.
@@SirMo Excellent. AMD has been lacking in that market, and some competition against Nvidia's monopoly is welcome. I know they've had their CUDA compatibility layer for ages, but a proper alternative is always best.
Another blind fanboy drinking the AMD Kool-Aid. AMD has zero chance in AI because Nvidia is light years ahead with a full-stack ecosystem and partnerships. All AMD can do is make chips, but that won't take them anywhere; it's all about the ecosystem and the advantage of scale, where Nvidia absolutely dominates. Poor AMD was only able to catch up to Intel because Intel fell behind TSMC in process node technology and AMD used TSMC. But with Nvidia there are no low-hanging fruits to eat, because unlike Intel, Nvidia is not sitting on its laurels; they are moving ahead at light speed in AI, and here AMD is just warming up.
You think so? I am learning CUDA and working on 12.1. I am just getting started... just want to do some small-scale work to learn this new tech. There are a ton of processors on the market for cheap that I would have been thrilled to have just 5 years ago. Did you see CUDA has a new communication system that reduces the I/O bottleneck, not only under development but already being used? I hope AMD does something great though, to keep the competition going. Anyway, gotta run.
This may be the first step of the coming war between AMD and Nvidia. I'm waiting for Intel to react, but the advances that AMD is making are huge.
They don't have enough supply to compete with nvidia no matter how competitive their architecture is.
@@beachslap7359 If Nvidia doesn't have chiplets with Blackwell, they're screwed. It's known they're working on it, but what they themselves don't know is if they'll get it done on time.
@@beachslap7359 AMD produces more chips than Nvidia. They have all the supply they need.
@@beachslap7359 AMD has the supply, they just don’t have the demand or market share. If there was proper demand, they would shift wafer capacity from their Ryzen and gaming gpus
@@coladict they have by far a superior architecture right now compared to AMD. Why would that change for the next generation just because they don't have chiplets? Especially considering the node jump is gonna be slim for both companies.
Your videos are amazing and I learn so much new, interesting information. Even though I don't understand everything, it's still rewarding to watch you explain and to develop my own knowledge just as you did. Thanks for that, and greetings from Hamburg, Germany :)
100% agree 👍
Awesome job covering the MI300. It's so impressive and beyond anything ever made that no one knows how to cover it or even talk about it. The technology is going to make it into gaming consoles. I predict next gen is going to be so integrated that the thought of adding RAM and separate cards to a PC will feel ancient.
1. Excellent video. I don't follow the data center innovations that closely, I'm more of a desktop gaming guy, so this video was absolutely fascinating to me. Well explained, well segmented. And it's exciting to think about what this will mean for the desktop for the upcoming decade.
2. Before you introduced the 5 new technologies, I paused the video and gave it a quick think myself. I basically came up with the same categories, except I came up with "heterogeneous design". In my head that was something that takes the SoC and disaggregates it into chiplets, but also includes mixing process nodes and possibly chiplets made by different vendors/foundries. We're not quite there yet, but in my head mixing 5 with 6 nanometers is a part of it. So I basically mushed your "SoC" and "Packaging" categories together and added a bit of my own flavor.
3. The classic 'von Neumann' architecture on PC can't keep up anymore. We see this with the consoles, how a smaller, much cheaper design can yield incredible performance. Mid- to high-end PCs that cost 3 times more struggle to play the latest console game ports. This is ridiculous, something's got to change. I'm curious what a next-gen PC architecture will look like. Will we still have a modular design, what will cooling look like, and will manufacturers be able to agree on a standard in time before consoles make the PC look even more boomer than it already looks to some people?
4. Exciting times ahead.
It doesn't really matter what you call it. Remember when AMD's company slogan was "the future is fusion"? It's kinda ironic that now that they've achieved this complete "fusion", it's not their slogan anymore. But the technology has been a long time coming. Long before their Ryzen comeback, AMD was forced to innovate to stay alive. The design lead AMD currently has is a result of this.
Fully agree, exciting times ahead!
I think you are spot on.
Though I would say that another tech to look out for is in-chip fluid cooling. Heat is a huge problem, especially with 3D stacking. Efficient extraction of heat allows for higher frequencies and lower energy use (as heat increases resistance).
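A quick numeric sketch of the "heat increases resistance" point, using copper's well-known temperature coefficient; the 1-ohm baseline is an arbitrary placeholder value:

```python
# Copper's resistance rises roughly linearly with temperature:
# R(T) = R0 * (1 + alpha * (T - T0)).
ALPHA_CU = 0.00393   # copper temperature coefficient, per degree C
R0, T0 = 1.0, 25.0   # 1 ohm (placeholder baseline) at 25 C

def resistance(temp_c: float) -> float:
    return R0 * (1 + ALPHA_CU * (temp_c - T0))

for temp in (50, 85, 110):
    print(f"{temp:>3} C -> {resistance(temp):.3f} ohm "
          f"({(resistance(temp) / R0 - 1) * 100:.1f}% higher than at 25 C)")
# At 110 C the same wire has ~33% more resistance than at 25 C, so the same
# current wastes noticeably more power, which is why extracting heat helps
# both frequency and efficiency.
```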
Next-gen cooling tech is definitely worth a future video :)
Incidentally when AMD was buying ATI for Radeon the "Fusion" idea of not just seamless GPU fp compute but a unified address space was used as justification.
It's over a decade but at last this becomes more feasible.
Not just MI300 but SAM and recent DX12 extensions are aimed at shared address space.
From "the future is fusion" to the actual "fusion future". AMD's slogan back then was more than just marketing, it was a promise.
Exactly, and the sale was finalized in 2006, almost a generation ago.
@@peterconnell2496 the bottom line was the APU wasn't that feasible, but now we see Intel & AMD strengthening gfx, while Nvidia are launching ARM CPUs for the data centre.
So this memory unification is the direction for high performance.
The old APUs were too compromised by cost limits, restricted memory bus, cache; despite AMD having a transparent NUMA in their server CPUs, they didn't have the investment funds to realize the possibilities.
A sceptical observer suspects the justifications were rationalisation for a panic reaction to CUDA.
IIRC ATI were in trouble, AMD wanted a slice of GPGPU, so the Fusion concept of another VLSI step was born.
The problem was Intel had successfully responded to AMD64 and the x2 chips with Core Duo, had bribed the key OEMs and had a voucher system of rebates with Intel Inside that small dealers relied on.
So AMD were squeezed from both sides, not able to realise the profit from their real innovations and NOT having the financial muscle to buy an OpenCL counter to CUDA of sufficient quality & application support.
AMD were following and trying to catch up; Intel had gone awry with P4/Itannic but commercial power kept them strong.
Nvidia meanwhile reaped the rewards of the collapse of competition with their new main competitor having to divert funds away from future GPU designs.
Something I wanted to add: since we don't know the packaging method yet, when I talk about the "interposer" it doesn't have to be a large silicon interposer, it might be a small "organic interposer" like on Navi31, using TSMC's "Fan-Out" technology. Once we know more, another video will follow!
www.tsmc.com/english/dedicatedFoundry/technology/InFO
IIRC, TSMC is using CoWoS and TSVs with the caches like X3D... They renamed it to 3D Fabric for the whole infrastructure... AMD is the cutting edge of silicon right now... They had to sell their own fabs and came out even better in the long run... All their 5nm products haven't even come out yet, and the 4/3nm Zen 5 ones will have even more instructions and accelerators...
@@ChristianHowell AFAIK, SoIC is also a Chip-on-Wafer technology.
@@HighYield Yeah, they folded it all into 3D Fabric terminology... But the basis is CoWoS because it lets them package huge chips like MI300...
It's 100% confirmed to be a standard, absolutely freaking MASSIVE (as in it pushes TSMC's max reticle size limits right up to their absolute freaking er.... limit lol 🤣) silicon interposer. 🤷
AMD were originally planning on doing what MI250X did to connect its HBM, which is use a hybrid of a standard interposer ala MI300 & fan-out wiring ala Navi 31, w/ TSMC's "CoWoS-R" packaging, which uses a small silicon bridge + fan-out technology. (Think Intel's EMIB, but instead of the tiny silicon mini-interposer bridges being directly connected on both ends w/ physically die-to-die attached TSV's [which TSMC can now do too, called "CoWoS-L"], they're connected w/ less dense fan-out wiring [although still plenty dense enough for HBM3].)
But then at the absolute very last minute possible, AMD had to switch to a traditional absolutely gargantuan silicon interposer (aka using TSMC's "CoWoS-S" packaging) pretty much as late as they possibly could due to reliability concerns, thanks to the massive package flexing/warping when kept at its full-fat 750W load for seriously extended periods.
Basically, with MI300 having a MUCH larger overall package than even MI250X, the tiny silicon bridges connecting the HBM stacks were simply much more prone to failure than a single massive contiguous silicon interposer across the entire package under the kinds of "100% load, 24/7/365" conditions that are inherently endemically common in HPC/data center land. 🤷
@@ChristianHowell It's CoWoS, sure, but TSMC has like 10x COMPLETELY DIFFERENT technologies under that single banner, making it a mostly meaningless/useless label just by itself, outside letting you know the product is multi-chip. 🤷 MI300 uses a classic single massive (literally reticle limit pushing) contiguous silicon interposer base layer (just like in, say, Vega), which in TSMC's marketing speak is called their "CoWoS-S" ("CoWoS on silicon") packaging technology.
For an example, CoWoS-S/a traditional large silicon interposer is an ENTIRELY DIFFERENT packaging tech than what's in MI300's also CoWoS but rather "CoWoS-R" using MI250X predecessor!
("CoWoS-R" is a hybrid between Intel's EMIB tiny silicon interposer bridges [which TSMC also now has a proper version of, called "CoWoS-L"] & the ultra dense fan-out wiring tech used by Navi 31/2 & Apple's M# Ultra SKU's. Basically think EMIB style silicon bridges but connected on both sides w/ less dense but still plenty fast for HBM3 fan-out wiring vs EMIB's/CoWoS-L's direct chip-to-chip TSV's.
They didn't end up using CoWoS-R again on MI300 like they'd originally planned to because of reliability issues w/ the tiny silicon bridges at its full 750W load for truly extended periods, as is the standard HPC/data center usage environment.)
MI300 is something I've been waiting for since I saw the initial HPC chiplet APU patent... The interesting thing is that older CPUs used to remove functions from the CPU die because of limited transistors at 180nm etc...
But the biggest thing about it is that I heard some autonomous driving folks say that 2000 TOPS are needed for an FSD experience, and since MI250X has 383 TOPS, 8X that is over 3000 TOPS... AMD can now theoretically provide all the chips for automobiles out of nowhere, it seems (NOT!!!)... They can use an edge-like appliance with a Pensando front end for network relays for traffic and weather, etc. for a LARGE MAP area, while an upcoming Ryzen APU can do the entire system, including 4K video and gaming... Companies are selling mini PCs with Ryzen and Radeon 6000 that can do 4K30+... Zen 4 telco servers can do edge processing while EPYC can stream games and all types of data, including AI for predictive routing...
AMD is way ahead of the curve vs the competition. They just need someone to market the tech better. They are a true heterogeneous system and get better and better every year. Now AMD is sharing GDDR between the CPU/GPU and other AI accelerators.
The packaging/chiplet design is quite brilliant (speaking of, gratz on the sponsorship)! One day, hopefully, we'll see all of these techniques trickle down to desktop/consumers! The Zettascale strategy is interesting because it pulls you into real-world limitations, that is physics, that will inevitably halt performance if we don't invest in new techniques. Like with 3D V-Cache: although it is a great solution for more L3$, there are still thermal limits. AMD investing in R&D is a long-term goal, and we'll see in the near future that brute-forcing today's technologies, like monolithic designs, is unreasonable.
With X3D CPUs and Navi31, AMD is really aggressive in using their top technology for consumer products, tho as you said, it's always "trickle down". Zen 2 chiplets were designed for servers, not desktop, same with 3D V-Cache. But they work great for desktop too.
1) LightMatter wafer scale optical interconnect
2) Ultraram replacing most chiplet cache, HBM, DRAM, and NVM
3) Accelerators on chip/package
4) Combining CPU, GPU, FPGA on package
5) Backside power delivery
6) VTFET
7) Deep trench capacitor on wafer with direct 3D bonding integration
8) Glass based motherboards with integrated photonics, power delivery, and microfluidic cooling
Fugaku with the Fujitsu A64FX walked so El Capitan and MI300 could run. Seriously.
Great video, cheers!
I hope all manufacturers start stacking chips and putting them in our desktops. Gonna get some cool stuff soon.
To me, the most memorable part of the keynote in the entire Zettascale race was logic on memory.
I can't really imagine just how much you could realistically put on RAM, probably only basic math operations as anything too complex would probably be too costly.
But if you can even just do basic math, even just add/subtract/jump, it'd be a true revolution. So many basic operations would be offloaded from the CPU and live in the RAM. The CPU would just have to send the request, and that would seriously cut down on transfers. You could go from 20 transfers and operations down to something like one CPU -> RAM transfer, the operations done by the RAM, and then one RAM -> CPU transfer when done, to send back the updated data the CPU wanted (rough thought-experiment sketch below).
It's truly revolutionary in speed and efficiency. How costly/plausible...don't know. But I find it to be the most impressive thing.
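Since no such memory or API exists today, the following is purely a thought experiment contrasting the data movement of a normal CPU-side reduction with the single-request model described above; the HypotheticalPIMBank class is entirely made up.

```python
# Thought experiment only: no such RAM or API exists today. This just
# contrasts the data movement of a normal CPU reduction with the
# "send one command, get one result back" model the comment describes.

data = list(range(1_000_000))

# Today: every element crosses the memory bus to the CPU to be added.
cpu_total = 0
for value in data:          # ~1,000,000 CPU <- RAM transfers
    cpu_total += value

# Hypothetical processing-in-memory: the CPU sends a single request and logic
# sitting next to the DRAM banks does the adds where the data lives.
class HypotheticalPIMBank:
    def __init__(self, contents):
        self.contents = contents

    def reduce_add(self):
        # Imagined as executing "inside" the memory device.
        return sum(self.contents)

bank = HypotheticalPIMBank(data)
pim_total = bank.reduce_add()   # 1 request out, 1 result back

assert cpu_total == pim_total == 499_999_500_000
```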
maybe even on nvme storage?
Such a great channel & amazing video explanation. Even big YouTubers like Linus Tech Tips don't explain chip design like this. Very underrated channel.
I felt this was the best ad transition ever. It kinda convinced me to learn some on Brilliant 😂
I get lots of sponsorship offers, but I only take the ones I actually think are really useful and Brilliant is definitely useful.
Interconnects are the key to the new age of 3D-stacked chips, I think. We will get to a point where the processor is not 2D but more like a solid cube, and the inside of that solid cube is all semiconductor.
And if you come up with a good way of cooling this cube, you could be the next Bill Gates!
I want a solid cube of GaN
New sub. Thank you for highlighting the tech along with the exciting and somewhat scary aspects of modern computing. This vid is a good indicator of how fast the computing field has moved in recent years. I expect that the AI at work currently is being utilised to create the next generation of SoCs and AI, something of a self-fulfilling prophecy. Thus the tech questions to achieve zettascale are likely to be answered in the next 15 years; however, the bigger question is the impact it has, a discussion that has yet to really hit the larger population.
Excellent vid, thank u for making it!
Thanks for watching :)
@@HighYield 110% agree with u that MI300 is a prototype of the chips we will see in the future, also in the consumer market, with lots of 3D stacked L3/L4 V-Cache and possibly even with some HBM memory. I would say next-gen consoles, maybe even a PS5 Pro, will utilize V-Cache because of its huge benefit for gaming.
In the enterprise segment, chips similar to MI300 will also expand to the next level, not just on the points u have mentioned (especially 3D stacking & packaging) but mostly with chip/chiplet customization. Companies will be able to fully customize (for an extra price, of course) what kind of chiplets/accelerators they'll get in their chips. It won't be surprising to see future MIxxx AMD chips with their Xilinx FPGA and, for example, a Tenstorrent chiplet.
The most fun part there is the 3d stacked L3 CPU cache. A-freaking-some!
Excellent video, really enjoyed it. I'm not an expert by any means, but I do think once we are able to produce graphene transistors at scale, both speed and efficiency are going to make a gigantic leap forward; combine that with optical data transfer off-chip and the leap forward will be incredible.
And with the chiplet design, AMD can scale their products way more easily than the competition. As shown last week, MI300 has 2 variants: "A" with 6 GCDs and 3 CCDs, and "X" with basically all CCDs replaced entirely with GCDs, making it GPU-only. This modularity is going to please all kinds of customers.
The eight extra "dies" are probably what I could call fillers, not spacers. I don't think they are there to provide any structural purpose, but simply to take up space that would otherwise have to be taken up by the resin used to make the whole package flat.
As for the stacking method, I can only make a wild guess. The bottom dies are apparently mostly for I/O, so they could be quite large, given the lack of scaling with I/O. They would then also have a lot of dead space, making TSV's an easy option on that side of the equation. If I had to guess, I'd say they connect to the upper chiplets using the normal connection points those chiplets would have if they sat directly on a package. Meaning the bottom dies replace the substrate as far as the stacked dies are concerned, and use TSV's as needed to connect to the actual substrate for reaching the socket pins. That would make the overall design chiplets on top of active interposers, on top of a passive interposer with HBM at the sides.
Wrong. They are officially & explicitly according to both AMD & TSMC there to maintain structural integrity across the massive chip package, just like OP claimed.
If it was just a massive valley of pure in-fill material placed over those areas in-between the HBM3 stacks instead of hard silicon, it would have allowed SIGNIFICANTLY more flex and warping under the package's full 750W power/thermal load, which could EASILY break the RIDICULOUSLY FUCKING FRAGILE like 1100mm² active silicon interposer everything sits on top of. 🤷
(Not break as in "crack it" or anything, but rather cause some of the countless MICROSCOPIC TSV's ["Through-Silicon Vias"] connecting the massive interposer to all of the various chips on top of it to become disconnected.)
And the bottom dies sit on top of a massive package size active silicon interposer, NOT on the package substrate itself! The gargantuan silicon interposer underneath all the active chips above is what's connected to the actual package substrate.
Great video about the technology. AMD has certainly embraced chiplets well. The acquisition of Xilinx earlier on by AMD ensures that they have the best substrate and packaging technology, as Xilinx is considered the best in this area.
AMD can make custom SOCs to target high end AI companies. Exciting times ahead for this technology.
I noticed that I started to anticipate your videos! When I have free time it's the first thing that I look for.
There are lots of technology channels, but lots have fillers and content oriented to entertain, which isn't bad, but I am a huge nerd and I enjoy more this channel.
Keep going!
Not stopping anytime soon, but I've had this on-and-off cough for almost a month now, which makes recording videos harder. I will try to get back to my original "once a week" schedule, or at least not keep the current "once every 2-3 weeks" :X
@@HighYield Don't worry! I view your videos for free, so I cannot complain, and I'd rather have a few good-quality videos than stream-of-consciousness ones.
Very good video. Great analysis and insights on what AMD is doing and why.
Thank you, good to hear you liked it!
MI300? More like "Magnificent technology, and they're going way ahead!" 👍
You can get an IBM x3650 M4 with Intel Xeon E5-2670 CPUs at a 2.6GHz base frequency and set it to run at 3.3GHz in high-power mode. There are two silicon chips with 8 cores each and 2 threads per core. The unit also comes with 64GB of memory at 1333MHz for 150 bucks. I then added a GTX 1660 Super with 6GB GDDR6 and 128GB of additional RAM. I love it. I put Ubuntu 22.04 LTS on it as the operating system. I picked up ten 900GB SAS drives, put 8 in the machine to start, and set them up with RAID 5. There are a ton of these servers on eBay and they can be obtained in a server or desktop-server package. Thought I would share. Good luck. Have fun.
And honestly, I don't think anyone can beat AMD in raw power at this point, and the big tech corps are seeing that now.
Weeks ago I saw a stupid comment: "AMD is just all about brute force, that's useless." I saw that comment and I couldn't stop laughing. Hell yes, efficient brute force is the best combination ever, and that random guy was just upset about it :D. It seems like AMD has some insanely talented engineers, and it's great to see. If I am not mistaken, Apple thought about chiplets for a long time many years ago and failed. Making this architecture a real thing was an exceptional and kind of scary challenge.
The transition to GAAFET/MBCFET in near-future process nodes like 18A strongly indicates that process will still prove to be a driving factor in performance. Despite TSMC still using FinFET for 3nm, GAA may be feasible for the nodes below it.
This is so cool. I want this for my home lab.
With memory integrated onto the chip, the memory timings will probably be REALLY good! Plus, with a GPU at the performance level of dedicated GPUs on the same die, that's gotta be a huge performance increase. If nobody posts a YouTube video of gaming performance with FPS graphs for this thing, I will be VERY disappointed. Maybe Linus will do that.
While CDNA3 isn't really a gaming architecture like RDNA3, I'm sure you can run games on it. If you get the drivers working, MI300 would be a powerhouse!
GPUs are one of the few things I would rather not have directly integrated onto the CPU, precisely because it destroys the modularity modern systems have.
I like the idea of having memory integrated onto the GPU die.
I think it's much wiser to just offload more and more things onto the GPU than to integrate the CPU and the GPU more closely.
For gaming especially, the CPU shouldn't be doing shit when it comes to rendering. It should mostly be dedicated hardware doing the job.
@@honkhonk8009 Sadly, the industry is moving toward unified systems.
You said in the video that you think in the future semiconductor packaging will be more important than the process node. Can you expand on that? Or maybe make that its own video?
The basic idea is that with process node progress slowing down, how you package/combine chips becomes much more important. I'm sure it will be a topic in future videos :)
You are amazing, thank you 😊
3:06 so many artifacts, must be algorithm
This gets me excited for RDNA4.
Love this YT channel . Excellent info -so educational 👍
Thank you! But it's always important to realize I'm not 100% correct on everything, especially when some details are still unknown.
@@HighYield Still - we learn so much about the technologies you talk about. TY 👍
Very informative videos, please keep up the awesome content =]
As long as ppl are watching I won't stop!
Thank you for the video! I think changing chip materials from silicon to carbon (or something else) is an additional way to improve efficiency.
Do you mean carbon nanotubes or graphene when talking about carbon?
@@kingkrrrraaaaaaaaaaaaaaaaa4527 Yes, maybe; probably a diamond structure instead of silicon.
Yea absolutely great presentation!
The genius part is making a sort of universal chiplet that can actually be used for more than one purpose. Whoever figured that out deserves someone to go grab him a coffee.
Intel's Meteor Lake will also be chiplet based (Intel calls them "tiles") and I'm sure sooner than later Apple & Nvidia will join the club.
There will even be a PCIe accelerator version that's basically 1/2 of an MI300X on a half-size interposer and thus package! (2x base tiles + 4x GCDs and HBM3 stacks vs MI300X's 4x/8x, so 152 CDNA 3 CUs w/ 96GB HBM3 vs MI300X's 304 CUs/192GB.) Think of it as the new MI210 equivalent to MI300X's direct MI250X replacement.
Thank you for doing what you do. New fan here, really appreciate the content and your presentation.
Is the packaging option you mentioned, where the base chiplets are smaller than the silicon on top so that there can be a direct connection between the interposer and the top silicon, similar to what Intel presented some time ago as Foveros Omni?
Great video and well explained. Thks
Imagine a (professional) consumer-grade APU with 24+ Compute Units of RDNA 3-4, with 128MB of 3D V-Cache or on-die HBM. Filling a PCIe slot with other expensive silicon could become optional in some cases.
Over the last 5 years, AMD has never ceased to amaze the market with better products.
Fantastic summary!
@HighYield Great channel. If I may, please pay attention to how long you keep a slide on the screen, especially when text is flashed onto the slide at the last moment. Just a little thing.
Which slide are you referring to? A timestamp would help :)
@@HighYieldSure:
2:19
9:39 2.5D/3D stacking difference: the difference given is "both (MEM/compute) are actually active". At 9:47 irrelevant fine print comes into view for 2s. Even though that text is unimportant, it creates the impression that I'm missing something.
9:50 "The 5nm MCD and CCD chiplets…". I couldn't find MCD in the chip diagram or in the "AMD MI300" text column at the right.
11:54 The slide phase-in effect made the text unreadable for 4s. The slide was on for 10s, but it disappeared once the motion of the text froze. I was only ready to read once the text froze, so reading felt very rushed.
13:29 uses the same phase-in effect, but the slide stays longer.
14:19 same
17:15 4s
The slide phase-in effect, where the start of each sentence is not visible at first and the text keeps moving but the slide disappears once it stops moving, makes a slow reader like me feel rushed. (Secret: especially for someone like me who likes to listen at faster speeds but reads slowly. I understand it's not possible to cater to readable slides at faster playback speeds.)
The section on 2.5D vs 3D at 9:39 created confusion because the explanation of "in 3D, both chips are active" made me investigate the diagram more, and I couldn’t find MCD, so I was trying to find that or determine if you actually said GCD. The main point is that the explanation of both chips being active in 3D seemed off: the difference strikes me as more about how intensive the transistor switching density (and heat) is for the chips stacked, how tolerant the circuitry is of heat (DRAM, SRAM are finicky), and heat dissipation for the stack. After all, in HBM, all die in the stack can have dram banks active at the same time.
Please take my original comment as a bit nit picky. I really like your channel and appreciate the time you put into such thorough content research and video production. I’d like to see your channel develop larger viewership.
Thanks for the interesting video
Great video as always, I could do without the repetitive background music, though ;P
You mean no bg music at all, or just change it up more? I feel like without bg music it's too "empty".
@@HighYield I am quite sensitive to repetitive noise, so I am probably not a good reference. I think if you change it up it should be good, as it did not wear on me in the shorter videos. Maybe use the chapters to change it up or something like that? Still, the video was very worth it anyway =)
What is the 128GB of HBM for? Is it the RAM? Can customers upgrade it with additional RAM?
Yes, that's the RAM. And it's not upgradeable, as it's HBM that sits on the package itself.
Is HBM still more expensive than GDDR?
Will it return to gaming gpus someday?
I think at some point HBM might have its comeback, but not anytime soon. GDDR7 will be next.
MCR DIMM is the future. Latest pursuit in memory tech, from what I’ve recently read.
Nice video 👍 Very interesting.
AMD should no longer acknowledge Intel...as a kick in the face.. Intel deserves it..
Meteor Lake will be Intel's make-or-break moment. If they can execute their version of a chiplet architecture well, they are back in the game.
5:50 I would not agree; spacers would not be split in half. That would be double the trouble to place and to keep the height in check.
I want to have this monster in my desktop computer...
I really enjoy your videos. Please more : )
can you imagine if mi300 variants replace the current threadrippers?
Great coverage and analysis, but could you perhaps remove the 'fizzy' distracting background music please? Or would it be possible to upload different version of the video that has no background music?
Is it bg music in general, or do you just not like the one I used for this video specifically?
@@HighYield It would be the one used in this video in particular. It's the quick drum percussion synonymous with college/high school football bands, but also commonly used in modern hip hop. It tends to be effective in drawing one's attention to the beat, away from what you are trying to say. I hope that's clear enough. Cheers.
great engineering is usually simple
Thank you
Thank you for watching!
Can't wait to equip the Sandevistan prototype Mk 1.
Just wait another 54 years ;)
2:30 I think Unified Memory will really not be a thing. 32GB of RAM of the best-in-class DDR5 Hynix A-die is $80 at its cheapest, while a top-of-the-line CPU and GPU combine to thousands of dollars. Unified Memory involves a lot of management for divvying up RAM between the two, which is likely to be slower than just giving the CPU and GPU their own separate RAM, especially given just how cheap RAM is.
For Data Centers, Unified Memory for training AI models is important since we're talking Terabytes of RAM. But atm even 16GB vs 32GB is showing virtually no difference to consumers.
Intel's PVC GPUs have 16 compute tiles sitting on top of a base layer that has SRAM cache.
UMA makes data access *faster* (lower latency) as well as more efficient. Managing caches and external memory access was a huge burden for traditional multi-processors.
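To make the unified-memory idea concrete, here is a minimal CUDA-style sketch of the managed-memory programming model (illustrative only, not MI300 code; on the AMD side the ROCm/HIP equivalent would be hipMallocManaged, if I remember the API right). One allocation is visible to both CPU and GPU, so there is no explicit copy to manage:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel: each GPU thread increments one element in place.
__global__ void increment(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;

    // A single allocation visible to both CPU and GPU:
    // no explicit cudaMemcpy in either direction.
    cudaMallocManaged((void **)&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;    // CPU writes
    increment<<<(n + 255) / 256, 256>>>(data, n);  // GPU updates the same buffer
    cudaDeviceSynchronize();                       // wait before the CPU reads again
    printf("data[0] = %f\n", data[0]);

    cudaFree(data);
    return 0;
}
```

On a discrete GPU this still migrates pages behind the scenes; the point of an APU-style design with physically shared HBM is that the same programming model no longer implies copies at all.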
Wonderful video! May I offer one correction?
The units for die size are square millimetres, not millimetres squared: en.wikipedia.org/wiki/Square_metre
Thanks for pointing that out. Honestly, I never thought there would be a difference. In German (my native language) it's also "square millimeters"; idk why I say it the other way around when I talk in English.
@@HighYield because English is a terrible language!
Many native English speakers who speak no other languages make the same mistake, saying "well, that's how it's written".
By the way - your English is perfect!
When I proposed to Moore's Law is Dead, back when the first chiplet CPUs were announced (2019), that AMD would make a mega APU in a few years, he laughed and called me stupid.
To be fair to MLID, it was a superchat, so I couldn't clarify the timeline or the market, so he may have been thinking of a Threadripper APU in 2019 based on Zen 2 and CDNA/RDNA1.
I say APUs will be the next wave, as they are able to perform equal to or better than a console for less money. Small-form-factor mid-range gaming PCs will be their replacement. I would like to see mobos using dual APUs and shared VRAM-style RAM for system and graphics. I really like where they are going. It's what I was hoping for, so I think my console prediction will come close. Maybe one more "next gen" system from MS and Sony, then done... unless they start making pre-builts under the name, but they probably won't want to deal with that if there is no licensing revenue. Nintendo will probably keep making consoles though, I hope. Anyway, my same old ramblings. Thanks for the update!
Maybe we shouldn't enter the zettascale era. Maybe we should use what we already have and fix climate change (thank you people of the internetz for not responding to this comment if you disagree)
Ps: great video.
It is not only that AMD has a better product than Nvidia, but they are also going to use open-source software, which will be better and cheaper than Nvidia's software. It is a win-win for AMD. 🎉
Thanks!
No, thank you!
140B transistors 😮😮😮😮😮
But can it run Crysis?
Nah, it's "how many?"
The SoC design makes a lot of sense for HPC! I can see us normal people buying a PC or notebook and needing to upgrade its RAM because of greedy software, but HPC systems normally go right where they're needed and don't get a lot of upgrades. If the whole system needs an upgrade, they normally change the platform.
But OK, when will these chips show up on AliExpress in a sketchy motherboard at a low price? 2028? I can wait with my Zen 3 HAHAHAHA
Wow, I don't even know yet how to use my Raspberry Pi 4 to its fullest...
When will AMD sell these for the gaming PC market, or is this for the next PlayStation?
Thanos comparison was 😂😂😂
AMD BEST
Will MI300 be able to access off-chip regular memory? Say 1TB DDR5 type.
That's a good question. 128GB of HBM3 is a lot, but modern servers have TBs of RAM per CPU. Could be possible.
isn't SOC kind of counter to chiplets?
SoC is pretty much just a general name for a chip that has more than one kind of component.
what's the purpose of the cpu cores here?
So it can function as an APU and you don't spend energy transferring data between the CPU and GPU over a motherboard. It's for maximum efficiency.
CPUs do the serial computing, CDNA cores do the parallel. Together it's called heterogeneous computing.
AMD has nice paper on heterogeneous computing, just Google AMD HSA paper, or something like that.
GPUs are accelerators only used to perform a specific set of operations. CPUs are still needed to feed the GPUs and run the actual programs. For instance, if you're doing AI training, you still need the CPU to parse and provide the data to the GPU, and then compile those results into a model.
CPUs are really good at executing serial code, while GPUs are good at executing highly parallel code. You need both. And AMD is the only company that can provide best-of-breed CPUs and GPUs.
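As a toy illustration of that split (an illustrative CUDA sketch, not actual MI300/ROCm code): the CPU does the serial preparation, feeds the GPU, and takes the results back for post-processing.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Parallel part: square every element (a stand-in for the real GPU math).
__global__ void square(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= x[i];
}

int main() {
    const int n = 1 << 20;

    // Serial part on the CPU: parse / prepare the input data.
    std::vector<float> host(n);
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i % 100);

    // The CPU feeds the GPU: copy in, launch the parallel kernel, copy out.
    float *dev = nullptr;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    square<<<(n + 255) / 256, 256>>>(dev, n);
    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // Back to serial: the CPU post-processes / "compiles" the results.
    return 0;
}
```

On an APU like MI300A, the two copy steps are exactly what the shared memory pool is meant to eliminate, but the serial/parallel division of labor stays the same.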
They are using Atomera's MST.
can't wait to find out if this thing runs doom
Now bring it to next-gen consoles, call them "Pro Max", and charge $1.5k. I will happily buy them :D
I think MI300 will be more like $10,000+, but it would be an amazing console for sure :D
i do think the music is too loud
nice laptop chip. 👍🏿
With a 5 min battery life :D
@@HighYield Gaming laptops don't last much longer.
I think PHOTONICS are THE FUTURE... why continue pushing electrons around a board and chips, where every bit of distance requires more power? That inefficiency, like you said, will require a nuclear power plant just to run a zettascale supercomputer and the building it's housed in. Photons (light) are so much better to do the work with (IMO), although we do not have a processor that can compute using photons instead of electrons (I think). There has been some progress made in this field, but not very much, and not enough to make a difference right now. We still use traditional electronic methods with some photonic pieces in the system currently... we NEED to make a 100% photonic computer to blow people's minds enough to fully pursue it instead of using a mix of both systems.
I'm hoping you've done a video on this already (I JUST came across your channel when this video was suggested to me, but this vid got me to sub... I really like the cut of your jib, sir! lol). If not, then I hope you can do one in the near future... there is just too much potential with photonics for it to continue being ignored :)
Photonics are definitely a path for the future, but we are quite a bit off in my opinion. No video on photonics yet, but it has been on my list for a while now!
Unless you mean to return to analog, application-specific optical Fourier transforms, I don't see why photons would be better to work with than electrons. Photons still have to interact with electrons to change their phase and direction, and the structures have to be much larger than a wavelength to overcome diffraction. Current wavelengths are in the tens or hundreds of nanometers, so the density would be lower.
I crave the FLOPS
Hope Linus manages to make a video on it, I want to see it run games.
I want one but I'll never be able to afford it
2:00
"Apple has been a pioneer in this area..."
No, no they have not. They licensed ARM just like thousands of companies before. If they had integrated RISC-V, maybe you would have a point.
What about Chromebooks? Many of those run on SOCs and they have been around for many years.
Is somebody who shows up at some point after all technologies exist a pioneer? Damn, I should do some pioneering.
Will Nvidia produce CPUs?
The stacking of chips was started by Tesla and is used in their cars. This one technique increases processing power and reduces power consumption on its own. Distance matters.
#6 different AI chips for different purposes
I think AMD wants to either take over or share in the profits from all the main computer components (CPU, RAM, GPU). It would be logical from a business point of view; a simple, basic reason.
3D stacking technology is beyond my imagination, but an expert's explanation of how it works can make it easy.
As an amateur, I can imagine that if a single layer of chip adds some mass over the substrate, then making passive electricity-conducting areas on the bottom transistor layer that extend into the vertical space would enable stacking another layer of transistors, functioning either as 2.5D or fully connected.
Or rather, some kind of placement of another layer of substrate with single-transistor accuracy in the X and Y axes.
The real way it is made needs explanation, if I haven't missed the concept in this already quite scientific presentation of the upcoming product.
For me, the easiest way to imagine 3D stacking is with the use of TSVs, which are essentially tiny copper wires running vertically through the silicon to connect both chips. It's like a network of pipes.
Eh... I don't want SoCs. That will just make everything more expensive to upgrade and less modular, and will remove competition and/or options from some markets (like RAM).
True, but at some point, the interconnect becomes the bottleneck unless we can make it optical instead of electrical
Are Nvidia and Apple the last big bastions of huge monolithic chips? Both of those companies seem to have the same attitude: throw money at the problem and let the consumer pay for it. Nvidia maxes out chip size for each node, and Apple's M3, with 3 different chip designs, is reported to have cost 1 billion dollars to develop. Why did Apple not release a 4-chip M Ultra-Ultra as was rumored for the Mac Pro replacement? Pro machines with 192GB of memory are a no-go for many IT pros (and 256GB does not solve it either; 4 chips would at least bump it up to 512GB, compared to Intel's Mac Pro with 1.5TB).
Cerebras is banking on wafer-scale chips (the entire wafer is one chip), so it can work to have big chips with the right design. You have to be able to fuse off all the bad silicon and/or have enough binning and market margin/demand to make it work. I read that on the average Nvidia GPU die, 10-20% of the silicon is defective. The chiplet strategy is more economical, especially with a simple 2D interposer as on Ryzen/Epyc. The advanced packaging is much less economical; it seems easier to do than EUV, but if there is any flaw you are throwing out a lot of chips (no fusing option at this stage). So it really depends on the packaging cost, capacity, and reliability as far as how it shakes out in the end.
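A quick back-of-the-envelope with the standard Poisson yield model shows why smaller dies waste less silicon (the numbers below are illustrative assumptions, not vendor data):

```latex
% Poisson yield model: Y = e^{-A D}, die area A (cm^2), defect density D (defects/cm^2)
% Illustrative assumption: D = 0.1 defects/cm^2
% Compare a 600 mm^2 (6.0 cm^2) monolithic die with a 150 mm^2 (1.5 cm^2) chiplet
\[
Y_{\text{monolithic}} = e^{-6.0 \times 0.1} \approx 0.55, \qquad
Y_{\text{chiplet}}    = e^{-1.5 \times 0.1} \approx 0.86
\]
```

The chance that four chiplets are all good is still about 55%, but a single defect now scraps only 150 mm² instead of a whole 600 mm² die, and known-good chiplets can be binned and recombined, which is exactly the economic argument above.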