Max Tech included parts of my analysis in their video 🥳 th-cam.com/video/HuR7tL2eejw/w-d-xo.html
I'd like to see an analysis of the cache subsystem.
AMD Zen 4 uses a 32 kB L1 instruction cache, while Apple's M1/M2 uses a monstrous 192 kB L1i$, which is 6x more!!!
Intel went from 32 kB to 48 kB, so 1.5x more than AMD, but Apple still has 4x more.
It looks like Apple discovered something at the micro-architecture level that nobody else did.
@@richard.20000 I read somewhere that they have such a large L1$ because they chose to allocate that much die area to the cache
@@tanthokg There must be a reason why x86 has been stuck at 32+32 kB of L1 cache for 20 years (since the 2003 Pentium M Banias), while Apple multiplied its L1 to a record-breaking size in just a few years.
Apple's M2 has around 54% higher IPC than Zen 4 and Raptor Lake (54% higher performance at the same GHz in Geekbench 6). That's why the M2 at 3.5 GHz has similar performance to x86 at 5.5 GHz. Maybe because the M2 has 6 ALUs while Zen 4 has only 4.
Maybe because the M2 has 2 branch units while Zen 4 has only 1 branch unit.
Maybe because the M2 can execute 6+2 = 8 instructions per clock while Zen 4 manages only 4+1 = 5 instructions per clock (rough arithmetic sketched below).
Have you heard the Zen 5 rumors about a huge 30% IPC increase? Even that wouldn't be enough to beat the old 2022 Apple M2. And the M3 will be released this fall, so Zen 5 will have to fight the M3 and M4.
And the cheap licensed ARM core Cortex-X4 has 33% higher IPC than Zen 4 and Raptor Lake. The X4 also has a record-breaking architecture: 8 ALUs + 3 branch units = 11 instructions per cycle. That's a world record, even though its L1 cache is just 64+64 kB.
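Just to make that back-of-the-envelope math concrete, here is a rough sketch (in Swift for fun). The widths and clocks are the figures claimed in the comment above, and the model ignores everything that really determines IPC (caches, branch misses, workload mix), so treat it as an illustration, not a benchmark:

```swift
// Toy throughput model: peak instructions/second ≈ issue width × clock.
// Numbers are the ones claimed in the comment above, not measured values.
struct Core {
    let name: String
    let widthPerClock: Double   // claimed max instructions issued per cycle
    let clockGHz: Double
}

let m2   = Core(name: "Apple M2 P-core", widthPerClock: 8, clockGHz: 3.5)
let zen4 = Core(name: "AMD Zen 4",       widthPerClock: 5, clockGHz: 5.5)

for core in [m2, zen4] {
    let peak = core.widthPerClock * core.clockGHz   // billions of instructions/s
    print("\(core.name): \(peak) G instr/s peak")
}
// Prints 28.0 vs 27.5: a wider core at a lower clock can match a narrower,
// higher-clocked one in peak throughput, if it can keep its units busy.
```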
@@richard.20000 On paper... but in actual real-world performance the M2 still lags behind Intel Alder Lake.
@@D0x1511af The same way the wide AMD K7 Athlon XP 2100+ at 1.73 GHz lagged behind the clock-speed demon Pentium 4 at 2.4 GHz. We all know from history that the speed-demon approach was the wrong way, and Intel lost a lot of market share to AMD. Intel finally abandoned the P4 in favor of the wider, lower-clocked Core 2 Duo (even Apple switched from PowerPC to the Core 2 Duo at that time).
This time, all x86 CPUs are just as much high-clocked power hogs as the P4 was (Intel and AMD both consume >200 W, which is crazy). Apple's CPUs today follow the right philosophy, like the Core 2 Duo did. The M2 might lag by a small percentage in overall performance, but it is multiple times better in everything else (56% higher IPC than Zen 4 means at least a 5-year lead in architecture, and the M2 Ultra's CPU power consumption is around 80 W).
Do Intel or AMD have a last-level cache shared with the GPU? No. Apple has an L3 cache shared with the GPU and NPU. The ARM Cortex-X4 has an L4 cache that is also shared by all the SoC's processors.
And it goes on and on....
Intel and AMD are dinosaurs who cannot keep pace with ARM. They employ hundreds of engineers to invent sophisticated predictors just to mitigate that horrible variable-length CISC x86 encoding (while ARM doesn't spend a single minute or a single transistor on that, because ARM is a modern RISC by design: every instruction has a fixed length of 4 bytes).
RISC was invented after CISC for a reason (Intel's RISC i860 of 1989 was designed after the 1978 x86; likewise, AMD's first in-house CPU was the RISC-based Am29000, AMD glued an x86 decoder onto it and its first in-house x86, the K5, was born). RISC is better. Every new ISA of the last few years is RISC-based (the ESP32, for example), and even the Chinese 64-bit Loongson is RISC.
When a CPU doesn't need to run as much legacy software as x86 does, you always choose the superior RISC. It delivers more performance and lower power consumption at the same time (thanks to a modern ISA without the old garbage). And you emulate x86 software if you really need to. That's exactly what Apple does now.
Your channel is very underrated because you do such detailed and technical analysis. It is hard for most people to understand and appreciate your effort, but you have a loyal and curious fan base.
He is aiming for high yield instead of high volume
Yeah, maybe he should open a website or a Telegram group where he shares stuff for further analysis with his curious subscribers
@@rishavpapaji5349 That's what patreon is for ;)
HDMI 2.1 and Wi-Fi 6E are the only things I'm jealous of, and honestly they should have already been on the M1 MacBooks
Wow this channel needs more attention!!!! I learned A LOT from 1 video about an Apple SOC. Cheers!!!
That 19-core count is odd. I'm curious what the reason is, and I can't wait for your next video!
Perhaps Apple's thinking is that E-cores are for background tasks, and for main tasks when in low-power mode? If you think about it, the E-cores should be capable of greater throughput per die area than the P-cores. Once all important apps know how to distinguish P- and E-cores, and are properly multi-threaded, future processors should have at most 2 or 4 P-cores and very many E-cores, depending on the model.
I think the whole point of the E-cores in the first place was to have an efficient CPU core to use during standby. x86 CPUs are notoriously awful at that sort of thing, and even Intel's E-cores haven't solved it because that's not what they were meant for.
So first, great video as always. Now I would like to suggest an idea: a mini series about the basics of silicon design. Your videos are great, but most of us may not understand what an AMX block does or things like that.
Again thanks for all the great content you provide on this channel and have a good day
Quality vid! One minor correction: the M2 series uses the N5P process, not the N5 process that the M1 used. It's a slightly improved, more power-efficient node.
9:20 I think he said "N5P", unless he misspoke somewhere else in the video.
@@NootNoot. he corrected himself later in the video. But I hadn’t gotten to that part yet.
@@chidorirasenganz Haha, all good
I think I say „5 nanometer“ and „N5P“ at different points in the video. Both are correct (if you disregard that it's not actually 5 nm at scale), but as you mentioned, N5P is the specific process used, so it's technically more correct ;)
underrated analysis
Thank you! This is what I was looking for! I'm super excited to be upgrading from a maxed-out 2017 Intel MacBook Pro to an M2 Max MacBook Pro 💻 🤩
great video! really love the die shot analysis.
I wish they had 2 lineups, one CPU focused and another GPU focused. The higher models having a ton of GPU cores only makes sense for Video, 3D, etc... I wonder why they decided to go so heavy on GPU cores.
Maybe the increased use of 3D in different creative industries? And maybe the rise of GPU rendering, that would be my guess
I see an “AMX” block ALSO marked at the four EFFICIENCY cores. I count 2 AMX cores. Correct?
Also, does anyone know which ARM 8.x ISA Apple is using for the M2 Pro and M2 Max? I’ve read discussions that Apple can’t possibly be using ARM-approved AMX instructions “because AMX instructions are part of ARMv9 which these (Apple) chips don’t yet support.” Yet I’ve ALSO read that ARMv9 isn’t really a “full numeral” ISA upgrade, but more of just an incremental bump of ARMv8.x - and although there may be no AMX instruction support in ARMv8.0, MAYBE AMX instruction support exists in ARMv8.3 or v8.5, and these “non-v9” ISAs >DO< in fact support ARM standard AMX instructions. The Apple M2 line is said to use the ARMv8.5-A ISA, that is BELIEVED to include ARM-approved AMX instructions. (?)
Apple MAY be using AMX instructions at the OS level now, but they aren’t yet exposed to developers outside of broad Accelerate framework calls in the SDK.
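For what it's worth, the supported way for third-party code to reach whatever matrix hardware is on the chip is Accelerate. A minimal sketch, with placeholder sizes and values; whether a call like this actually lands on the AMX blocks is Apple's scheduling decision, not something the public API promises:

```swift
import Accelerate

// Minimal sketch: multiply two 4x4 matrices through Accelerate's vDSP.
// Apple routes calls like this to whatever matrix hardware is available
// (reportedly the undocumented AMX blocks); there is no public
// instruction-level AMX API, so this is the supported path.
let n = vDSP_Length(4)
let a = [Float](repeating: 1.0, count: 16)   // 4x4, row-major, all ones
let b = [Float](repeating: 2.0, count: 16)   // 4x4, all twos
var c = [Float](repeating: 0.0, count: 16)

vDSP_mmul(a, 1, b, 1, &c, 1, n, n, n)        // C = A * B
print(c[0])                                   // 8.0 = 4 * (1 * 2)
```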
I am super disappointed in the lack of AV1 support for the media engine and HW ray tracing for the GPU. At least they added HDMI 2.1. But if they were able to change that between the M2 and M2 Pro/Max, why couldn't they have added ray tracing support as well?
I think Apple is in their own AAC+HEVC camp and would like to remain there too. Maybe AAC is the reason they don't want to give any ground to AV1 (with Opus as audio codec?)
@@VADemon Apple is part of the Alliance for Open Media, the group who created AV1. So it's not a "give ground to" thing
@@kirby0louise Maybe not, but in a similar issue Apple added WEBM (VP9) support *looks up* only 2 years ago.
They could probably use the NPU to accelerate ray-tracing
@@meru_lpz I doubt it. All the NPU does is accelerate matrix math. If RT acceleration could be done that way, then Nvidia/AMD/Intel wouldn't even bother with RT cores, just shaders and tensor cores would suffice.
Very good analysis. I was disappointed by Gary Explains' video; he didn't mention anything that wasn't already known.
Do you remember the rumor that said Apple removed ray-tracing cores from the A16? Maybe those odd GPU spaces are because of that change of plans.
Claimed TFLOPs went from 5.3 to 6.8, so clockspeed and 1-2% architecture improvement.
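Assuming those figures are the 16-core M1 Pro GPU versus the 19-core M2 Pro GPU (my assumption, not stated in the comment), the per-core arithmetic does point the same way:

```swift
// Per-core throughput from the claimed totals (TFLOPS figures from the
// comment, core counts assumed to be M1 Pro = 16 and M2 Pro = 19).
let m1ProTflops = 5.3, m1ProCores = 16.0
let m2ProTflops = 6.8, m2ProCores = 19.0

let perCoreGain = (m2ProTflops / m2ProCores) / (m1ProTflops / m1ProCores)
print(perCoreGain)   // ≈ 1.08, i.e. roughly 8% more per core
// That is about the size of the reported GPU clock bump, leaving only a
// percent or two (if anything) for per-core architectural changes.
```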
I wonder how the layout of the 10-core M2 Pro will look. Is it 6 P-cores and 4 E-cores? Also, how is the 16-core GPU laid out, or is it simply a binning limitation?
Good question! I think you are most likely correct with 6P + 4E, since 8P + 2E would be very similar in performance and the P-cores are much larger and thus more prone to silicon defects.
When talking about the GPU, you were looking for a missing 19th core. It's indexed correctly from 0: having a core number 20 would mean there are 21 cores.
I was looking for the core #20, which would have been labeled as "core 19". There are only 19 physical cores, core0 to core18.
Do you have any idea why Apple went with 5 nm instead of the A16-based 4 nm?
Man... you have the best analysis of the M2, calling it an "evolution", and yes, I agree the next 3 nm-based M3 silicon will be a "revolution".
I also love your die analysis.
I am so glad to have found your channel! 😊
What do you think about the reports/leaks/rumours that the A16 should have had a new kind of GPU with ray-tracing cores, but that this didn't work out because the real-world power draw was higher than expected? Do you think the M3 or M3 Pro/Max will get RT cores?
Is it known how Apple uses its E-cores? Are they used for low-priority stuff and OS tasks rather than adding to the main core pool for active applications?
Nice to see an update, but I'm personally waiting for whatever M-series chip ends up on TSMC's N3E. Power efficiency is most important to me, and that's where you will find it.
Apple will find a way to avoid any increase in battery life and screw up the thermals anyway.
@veled veled i don't think so
Loved the deep dive into the Apple architecture. I may not always agree with Apple's marketing decisions, but their technology is fascinating!
*Can't wait to see die shots of the M2 Max. I want to see if the chip has more than one UltraFusion interconnect to allow more than 2 Max chips to connect to each other. That would enable chips like the rumoured M2 Extreme, which is four M2 Max chips stuck together.*
A die shot of the M2 Max is in the video, but as mentioned, Apple most likely photoshopped it to remove the D2D interconnect.
I'm also curious about the rumored M2 Extreme. How do you think connecting four M2 Max would work? Like a daisy chain? Latest rumors say it got cancelled.
@@HighYield exactly. Apple's die shot is edited. Yes it would work like a jigsaw if the M2 Max has two D2D interconnects.
The M1 Max's D2D interconnect was discovered before Apple ever announced it had one. It was spotted in real die shots as soon as the M1 Max was released.
Question: I have the M2 Max, and I was just curious whether both the efficiency and performance cores are used together when a single app is doing something highly demanding, say processing and rendering an After Effects video?
The 20th-core area is most probably a task dispatcher, or some kind of hub/crossbar to the cache/memory
Amazing work!
Thanks for the nice breakdown. I'm hoping they improved the GPU architecture so it scales better, with a better scheduler and memory management.
Idk if the M2 will wow us like the M1 before it, but it'll be interesting to see how it fares against AMD and Intel, who have been trying to remain competitive and take back the performance crown from Apple. Perf/watt less so, but maybe Phoenix might come close? Although this feels like an Alder Lake to Raptor Lake kind of improvement, i.e. mostly multi-threaded tasks and multitasking.
Both Intel and AMD have already taken the performance crown in laptop chips, in some cases with more than double the performance of the M1 Pro. What they (Intel and AMD) are now trying to do is make gains in the performance-per-watt scenario. They've come close, especially AMD's 7000-series laptop chips and Intel Raptor Lake at 45 watts, but Apple's chips still have that performance-per-watt advantage.
Apple knows that from this point onwards it cannot compete with AMD and Intel in raw performance, so this time around its presentation didn't compare against either company's latest chips and only claimed performance per watt. In another graph, Apple only compared its chips' raw performance with a 5-year-old Intel chip.
Excellent analysis dude, proper deep dive as expected from your channel👍
I think the changes in the Max version compared to the lower tier are mainly because of the scaling issues the M1 Max had in the Ultra configuration. Apple has obviously addressed that mistake.
The video I was waiting for 👀
I love that nerdy review thank u
Best content as always thanks
Thanks!
BTW: love your name ;)
Your channel is excellent. I wish it were a stock so I could buy shares now, because I know your subscriber count will soar over the next year.
Your videos are excellent
This was really good
Do you expect the single-core performance of the 12-core M2 Pro to be better than the 10-core one? It's counterintuitive but not completely impossible, and I haven't found any information yet.
No, I don't think so, both should have similar ST performance.
For H.265 video editing, any ideas? Also AV1 support.
No AV1 support afaik, HEVC (H.265) is supported tho.
@High Yield Yeah would love it even on a software level for exporting for youtube. Funnily, Apple is one of the founders behind AV1.🤷♂️
Regarding "2022": I'm sure Apple didn't produce this piece of silicon in 2023, as it's not even three weeks since New Year's.
I got my M2 Max 38-GPU / 96 GB MBP and have been testing it with ML workloads. I'm seeing better GPU utilization with Create ML training and a slight boost in CPU usage. The tweaks Apple made to GPU scheduling, the CPU and the ANE are showing real gains.
how is it handling the ML workloads?
@@seanwfindley For Create ML it's running well. For tensorflow-metal it's utilizing the GPU nicely. I wish the ANE were used for training and not just when you run Core ML models.
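For context, the ANE is currently only reachable for inference, and even then only as a per-model preference through Core ML's compute-unit setting; Create ML and tensorflow-metal training stay on the CPU/GPU. A minimal sketch, where the model path is a made-up placeholder:

```swift
import CoreML

// Ask Core ML to allow the Neural Engine when running a compiled model.
// "MyClassifier.mlmodelc" is a hypothetical placeholder; Core ML still
// decides per layer whether the ANE, GPU or CPU actually runs it.
let config = MLModelConfiguration()
config.computeUnits = .all   // or .cpuAndNeuralEngine on macOS 13+ to skip the GPU

let url = URL(fileURLWithPath: "MyClassifier.mlmodelc")
do {
    let model = try MLModel(contentsOf: url, configuration: config)
    print(model.modelDescription.inputDescriptionsByName.keys)
} catch {
    print("Could not load model: \(error)")
}
```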
@@seanwfindley I wish there was a good tool to really measure GPU scheduling and utilization. That way I could gather data when I benchmark my Macs. I'm sure Apple has some in-house tools, but I don't think they'll ever give them to the public :(
@@woolfel Would you say the new MacBook Pro could be useful for pricing models or predicting shortages, some of it based on news/text analytics?
@@seanwfindley If you're just doing pricing models, I don't believe you need to use TensorFlow or PyTorch. Regular regression or stats software like Mathematica is "probably" sufficient.
If you need a lot of fast memory to analyze data and want to use the GPU, a maxed-out M2 Max might be a fit. You'd have to try it. If there's anything I've learned in 20 years of programming, it's that nothing beats a real test to validate whether my assumptions are correct or complete garbage :) I know neural networks are the HOT thing, but there are plenty of old, boring techniques that get the job done.
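To underline the "old boring techniques" point, here is a minimal ordinary-least-squares sketch with no ML framework at all; the (x, y) values are invented demo data:

```swift
// Simple linear regression y ≈ intercept + slope·x via the closed-form
// least-squares solution. The (x, y) pairs are invented demo data.
let x: [Double] = [1, 2, 3, 4, 5]           // e.g. some demand signal
let y: [Double] = [2.1, 3.9, 6.2, 8.1, 9.8] // e.g. observed prices

let n = Double(x.count)
let meanX = x.reduce(0, +) / n
let meanY = y.reduce(0, +) / n

let covXY = zip(x, y).map { pair in (pair.0 - meanX) * (pair.1 - meanY) }.reduce(0, +)
let varX  = x.map { ($0 - meanX) * ($0 - meanX) }.reduce(0, +)

let slope = covXY / varX
let intercept = meanY - slope * meanX
print("price ≈ \(intercept) + \(slope) * x")   // roughly 0.14 + 1.96·x
```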
Will they jump directly from a 5 nm to a 3 nm design with the M3, or will they go through 4 nm first?
So 19 GPU cores-should be INCREDIBLE GAME PERFORMANCE…..??? Finally???
Games apple...
Thank you
What is the likelihood that the asymmetrical design was simply to avoid the number 40, which could be considered unlucky in some areas?
I honestly can't imagine that at all. I think it's mostly space efficiency.
Could you please elaborate on why the area where GPU core no. 20 would sit is "clearly not a GPU core"? How could one even argue that's the case from photos alone?
The pictures or die-shots show the silicon structures on the chip and while we can't see through the chip, some structures are very easy to recognize, including the GPU cores.
I've made another mock-up for you, I think it should be very clear in this picture: i.imgur.com/KASpdAh.png
Can you see all the other GPU cores? And the difference to the silicon structures where we would expect core #20? The GPU area is bright compared to the rest of the M2 Pro chip, GPU misc & interconnect areas are grey scale and I have marked one complete GPU core + the area which is not a GPU core.
The true innovation and evolution of the APU is AMD's MI300 with its 146 bn transistors 👍
M1 came immediately after A14, so will M3 come immediately after A17 as well?
Since the 3 nm delay scrambled Apple's timeline, we might even get the M3 before the A17 this time. But ofc this is pure speculation on my side.
@@HighYield I think they won't release the M3 before the A17 if it's based on it, because they would need all the available 3 nm silicon for their iPhones, which are their biggest business.
However, if the M3 is based on the A16, then they might release it earlier.
@2:41 - "the Neural Engine is nothing to upgrade over". Really? My apps are bound by the speed of my M1 Mac mini's Neural Engine. Basically, I run OCR all day long. As far as I can tell, OCR runs on the Neural Engine. However, it's hard to be sure, because neither the macOS "ps" command nor the macOS Activity Monitor tells me anything about the load on the Neural Engine.
Then you are an exception, since most people are not stressing the NPU that much.
I’ve never heard of “OCR”, can you tell me what it is?
@@HighYield OCR is Optical Character Recognition. I'm using Apple's utilities in Apple's Shortcuts app to read words on the screen. Yes, maybe I am an exception, but all I'm doing is running Apple's software.
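For anyone curious, the Shortcuts text-extraction action is presumably built on the Vision framework, which macOS may schedule onto the Neural Engine (and as a crude load indicator, the "powermetrics" command run with sudo reportedly shows ANE power draw on Apple Silicon). A minimal sketch of that API from Swift, with a placeholder image path:

```swift
import Vision

// Minimal OCR sketch using Vision's text recognition. Whether it runs on
// the Neural Engine, GPU or CPU is decided by the framework, not by us.
let imageURL = URL(fileURLWithPath: "screenshot.png")   // placeholder path

let request = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        if let best = observation.topCandidates(1).first {
            print(best.string)
        }
    }
}
request.recognitionLevel = .accurate

let handler = VNImageRequestHandler(url: imageURL, options: [:])
try? handler.perform([request])
```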
How hard is it to later upgrade an M2 Pro/Max machine to 64 GB and 2 TB?
I plan to buy one with 32 GB and 1 TB and expand it later.
You can't. The RAM is ball soldered directly onto the SoC and the storage isn't conventional M.2 storage but rather discrete flash chips controlled by the SoC's storage controller. You have to spec out what you need when you purchase. There are a couple of experimental attempts at aftermarket upgrades but they require specialized tooling.
@@Longlius Sounds quite bad. I ended up going for something that was in stock; let's see how long it holds up.
@@andraslibal I mean depending on workload, you'll probably be fine. MacOS is quite efficient with Apple's own hardware. I've yet to find anything that my M2 Max/32 GB/1 TB really struggles to do.
@@Longlius I do scientific simulations and I wrote my own code for them; it is a fine laptop for that, though of course the heavier workloads go to desktops and computational servers. It is also fine for presentations etc. I just wonder how many years it will be before I need to upgrade; I think the cycle is getting shorter.
I need RAM, I need storage, so I’m out, cheers Apple.
The M2 Max goes up to 96GB RAM and 8TB storage. But of course at Apple prices.
No, you need to satisfy your porn addiction. Anything with internet access will do.
Why do I feel like it's probably closer to 20% faster for 30% more money?
Maybe because Apple likes money? :p
I have just one question you didn't voice: how tf do they yield these chips with no extra GPU/CPU cores for binning?
Then I find it funny how they throw money at engineers to redesign a GPU unit for space. Not in a bad way. Just... "unusual" :)
They don't have to have unused silicon parts to bin their chips. If the fully enabled die has 19 cores, they can disable 3 and make a one-tier-down SKU with 16 cores (which is what they do: there's a config of the M2 Pro with 10 CPU cores instead of 12 and 16 GPU cores instead of 19). Likewise with the M2 Max: while they don't bin down the CPU on that one, there's a 30-core GPU option and 38 cores at the higher end.
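A toy sketch of that binning logic, with the SKU cut-offs taken from the configs mentioned above (purely illustrative, not Apple's actual flow):

```swift
// Toy model of post-fab binning: count the working GPU cores on a die
// and assign the highest SKU it still qualifies for. Cut-offs are the
// shipping M2 Pro configs mentioned above (19- and 16-core GPUs).
struct Sku { let name: String; let minWorkingGPUCores: Int }

let m2ProSkus = [
    Sku(name: "M2 Pro, 19-core GPU", minWorkingGPUCores: 19),
    Sku(name: "M2 Pro, 16-core GPU", minWorkingGPUCores: 16),
]

func bin(workingCores: Int, skus: [Sku]) -> String {
    skus.first { workingCores >= $0.minWorkingGPUCores }?.name ?? "scrap / recycle"
}

print(bin(workingCores: 19, skus: m2ProSkus))  // fully working die
print(bin(workingCores: 17, skus: m2ProSkus))  // 1-2 defective cores -> 16-core SKU
print(bin(workingCores: 12, skus: m2ProSkus))  // too many defects
```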
@@utubekullanicisi Oh didn't know they had these, thanks
These APUs are getting strong 💪 Apple needs full Steam support with Proton or Rosetta 2....
The asymmetrical design and prime numbers of gpu cores are chosen to prevent resonances. We don't want another Tacoma Narrows to happen with all that data sloshing back and forth
Whenever I see the Max die I get sad. So much space wasted on GPU cores I will never need, and so little used for the P-cores I could use. It is the wrong mix for my workflow.
m2 ultra vs 4060ti
Give me one reason not to sub!? This is some quality content if I've ever seen some!
GPU benchmark tests were run comparing the GPU of the M2 Max and the Nvidia 4070. The results came out in favor of the Nvidia model, but you have to consider energy efficiency and software optimization. I believe that the path of energy efficiency and software optimization will bring better results in the big picture.
What I got from this analysis:
- Apple new chip 2.1 ready 8K gaming capable
- AMD new presentation gonna perform faster
- AMD new drivers set called Applenaline to avoid HW conflicts
- Intel 14gen i9 14900 Pro and Maxx on latest Intel7+++ with stock Cryo cooler surpass every competition
- Gpu-info gets confused and identifies the M1 Max as an RTX 4090
- AMD, Intel and Nvidia new chips feature either 19, 38, 57 or 76 compute units, maybe even 99 to be trendy
Just curious:
- if new stuff here has support of USB :)
- if it boosts 6GHz
- if Apple plans to make 3D refresh ( I will wait for it)
with AMD unique Lobotomy cache design
This Apple hardware is "rocket science" to me, but your video is great as always.
Help me 🙏🙏🙏🙏🙏🙏🙏🙏
ca$h with dollar signs
I should have used € instead, missed opportunity!