M3 Max Benchmarks with Stable Diffusion, LLMs, and 3D Rendering

  • Published Jul 4, 2024
  • In this video I put my new MacBook Pro with the highest-end M3 Max chip to the test. We'll test out Large Language Model token generation, image creation with Stable Diffusion, and hardware ray tracing with the new Blender 4.0. The result: this is a fantastic machine that anyone doing these kinds of workloads will undoubtedly appreciate.
  • Howto & Style

Comments • 53

  • @drj9506
    @drj9506 7 months ago +12

    First video I've seen that actually shows who would need a top-of-the-line spec machine. Thanks! Great work.

  • @maxwellwallace8553
    @maxwellwallace8553 7 months ago +24

    Great video! Nice to see some tests that aren't clearly centered around video/photo editing with the stray Cinebench / Blender run.

  • @Stonk5331
    @Stonk5331 7 months ago +3

    Finally some decent tests. Thank you :)

  • @micksee
    @micksee 5 months ago

    Really love your content Matthew; please keep making more :)

  • @spillarkscz
    @spillarkscz 3 months ago

    Good review, to the point. Just what I needed, thanks.

  • @NawafRabah
    @NawafRabah 4 months ago +5

    Hey Matthew, amazing video, but I'd hoped it would be more visual, i.e. showing the benchmark results on screen with the times so we could really see and feel the difference. Now I know what to upgrade to. Thank you!

    • @TheDanteSTV
      @TheDanteSTV 4 months ago

      macOS users always show only their faces, because on macOS there's nothing to show except empty words.

  • @yriccoh
    @yriccoh 7 months ago +1

    Did you try Draw Things with the LCM LoRA and SDXL? It accelerates my 14" MacBook M1 Pro almost 5x, because 4 steps can create quality close to what 20 steps did before. Also, did you try batch sizes with more than one image generated at a time (you should have the RAM for it), and does that speed up overall image generation time? Thanks for benchmarking Draw Things.

  • @Levi3D
    @Levi3D 7 months ago +7

    I skipped to the Blender stuff ... you nailed it. I've been looking for some Blender 4.0+ rendering tests using the new M3 hardware ray tracing. That looks game-changing. Awesome! Please do more on that if you can! Subscribing.

    • @matthewgrdinic1238
      @matthewgrdinic1238  7 months ago +2

      So glad to have helped and you bet, will make a dedicated Blender video ASAP.

    • @DerekDavis213
      @DerekDavis213 5 months ago

      M3 Max is impressive in Blender, for a Mac.
      But Blender on an NVIDIA 4080 or 4090 is much faster than on a Mac. No comparison.

  • @bhargavapothakamuri4218
    @bhargavapothakamuri4218 5 months ago

    Thanks! I really had to dig through YouTube for 6 weeks to get to this recommendation.

  • @stephenthumb2912
    @stephenthumb2912 6 months ago +1

    Thanks for the quick stats. It would be nice to know the quants on the LLMs. My big question: at 64 and 128GB RAM, what are the max-size models that can be loaded, and their token generation speeds?
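(As a rough aside on the RAM question above: a model's load size is roughly parameter count × bytes per weight. A minimal back-of-the-envelope sketch, where the 20% overhead factor for KV cache and runtime buffers is an assumption, not a measured figure:)

```python
def model_ram_gb(params_b: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough RAM needed to load an LLM: parameters * bytes per weight,
    plus an assumed ~20% overhead for KV cache and runtime buffers."""
    bytes_total = params_b * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

# A 70B model at 4-bit quantization vs. full fp16:
print(round(model_ram_gb(70, 4), 1))   # ~42 GB: fits in 64GB, not 36GB
print(round(model_ram_gb(70, 16), 1))  # ~168 GB: too big even for 128GB
```

By this estimate a 4-bit 70B model is about the largest thing a 64GB machine can hold with room to spare, while 128GB opens up 4-bit models well past 100B parameters.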

  • @wagmi614
    @wagmi614 1 month ago

    can't wait for m4 pro max review

  • @apollo_I
    @apollo_I 3 months ago

    Hey, thanks for your great content! I just wonder, would an M2 Ultra be better? Or is the M3 Max a really big update that can do the job well? 😊

  • @3dus
    @3dus 7 months ago

    Was it an LCM (latent consistency model)? It should have been very fast with the Neural Engine.

  • @axotical8682
    @axotical8682 7 months ago +1

    Also worth mentioning: I discovered Draw Things works on my iPad Pro M1 at speeds comparable to my 1060 laptop. I was astounded…

  • @JefHarrisnation
    @JefHarrisnation 6 months ago

    Thanks for the info.

  • @dougall1687
    @dougall1687 7 months ago +1

    Thanks for the video. One thing I feel a bit unsure about in these evaluations is optimization. About 20 years ago I had to write some super-optimized math code for PowerPC processors. I ended up writing completely different assembler code for the 601, 603, and 604 because their architectures were differentiated enough that running 601 code on a 604 produced poor results (and vice versa). Back to today: only in the past few weeks (early Nov 2023) am I starting to see M3-specific code coming through, and only then in a few apps. So I wonder how much of what we're seeing right now is compiler lag, due to non-optimized generic Apple/ARM CPU models being used to generate code that is run on the M3?

    • @matthewgrdinic1238
      @matthewgrdinic1238  7 months ago +2

      As a former assembly guy myself, I was pondering the same! On the GPU side, Apple released an in-depth video covering the M3's mesh shading and memory improvements: developer.apple.com/videos/play/tech-talks/111375. The key takeaway is that developers don't need to change source to see most of these benefits; it's mostly hardware smarts.
      Ray tracing requires code changes, but I'd expect those code paths and the feature in general to remain quite stable moving forward.
      On the CPU side, it appears most gains over the M2 come from higher clocks and standard buffer / width improvements, so a simple recompile, if that, would net most of the benefits.
      The one outlier, in my opinion, is the Neural Engine. As we see in the video, it actually *harms* performance on the Max models because the GPUs are so much faster. Aside from a few other cases in, say, Metal upscaling, it sure feels like Apple needs to decide what it's going to do with that thing moving forward. Personally, I'd like Apple to beef up the GPUs across all specs, drop the Neural Engine, and use the silicon real estate for more GPU cores.

    • @dougall1687
      @dougall1687 7 months ago

      @@matthewgrdinic1238 Thanks for the feedback. I did post a longer reply, but YouTube deleted it, because it contained a link I think. Long story short: as of this morning LLVM has CPU targets of apple-m1, apple-m2, and the vague apple-latest, but no specific apple-m3 that I could find. So maybe we'll see some further optimization at the compile level, even though, as you say, most of the needed tweaks may be handled by hardware. Discussion on GitHub, though, seems to indicate that a lot of common libraries are still not even well optimized for the M1.

  • @richardrick1014
    @richardrick1014 7 months ago +2

    The M3 Max is great, but what's your view on the M2 Pro? What's your advice for people looking to buy a Mac Mini: should they wait until the next refresh? Is the base-model Mac Mini M2 Pro 16/512 able to handle Stable Diffusion, Blender, and Unreal Engine for learning purposes?

    • @matthewgrdinic1238
      @matthewgrdinic1238  7 months ago +6

      Funny you should ask, I just got an M3 Pro (12/18). But first, to answer your question straight away: yes, it will absolutely handle those uses, but it all comes down to real-world numbers and what you consider an optimal working environment. Hopefully I can help with that.
      Based on my testing of the M3 Pro, I'd first ask: do you need hardware ray tracing? If yes, you want the M3. For example, Blender's Classroom scene renders on the M1 in 2:59, on an M3 Pro in 48 seconds. For BMW the M1 turns in a 1:15; the M3 Pro, 19 seconds. It's a night-and-day difference.
      For AI, this is a bit more of a wash. Baseline, the Neural Engine across all models is 60% faster than the M1 generation and 20% faster than the M2.
      In real-world terms, with Stable Diffusion (512 x 512, 25 steps) running on just the Neural Engine, the M1 Pro renders out in 15 seconds, the M3 Pro in 8. With GPU *and* Neural Engine, both are actually slower, with the M1 turning in an 18 and the M3 a 14. For GPU only, the M3 is twice as fast as the M1 but still slower than Neural Engine only: 10 vs. 20 seconds.
      The rub with Stable Diffusion is that GPU is king, so unless you get a Max I wouldn't expect an appreciable jump coming from an M2.

  • @Michael-fr2xn
    @Michael-fr2xn 3 months ago

    Hello, very nice video. I'm having a hard time deciding: at the same price, should I buy a MacBook Pro 14 with an M2 Max and 32GB RAM, or with an M3 Pro and 18GB RAM? Your help would be much appreciated.
    Thanks 😊

  • @nidalspam509
    @nidalspam509 7 months ago

    Probably that Dynamic Caching at work.

  • @OnelineNight
    @OnelineNight 7 months ago

    Great video! Thank you for your review, it's very helpful. What about Unreal Engine 5? Is it working with Nanite and Lumen? I'm so looking forward to that happening; I've been looking everywhere but can't find any information... is there a problem with the hardware, or is it just the drivers?

    • @matthewgrdinic1238
      @matthewgrdinic1238  7 months ago +1

      Looks to be yes as of 5.3 for M2 Macs (portal.productboard.com/epicgames/1-unreal-engine-public-roadmap/c/1151-support-for-nanite-on-apple-m2-devices), but it's still a work in progress, and unfortunately I've not yet had the time to try it on my M3: forums.unrealengine.com/t/lumen-nanite-on-macos/508411/46 If time permits I'll give it a shot.

    • @OnelineNight
      @OnelineNight 7 months ago

      Thank you sir! I hope you can give it a shot!@@matthewgrdinic1238

  • @Yopyopyop3
    @Yopyopyop3 1 month ago

    All in all, you recommend the M3 Max for Blender, correct?

  • @axotical8682
    @axotical8682 7 months ago +4

    Thanks for doing this test. With so many unboxing videos and so much Cinebench pollution on YouTube, it's hard to find information relevant to AI users.
    Could the poor results for the M1 maybe have something to do with the amount of RAM?

    • @matthewgrdinic1238
      @matthewgrdinic1238  7 months ago +1

      On RAM, possibly: from my testing, Stable Diffusion fits quite nicely into 16GB (as do most LLMs). The *problem* is when we have *other* RAM-hungry tasks open at the same time. With 16GB this happens quite quickly; even a few browser tabs can push us over the limit and cause severe memory pressure. This is where the M3 Pro's extra 2GB helps, and, if you can spare it, 24GB or higher. Of course, thank you for the kind words, and totally agreed on the generic tests!

  • @enriqueFelix2000
    @enriqueFelix2000 5 months ago

    Can you do an M3 test on ZBrush?

  • @DailyProg
    @DailyProg 5 months ago

    Can you please cover MLX and Apple's software stack for AI?

  • @nadred5396
    @nadred5396 5 months ago

    how fast is the 4090?

  • @simply6162
    @simply6162 7 months ago

    Wow, could you please test power consumption compared to a laptop RTX while doing those benchmarks too?

    • @matthewgrdinic1238
      @matthewgrdinic1238  7 months ago

      Unfortunately I don't have a laptop 3080, but for Stable Diffusion my desktop 3080 consumed 105 watts compared to the M3 Max's 33. During this test the M3 generated its image in 11 seconds vs. the 3080's 5. So about twice the performance for three times the power.
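(Worth noting with the numbers above: power draw and energy per image tell different stories. A quick sketch using the figures from this reply, 33 W for 11 s on the M3 Max vs. 105 W for 5 s on the 3080:)

```python
def joules_per_image(watts: float, seconds: float) -> float:
    """Energy consumed to generate one image: power * time."""
    return watts * seconds

m3_max = joules_per_image(33, 11)    # 363 J per image
rtx3080 = joules_per_image(105, 5)   # 525 J per image
print(m3_max, rtx3080)  # the M3 is slower, but uses less energy per image
```

So the 3080 finishes each image faster, while the M3 Max actually spends fewer total joules per image, which matters on battery.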

    • @simply6162
      @simply6162 7 months ago

      @@matthewgrdinic1238 If it were an RTX 4080, it would be the same power for twice the speed of the M3.

    • @matthewgrdinic1238
      @matthewgrdinic1238  7 months ago

      The 40x series are *fantastic* cards for AI. Also, due to time constraints I wasn't using the TensorRT extension; I've since added it, and generation times are essentially halved on my 3080. Apple has a ways to go before they're truly competitive, but this machine is a step in the right direction.

  • @hubby_medical5454
    @hubby_medical5454 7 months ago +3

    I have an M3 Max with 36GB of RAM and easily max it out when using tools like Fooocus. I'd definitely recommend more than 36GB if you do work like that. I'm personally going back to get 48GB or even more 😅

    • @psysword
      @psysword 6 months ago +2

      14 inch?

    • @DerekDavis213
      @DerekDavis213 5 months ago

      You are going back for more memory? Can you upgrade the memory yourself?

  • @zmeta8
    @zmeta8 4 months ago

    If the random seed was not pinned down, you could get different generations between runs.
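(The seed point above can be illustrated generically. Stable Diffusion front-ends pin their samplers' RNGs the same way; this sketch uses Python's stdlib random as a stand-in rather than any specific Stable Diffusion API:)

```python
import random

def sample_latents(seed: int, n: int = 4) -> list[float]:
    """Pinning the seed makes the 'latent noise' (and thus the image) reproducible."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Same seed -> identical noise -> identical image between runs;
# an unpinned (random) seed -> different image every run.
assert sample_latents(42) == sample_latents(42)
assert sample_latents(42) != sample_latents(1234)
```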

  • @petehboy2
    @petehboy2 1 day ago

    yeah

  • @akkinoume
    @akkinoume 7 months ago +1

    Can I get more details on your PC's specs?

    • @matthewgrdinic1238
      @matthewgrdinic1238  7 months ago

      You bet! Nothing particularly special. Several years old now but still a great machine. pcpartpicker.com/list/8GKQfy

  • @robertYoutub
    @robertYoutub 6 months ago

    One thing you left out is memory usage. Apple may have a problem here: GPU rendering quickly runs out of memory, even with 64GB of RAM.

  • @_codegod
    @_codegod 3 months ago

    Nice, but I think I'll wait a few more generations of M chips until it gets faster for Stable Diffusion. 16 seconds per 768² image is slow IMO. SDXL 1.0 is good for 1024² images, which is what I prefer as my lower threshold of resolution. Although I think you should test LCM SDXL, which takes only 2-8 steps per generation with better quality. It should be 5-10x faster.

  • @MaxPayne_in
    @MaxPayne_in 6 months ago

    Rather than providing the duration it takes, it would make more sense to share the it/s.
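(Converting the video's wall-clock times into the it/s this commenter asks for is simple arithmetic. A small helper; the 25-step count matches the video's Stable Diffusion settings, the other figures are purely illustrative:)

```python
def iterations_per_second(steps: int, seconds: float) -> float:
    """Normalize a Stable Diffusion run to it/s so runs with
    different step counts can be compared fairly."""
    return steps / seconds

# e.g. 25 steps in 11 s vs. 20 steps in 8 s:
print(round(iterations_per_second(25, 11), 2))  # 2.27 it/s
print(round(iterations_per_second(20, 8), 2))   # 2.5 it/s
```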

  • @jbdh6510
    @jbdh6510 4 months ago

    A comparison to a PC with an RTX 4080 would be nice.

  • @rpaulseymour
    @rpaulseymour 7 months ago

    Odd. Aren't the M3 Neural Engines supposed to be 60% faster than the previous generation? Great vid; these new technologies aren't as cut and dried as things were even 5 years ago.

    • @matthewgrdinic1238
      @matthewgrdinic1238  7 months ago +4

      Great point. I ran a few more tests against Stable Diffusion XL via the Diffusers app to clear things up; the key is that this app can set which compute units you want to use. For Neural Engine only, the M3 took 9.3 seconds while consuming 5 watts; the M1, 17 seconds using 4 watts. For GPU only, the M3 posts 4.7 seconds using 35 watts; the M1, 14.8 seconds using 17 watts.

    • @rpaulseymour
      @rpaulseymour 7 months ago

      Wow, those sound like some huge gains. At first, I felt “non-buyer’s” remorse for waiting for these new machines and not buying the M2 Pro or Max beginning of the year. But, these tests seem like the wait was worth it, and now is the time. Also, my wallet said I needed to wait anyway 😅 Thank you!

  • @fuckdubs
    @fuckdubs 3 months ago

    Do you think 48GB of RAM is enough for this sort of work? I've just purchased one but am concerned I should've gone for 64GB despite my spending limit. Thanks for any help!