How Fast Will Your New Mac Run LLMs?
- Published Feb 2, 2024
- How fast can the new Apple Silicon Mac you so desperately want run LLMs, and is it worth the price?
llama.cpp benchmarks: github.com/ggerganov/llama.cp...
Ollama: ollama.ai
00:00 Intro
00:47 Benchmarks
05:06 Unbox
05:47 Results
Support My Work:
Check out my website: www.ianwootten.co.uk
Follow me on Twitter: @iwootten
Subscribe to my newsletter: newsletter.ianwootten.co.uk
Buy me a cuppa: ko-fi.com/iwootten
Learn how devs make money from Side Projects: niftydigits.gumroad.com/l/sid... - Science & Technology
TG (token generation) largely depends on memory bandwidth: the SoC has to pump all of the parameters and the KV cache from RAM into the SoC's caches for each token generated. PP (prompt processing) depends on compute (GPU horsepower), because prompt tokens can be processed in batches.
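The comment above suggests a back-of-envelope way to predict generation speed: if every generated token requires streaming all the weights through the SoC, tokens/s is bounded by bandwidth divided by model size. A minimal sketch of that estimate, where the bandwidth efficiency factor and example figures are illustrative assumptions, not measured numbers:

```python
def estimate_tg_tokens_per_sec(bandwidth_gb_s: float,
                               model_size_gb: float,
                               efficiency: float = 0.7) -> float:
    """Upper-bound estimate of token-generation speed for a
    memory-bandwidth-bound LLM.

    bandwidth_gb_s: peak memory bandwidth of the chip (GB/s)
    model_size_gb:  bytes of quantized weights read per generated token (GB)
    efficiency:     assumed fraction of peak bandwidth achieved in practice
    """
    # Each token streams the full weight set once, so throughput is
    # (usable bandwidth) / (bytes per token).
    return bandwidth_gb_s * efficiency / model_size_gb


# Example: a 7B model quantized to ~4 GB on a chip with 100 GB/s bandwidth.
print(round(estimate_tg_tokens_per_sec(100, 4.0), 1))  # → 17.5
```

Real results will land below this ceiling once the KV cache and compute overheads are included, which is why faster GPUs help PP far more than TG.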
The M4 has 20% faster memory bandwidth in addition to the faster GPUs. Let's see when Apple puts these chips in MacBooks; maybe I'll upgrade my M2. For me, the M3 isn't a compelling enough upgrade over the M2.
Very helpful thank you so much
great deal!
Can you try stablediffusion?
Yessss, and also something like SAM or YOLO too
I want to start experimenting with LLMs, and I have a budget for a laptop, a PC, or a compromise of both. I was going to get either a great Mac, or an OK one plus a PC. What's your advice?
A lot of it will come down to personal preference. I'm familiar with Macs and really like that they're silent with great battery life. Most of my choice is based on that; the fact that they're very good for LLMs works in my favour too. I'm sure there are some pretty good PCs out there as well, and Ollama now runs there too.
@@IanWootten Yes, I like macOS much more than Windows, but my concern is the speed and size of the model. I'm worried that 16GB of unified memory wouldn't be enough.
Nice
Can you run MistralAI ?
Sure can. I get around 55 t/s, and it ran really well on my M1 Pro too. I can also run Mixtral - I think that's the more interesting one, since it's a huge 26GB model and will still run at 33 t/s.
@@IanWootten a video about it would be more than worth it, just saying
Pretty impressive. For the same tests, I was getting around 73 tokens/s on a Windows 11 WSL Ubuntu setup with an RTX 4070 Super GPU (AMD CPU).
Oh nice, thanks for sharing. I definitely think investing in a better GPU for my PC could work out to be more financially viable, if I ever need it.
If I were really concerned about performance, I wouldn't buy a laptop. An M2 Ultra Mac Studio in the refurb store can be had for $3500.
Sure, but I still need portability in this case. Options are also limited - the only Studio available in my region is £5k.
Apple definitely over-value-engineered the M3 Pro - probably using LPDDR5 and PCIe 5.0 clock efficiencies on fewer chips to make up the difference while increasing profit margins. Curiously, the M3 chips in the 8GB MacBook Pro and the iPad are also the same part, so many 8GB MacBook Pros may effectively be running iPad chips. Some weird Apple logic going on here. I still think Apple's iPad Pro forecasts for this year were off, and they made a product their customers want: a MacBook Pro that does iPad Pro duties.