Yes, but how free are you to run a couple of LLMs at the same time? Especially if you're code bouncing.
Thanks for making me aware of the MLX version. Beware: My installed version only updated to 0.2.31 and I had to download 0.3.4 from LMStudio AI!
Can you make a video on fine-tuning embeddings and LLMs? (Also include how to create a dataset to train on custom data.)
It will be very interesting
Thanks for the idea, will try to put together something!
How did you conclude that running the same model was faster via MLX than with the llama.cpp backend?
Comparing with llama-3.1 8b 8-bit, I get the same generation speed between LM Studio/MLX and Ollama/llama.cpp (33.6 tok/s on M1 Max 64GB)
Don't they both use MLX? Wouldn't the speed be the same then?
Are you loading the same model in different tools?
You have to download the MLX model and GGUF versions separately. Then load one at a time and test.
MLX is decently faster for me always.
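(For anyone who wants to reproduce this comparison, here is a minimal sketch, assuming you have an MLX quant downloaded via mlx-lm and a matching GGUF quant for llama-cpp-python. The repo id and file path below are placeholders, not specific recommendations.)

```python
import time

PROMPT = "Write a short poem about the ocean."
MAX_TOKENS = 200

# --- MLX backend (pip install mlx-lm; Apple Silicon only) ---
from mlx_lm import load, generate

# Placeholder repo id: substitute whichever MLX quant you actually downloaded.
mlx_model, mlx_tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-8bit")
start = time.time()
mlx_text = generate(mlx_model, mlx_tokenizer, prompt=PROMPT, max_tokens=MAX_TOKENS)
mlx_secs = time.time() - start
mlx_tok = len(mlx_tokenizer.encode(mlx_text))  # count generated tokens
print(f"MLX: {mlx_tok / mlx_secs:.1f} tok/s")

# --- llama.cpp backend (pip install llama-cpp-python) ---
from llama_cpp import Llama

# Placeholder path: point this at the matching GGUF quant.
llm = Llama(model_path="models/llama-3.1-8b-instruct-q8_0.gguf",
            n_gpu_layers=-1, verbose=False)
start = time.time()
out = llm(PROMPT, max_tokens=MAX_TOKENS)
gguf_secs = time.time() - start
gguf_tok = out["usage"]["completion_tokens"]
print(f"llama.cpp: {gguf_tok / gguf_secs:.1f} tok/s")
```

Run each side twice and ignore the first pass so model loading and warm-up don't skew the numbers.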
One model is of Dhanush, and the other is of Tamanna. Can they both be prompted together in a single image? If yes, how? Please explain, or if there's a tutorial link, kindly share.
Doesn’t support M4 yet?
I’ve been using it all week on M4.
Can we use this on an Intel Mac?
You can, but the MLX bit won't work.
What is the difference between this and Ollama?
WOW! Brilliant vid. On an M3 Max currently. What is the largest model it can run? I can't wait to try this out. I want to train a model for my legal work. Fingers crossed this can help.
Depends on how much RAM you have. Check how big the models are: you can only use around 70-75% of your RAM as VRAM, and the entire model needs to fit in it.
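(A quick back-of-envelope sketch of that rule of thumb; the 75% fraction and the example model size are rough assumptions, and real headroom also depends on context length and whatever else is running.)

```python
def fits_in_unified_memory(model_size_gb: float, total_ram_gb: float,
                           usable_fraction: float = 0.75) -> bool:
    """Rough check: does the model fit in the ~70-75% of RAM usable as VRAM?"""
    return model_size_gb <= total_ram_gb * usable_fraction

# Illustrative numbers only: an 8-bit 32B model is roughly ~34 GB on disk.
print(fits_in_unified_memory(34, 64))  # 64 GB Mac -> ~48 GB usable -> True
print(fits_in_unified_memory(34, 36))  # 36 GB Mac -> ~27 GB usable -> False
```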
Is there a model for Swift only programming?
Anyone know a decent model for generation of Go code? Like for solving Advent of Code puzzles.
Try the Qwen Coder series of models.
@1littlecoder Thanks for the information! I'll try that on my M4
@1littlecoder Now tested, very briefly: lmstudio-community/Qwen2.5-Coder-32B-Instruct-MLX-8bit. So far good results. Nice to be able to do this offline (on a local machine).
Hey man, this is really great! Thanks. Hopefully Ollama integrates it too. They've seemed a bit lame these past weeks.
Hope so!
Thanks, I just installed it... nice! M3 here.
Enjoy the speed!
Does it work on an M2 with 8GB RAM / 512GB storage?
Thanks for the tutorial
What's your PC spec?
It's a Mac, so the one titled Mac specs 😂
Phenomenal 🤖
Please give Jan AI a try. LM Studio is based on llama.cpp, but it's proprietary and closed-source, and God only knows what it's doing - mining shitcoins, sending telemetry, collecting your personal data - you'll never know. Jan AI is open source, based on the same llama.cpp, and gets the same benefits llama.cpp does.
But we need MLX support 😢😢😢
@zriley7995 The original llama.cpp has it. LM Studio added ZERO under-the-hood functionality - it just slapped its own UI on top of it.
🔥🔵 “Intelligence is compression of information.” This is one of the most useful videos I believe I have ever watched on YouTube. 🔵
With the presented qwen2-0.5b-instruct model (352.97 MB), it's about twice as fast on your M3 Max (221 tok/sec) as on my RTX 3090 (126 tok/sec),
but with the llama-3.2-3B-4bit model (2.02 GB) the speeds are similar on both devices.
This is probably due to the amount of available VRAM (24GB on the 3090).
Let’s go ahead and say “go ahead” every other sentence
@SirSalter did I use it too much 😭 sorry
@1littlecoder nah, you're perfect. That guy is just grumpy and that's fine :) you rock!
@judgegroovyman thank you, sir ✅
Ollama is excellent. Don't diss it.
@tollington9414 I didn't.
Awesome, thanks! Was looking for this. You could have gotten to the point a bit more, but whatever :D MLX is the way to go!
You mean gotten to the point sooner?