I found MLX to be waaaay slower than GGUF on an M4 Max with 64 GB.
I could actually load a 70B 4-bit model with GGUF at decent speeds.
Tried with LM Studio 0.3.5, builds 2, 8, and 9.
There has to be something we're missing with MLX. I saw some Apple engineers playing around with the GPU wiring system values, which might be a solution. MLX is supposed to be way, way, way faster than this!
Just found your channel. I love how you break it down for the average user!
Appreciate you!🙏🏽