Because of the DeepSeek model, I had to pay TekBoost a visit to grab one of their monster 1.5TB Z8 G4 dual Xeon 18-core rigs for around 4K. I need that full 128K context and don't care about the slower inference, as long as I get great results from the output.
That is an incredible setup you have there! I'm jealous lol
what if the model quickly gets obsolete and you need much horsepower for the next breakthrough model?
I used it to copy handwritten text, and it was really amazing, mind-blowing. All the other models either refused to work or failed to correctly transcribe the text and gave me unreadable output.
Mind-blowing that a 72B model can code this well - *and* has vision too!
I haven’t looked, but how big is the model file if you were to download it (if you even can)?
72B seems like it could run on something like a 128GB M4 Pro machine.
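For a rough answer to the file-size question: a quick back-of-envelope estimate is parameter count times bytes per weight. This sketch (my own numbers, not from the thread; the ~4.5 bits/weight figure for Q4 quants is an approximation that includes quantization overhead) shows why a 72B model only fits a 128GB machine once quantized:

```python
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights-only size estimate in GB (ignores KV cache and runtime overhead)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# 72B parameters at common precisions
for label, bits in [("FP16", 16), ("Q8", 8), ("Q4 (approx.)", 4.5)]:
    print(f"{label}: ~{model_size_gb(72, bits):.0f} GB")
```

So full FP16 weights are ~144 GB (too big for 128GB of unified memory), while a 4-bit quant comes in around 40 GB, leaving headroom for context.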
The model companies are all moving slowly on the agent front, but this and the R1 revelation are telling me that quite small local "thinking" models will be plenty capable of taking on the personal assistant role: checking email, managing appointments, taking meeting notes, etc. The hangup is the potential screw-ups if you give the assistant full access to your computer and accounts, but pure capability-wise, we're getting really, really close to small, efficient, and practical assistant agents running locally.
🤯
When I tried to set it up in SageMaker, I was going to test it with 128GB. Digits can't come fast enough for me! Have you tried Operator, by chance? I've heard mixed things about it but haven't had a chance to actually try it.
Qwen2.5 72B Instruct has been my daily driver ever since it launched. Bang for the buck in code quality is unmatched. My favorite part is when ChatGPT throws errors like "Too many concurrent connections" while I'm fully functional offline at home!
12:55 Yeah, I gotta have API access. I'm not a great programmer but I know how to loop things via the API.
Also, I do not use the newer 2.5 VL because I believe they had to prune the 2.5 Instruct model to make room for the vision stuff, which I do not care for.
I’m curious where you saw that? It makes me want to test the VL model side by side with original 2.5 now.
@@GosuCoder I read a lot about it all and the vision part is literally another smaller model merged with the main model. I don't fully understand it all but I believe that layers 1 to x are text generation and layers x to y are vision stuff. So they obviously had to prune stuff from the text model to fit the vision part. Also the logic power of the model has to be lower since 72B in the VL is the combined total. I'd bet 100% that the older 72B instruct has more logic power.
The VL is excellent for handwritten transcription and much better than Llama 3.2 Vision 90B. Just a tad below Google Pro Vision and Claude Sonnet. Handwritten transcription used to be very difficult before 2024.
That’s awesome! I need to test that more.
closed source and no api?
Open weights but having trouble deploying it myself.
So you can actually pay to use Qwen's API as well.
Really? I tried finding that.
I've been trying to connect to it from Cline but was unable to.