The array question wasn't passed, and there's no room for interpretation, because it didn't stick to the very basic requirement of A = 0. It even printed the correct assignments of letters to number keys on a phone, so it could have seen from its own output that A is not on 0. It's just like answering a "yes" or "no" question with a wall of text but never actually saying "yes" or "no"! I really hate it when even basic requirements in prompts are completely ignored. But you are right, all these "unscientific" random tests by regular people often give a much better picture than all those benchmarks these LLMs apparently have been optimized for. When it comes to memory footprint though, you shouldn't ignore the context size! I think Ollama by default uses just the ancient value of 2048, which is completely useless for code generation and any reasonable dialog. The very minimum should be 8k, which was also the limit for Llama 3.0. Phi-4 has a limit of 16k if I remember correctly, while even the 8B Llama now has 128k.
Yeah, it did get that wrong after all, you're right. I was imagining a phone pad that doesn't exist, for some reason, now that I am looking at mine. I did set the context to 16k for it, which is its ctx size, in its model settings before testing. I just didn't show it in the video.
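For anyone who wants to bump that 2048 default: here is a minimal sketch of one way to do it per request against the Ollama REST API. It assumes Ollama is listening on the default localhost:11434 and that a "phi4" tag has been pulled; the 16384 value is just an example.

```python
import requests

# Minimal sketch: override Ollama's default 2048-token context window per request
# by passing num_ctx in the options payload.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "phi4",       # assumption: whatever tag you actually pulled locally
    "prompt": "Summarize the tradeoffs of 8k vs 16k context windows.",
    "stream": False,       # return one JSON object instead of a token stream
    "options": {
        "num_ctx": 16384,  # context window in tokens; Ollama's default is 2048
    },
}

response = requests.post(OLLAMA_URL, json=payload, timeout=300)
response.raise_for_status()
print(response.json()["response"])
```

If I remember the Modelfile syntax correctly, a `PARAMETER num_ctx 16384` line in a custom Modelfile bakes the larger window into the model permanently instead of setting it per request.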
Thanks, good. Especially the comparisons where companies claim (00:16) "we are better than this, we are better than that": against which model, and with what kind of system prompt, did they test? So we are waiting for test videos about these performances. How real is the claim of a new LLM? Because in AI training, a model can be trained to get good results on those benchmarks. What about the real thing!
Thanks for the test. I am planning to download it this weekend for some testing of my own. In your opinion, what model is the best "daily driver" for users with around 12GB of VRAM?
Llama 3.2-Vision 11b q4 for that GPU vram size as a DD. Swapped out for Qwen 2.5 coder 7b q8 for specialized code related tasks. Should give you enough for your embeds also to run.
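A rough sketch of what that swap can look like in practice, routing coding prompts to the coder model and everything else to the daily driver. The exact model tags below are assumptions; check `ollama list` for the names you actually pulled, since Ollama will load whichever model the request names.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

# Assumed local tags -- substitute whatever `ollama list` shows on your machine.
DAILY_DRIVER = "llama3.2-vision:11b"
CODER = "qwen2.5-coder:7b-instruct-q8_0"

def ask(prompt: str, coding: bool = False) -> str:
    """Send a prompt to the daily driver, or to the coder model for code tasks."""
    model = CODER if coding else DAILY_DRIVER
    r = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    r.raise_for_status()
    return r.json()["response"]

print(ask("Write a Python function that reverses a linked list.", coding=True))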
Nice. Have you thought about testing the new Mac Mini M4 Pro? It seems cost efficient, and it should be easy to do since Ollama is optimized for Mac. Thanks
Could be interesting to use Phi-4 in an agentic setup together with different APIs and real-world context, or in a simple bot with RAG and a vector DB as the data source, aka a helpdesk bot.
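For the helpdesk idea, here is a very small RAG sketch against the Ollama API, under the assumption that an embedding model (I'm using "nomic-embed-text" as a placeholder) and a "phi4" tag are pulled locally. A plain in-memory list stands in for a real vector DB; the point is only to show the retrieve-then-generate loop.

```python
import requests
import numpy as np

OLLAMA = "http://localhost:11434"
EMBED_MODEL = "nomic-embed-text"  # assumption: any embedding model you have pulled
CHAT_MODEL = "phi4"

# Toy "helpdesk" knowledge base standing in for a real vector DB.
DOCS = [
    "To reset your VPN password, open the self-service portal and choose 'Reset credentials'.",
    "Printer error E-301 is usually fixed by power-cycling the device.",
    "New laptops are imaged with the standard corporate build within two business days.",
]

def embed(text: str) -> np.ndarray:
    # Ollama's embeddings endpoint returns a single vector for the given prompt.
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": EMBED_MODEL, "prompt": text}, timeout=120)
    r.raise_for_status()
    return np.array(r.json()["embedding"])

DOC_VECTORS = [embed(d) for d in DOCS]  # "index" the knowledge base up front

def top_doc(question: str) -> str:
    """Return the knowledge-base entry most similar to the question (cosine similarity)."""
    q = embed(question)
    scores = [float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
              for d in DOC_VECTORS]
    return DOCS[int(np.argmax(scores))]

def answer(question: str) -> str:
    # Stuff the single best-matching document into the prompt (the RAG step).
    prompt = (f"Answer the helpdesk question using only this context.\n"
              f"Context: {top_doc(question)}\nQuestion: {question}\nAnswer:")
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": CHAT_MODEL, "prompt": prompt, "stream": False},
                      timeout=300)
    r.raise_for_status()
    return r.json()["response"]

print(answer("How do I reset my VPN password?"))
```

A real setup would swap the list for an actual vector store and retrieve the top few chunks instead of one, but the flow is the same.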
Yeah that's what I wanted to see
Appreciate your videos. Newbie to AI, coming from a DevOps background and expanding my homelab based on your videos and recommendations. Now I just need to sell a kidney to buy a 5090. :-) Currently running Ollama (Llama 3.1) on a 4060 Ti (8GB, sad face).
🤗 chanting: 8 GB VRAM MODERN GPUS SHOULD NOT BE A THING
@DigitalSpaceport Ya, I'm kicking myself for not springing for a 4070 Super. Hoping prices come down and will upgrade.
How about getting an older GPU with more VRAM? I've been fine on the 2070 Super Max-Q for Stable Diffusion, maybe a few seconds longer on a large image. Then you can do your thing and upgrade later after they release other versions. If you keep buying the expensive ones that don't quite cut it, together they keep you from hitting your goal.
Would you have gotten better answers with phi4:14b-fp16? Thanks for the review of the Q8.
I doubt it would be double, but I will test it this evening or over the weekend.
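Weight sizes roughly double going from q8 to fp16 either way. A back-of-the-envelope estimate, assuming roughly 14.7B parameters for Phi-4 and ignoring KV cache and runtime overhead, so real usage is higher:

```python
# Rough weight-only VRAM estimate for a ~14.7B-parameter model (Phi-4's reported size).
# Ignores KV cache, activations, and runtime overhead, so real usage is higher.
PARAMS_B = 14.7  # billions of parameters (approximate)

bytes_per_weight = {"fp16": 2.0, "q8_0": 1.0, "q4_K_M": 0.5}  # rough averages

for quant, bpw in bytes_per_weight.items():
    gb = PARAMS_B * bpw  # 1e9 params * bytes per weight / 1e9 bytes ~= GB
    print(f"{quant:7s} ~{gb:5.1f} GB of weights "
          f"({'fits' if gb < 16 else 'does not fit'} in 16 GB before overhead)")
```

That works out to roughly 29 GB for fp16, 15 GB for q8, and 7-8 GB for q4, which is also why the q4 build is the one that squeezes into a 16 GB card.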
Phi-4 has difficulty keeping track of context. It overlooks things within the prompts in my testing.
Feels like it has gaps in its knowledge as well. Not sure, but it also seems to be ultra safety-aligned to the point of being annoying. Maybe they took it too far?
I am getting the same problems. What do you suggest for maintaining some kind of memory when using an LLM? It's fine if you're using the expensive models, but locally they struggle when the chat gets really long. Perhaps there is a way to keep the project going without having to start another chat.
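One common workaround, not specific to any particular tool, is a rolling summary: once the transcript gets long, ask the model to compress the older turns and keep only that summary plus the most recent messages. A rough sketch against the Ollama chat endpoint, with the thresholds and the "phi4" model name as assumptions:

```python
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"
MODEL = "phi4"      # assumption: any local chat model you have pulled
KEEP_RECENT = 6     # how many recent messages to keep verbatim
SUMMARIZE_OVER = 20 # compress once the history grows past this many messages

history = []        # list of {"role": ..., "content": ...} dicts

def chat(messages):
    r = requests.post(OLLAMA_CHAT,
                      json={"model": MODEL, "messages": messages, "stream": False},
                      timeout=300)
    r.raise_for_status()
    return r.json()["message"]["content"]

def compress_history():
    """Replace everything except the most recent turns with a model-written summary."""
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = chat([{"role": "user",
                     "content": "Summarize this conversation so far, keeping all "
                                "decisions and open tasks:\n" + transcript}])
    history[:] = [{"role": "system",
                   "content": "Summary of earlier conversation: " + summary}] + recent

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    if len(history) > SUMMARIZE_OVER:
        compress_history()
    reply = chat(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```

It isn't as good as a genuinely large context window, but it keeps small local models from silently dropping the start of a long project chat.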
I have an RTX 3060 (12GB) and I am looking for a model which can help me with Python/web dev/code tasks. Do you have any recommendations for suitable models to try within Open WebUI?
Most people don't have 20GB VRAM. Any way this can be put into 16?
Yeah, run the q4 version and not the q8, and it will fit in 16.
A 14b can only do so much... You have a 3090; this doesn't even qualify to be loaded into the VRAM.