Matt, for each GGUF model listed on HuggingFace there is a black "Use This Model" button. This opens a drop-down of providers, and Ollama is listed. Clicking that gives the whole "ollama run" command with the URL for the model metadata. Also on the right side of each page are links for the various quant sizes, and each of these also has the "Use This Model" button. Pretty handy!
Nice. Another new thing. For a long time it felt like ollama was intentionally left out of that list. Thanks for pointing this out
Not all models have the button though
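For anyone following along, the command that button generates uses the hf.co form; the repo below is only an example, and you can append a quant label as a tag if the repo publishes several files:

ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M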
Fantastic news! Of course, I immediately checked it in Open WebUI and had no problem loading one of my experimental Hugging Face models from the web interface. Very cool.
Thanks Matt! Another very interesting video
Thanks for recommending the Ollama Chrome extension. It makes life easier. Maybe you can explain how to find great models on Hugging Face. I just downloaded the famous classic models and have no idea how to benefit from this huge database of AI stuff. When I found your video, I first thought it would answer how to find the right models on HF.
Whenever there is a command, I would hope to see a terminal with the command on the screen. It is easier to remember if you can see it rather than just hear it.
@@JJJJ-r3u agree
@@JJJJ-r3u true
Great. That’s why I showed it the first few times
@@technovangelist Yes, but keep cutting to it whenever you're talking through commands, so we don't have to hold every word you say in our heads. It helps a lot in understanding the commands, and whoever is trying to follow along can do it in parallel. That's why a lot of other tech and programming YouTubers do the same, with their webcam off to the side.
It's all about viewer perspective.
Note: all of the above is meant as positive feedback. Keep making the good stuff :)
Thank you Matt! This is such an amazing way for new people to get into models with Ollama! Thank you for always making the best Ollama content ever! Have a good one!
Thanks for sharing this breakthrough. Super helpful.
Thank you for pointing out the caveats to the setup. I appreciate the time savings and not having to learn some of these lessons the hard way.
Also, love the PSAs to stay hydrated. Reminds me of Bob Barker telling everyone to spay and neuter their pets.
This is a great start! That is the single biggest issue I have with Ollama: it should not be so complicated to add a custom model in GGUF format.
Matt, thank you for your videos and clear explanations! Greetings from Ecuador! I was able to build so much stuff thanks to you!
Ecuador. One of many places I would love to see. My only stops in South America have been in Venezuela, Argentina, and Uruguay.
If only Ollama would add support for an MLX backend, text generation performance would go 2x on Macs, while it is already quite good at the moment.
Oh OK, so it needs MLX backend support in Ollama core?
2x? No. Ollama was already much faster than anything LM Studio could do before; supporting MLX is what let them catch up and go a touch faster, but then you have to deal with that disaster of a UI. It's questionable whether adding that backend would make much difference, and it would be a lot of work.
I learned something new again, so it's another great video. Ty
Learning more thanks. I like motorcycle repair and maintenance too.
Thanks for this great video!
Nice feature, I love Ollama ❤
Now I wait for text2image in ollama
Would be great if Ollama had llama 3.2 11B available. Can you ask your friends for an update on their progress?
they are still working on it. there is a reason no other runners have it either
And the model in GGUF, if it's not too much trouble? Thanks in advance.
Great videos, thank you very much
Please create a video on changing context length in ollama... by default it is 2K only
Also, covering other parameter settings would be great.
There are a bunch on here that show that
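Until then, a minimal sketch of the two usual ways to raise it (the 8192 value and the llama3.2-8k name are just examples). Inside an interactive ollama run session:

/set parameter num_ctx 8192

Or put it in a Modelfile and build a new model from it:

FROM llama3.2
PARAMETER num_ctx 8192

ollama create llama3.2-8k -f Modelfile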
Can you make a video on how to train on your own tweets and then generate a bunch of tweets in your style after giving it some new context?
I think they're videos about ollama, but they might just be singing for my cat
Hi Matt! Thank you for your amazing educational content on AI - it's been a huge help. I'm building an AI agent in n8n on Linux and I'm curious about the practical differences between using Nvidia GPUs and AMD GPUs with a large language model like Llama. I've heard Nvidia is superior, but what does this really mean in practice? Say, comparing an Nvidia 4080 to an AMD 7900 XT, for example? Your insights would be incredibly valuable, and I'd be grateful if you could share your thoughts on this.
Asking because I would like to support AMD for its open-source approach versus Nvidia :)
High-end Nvidia is better than the best from AMD, but AMD is always cheaper for comparable performance.
Thank you. Always interested in a vid on stuff like this! Cheers
I hope there will be a feature to support token-streaming models like Kyutai's Moshi (they haven't released one yet...). It would be really cool to have an open-source local model that can do overlapping conversation, just like OpenAI's Advanced Voice Mode does.
Thank you, and a question: what if the model comes in several parts, does it support that?
thanks ollama....
Does Ollama have a GUI? lol, later in the video you answered my question. 😊
Ollama is text-based. There are many GUIs that run on top, but few are as good as the text interface.
What is your opinion on nemotron 70b
Which is that front end UI for Ollama in the video?
Hi Matt, I’ve been trying to understand system prompts. I understand these to essentially be prepended to every user prompt. In this video it seems that some models are trained with particular system prompts. Can you suggest a good site/document to read up on this?
They aren't necessarily trained with system prompts, and they aren't prepended to every user prompt. If you are having a conversation with the model, every previous question and answer is added to a messages block. At the top of that is the system prompt, and then all of that is handed to the model. Otherwise the model has no memory of any conversation.
@@technovangelist I wrote a simple client using the REST chat API. The results are absolutely cool. Very nice API. Your videos are very helpful.
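For anyone curious what that messages block looks like on the wire, here is a rough sketch of a call to Ollama's chat endpoint (the model name and the messages are just placeholders):

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "earlier question"},
    {"role": "assistant", "content": "earlier answer"},
    {"role": "user", "content": "new question"}
  ]
}'

The system prompt sits once at the top, and the whole history is resent with every request, which is what gives the model its apparent memory.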
If you import Hugging Face models in Ollama, they are usually beyond slow for some reason. I think the nature of the import just makes them use excessive resources, not the model size. So however interesting the model, it is just a hassle and not worth it. But let me give it a whirl just to make sure; maybe they fixed it.
Not usually. They perform just as well if you get them from HF as if you get them from Ollama.
@@technovangelist I am downloading one and will try it. I might have been unlucky with weird models in the past, who knows.
Thanks for covering this, this is really useful and I prefer Ollama just because I am used to it.
Just to confirm, everything works well. I don't know why converting models made them slow in the past; it's definitely no longer an issue. Thanks again for the great video.
Hi Matt, thank you so much for such great videos. Is there any way I can use a non-GGUF Hugging Face model in Ollama? I want to use the facebook/mbart model for my translation work, but unfortunately, I can't find a GGUF version of it. Additionally, could you please suggest the best model for translation work with the highest accuracy that I can use in Ollama?
I think mbart is a different architecture. But many PyTorch and other models can be converted. Review the import docs on the ollama docs
@technovangelist thank you
Hi, do you have a video that elaborates on adding the Ollama chat template to Hugging Face models? I'm just one step away from getting it running -.-
I have a few that talk about creating the model files from a few months back. Not much has changed there. The new feature in that video was that a 5 min process is now a 30 second process. It’s a convenience.
@ Some GGUF LLMs are split into parts. How does that work if I want to create the model file? Am I supposed to merge them first, or will it detect that automatically?
❤
Are safetensors models supported?
You can't do safetensors directly like in this video. Ollama does support some of those models, but you have to use the Modelfile approach. I made a short video showing how to do it with one of the HF models - th-cam.com/video/DSLwboFJJK4/w-d-xo.html
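A rough sketch of that Modelfile approach, with placeholder paths and names (the linked video covers the chat template details):

FROM /path/to/model-weights
# a local GGUF file, or a safetensors directory for architectures Ollama supports

ollama create my-imported-model -f Modelfile
ollama run my-imported-model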
Do you know when Llama 4 will be released?
Nope. Early next year? Late next year?
@technovangelist I can't wait that long :(
How do I download a different version of a GGUF model? Often there are various quantizations, like in QuantFactory/Ministral-3b-instruct-GGUF. How do I download the particular version I want?
Add the standard quant label as a tag
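For example, assuming that repo publishes a Q4_K_M file, it would look something like:

ollama run hf.co/QuantFactory/Ministral-3b-instruct-GGUF:Q4_K_M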
Is it possible to work with MLX models to run faster on the GPU on Apple Silicon, like LM Studio knows how to do?
Not yet. LM Studio added it recently, which has allowed them to catch up to Ollama and go past by a couple percent at most. I tried it last night and, based on their claims, expected mind-blowing performance, but it's a tiny improvement over Ollama. Try it.
@@technovangelist Thanks for the exchange and your videos, great work
Gotta pivot to Otiger
Does it use the GPU? I downloaded Ministral 8B and it seemed quite slow
if you have a recent gpu, ollama will support it
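One way to check, assuming a reasonably current Ollama build: while the model is loaded, run ollama ps in another terminal; the PROCESSOR column shows how much of the model landed on the GPU versus the CPU.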
Nice video. Can you download two models and run them together in Ollama?
Yes, you can download as many models as you can fit on your machine. Ollama lets you load multiple of them in memory and run them in parallel too
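A quick sketch of the knobs involved, with example values and model names: start the server with OLLAMA_MAX_LOADED_MODELS set (OLLAMA_NUM_PARALLEL controls concurrent requests to a single model), then run each model from its own terminal.

OLLAMA_MAX_LOADED_MODELS=2 ollama serve
ollama run llama3.2
ollama run qwen2.5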
what is the name of gui?
I mentioned it. Pageassist. A chrome extension
Can I download an MLX model and run it on Ollama with Apple Silicon?
You would use MLX for an MLX model.
Are you a tiger whisperer?
Did you get kicked off the team?
It's been answered a few times elsewhere on the channel. But there are lots of reasons folks don't stay at companies forever, and Ollama is just another company like any other.
I am so sick of the word model. I hear model, model, model.... My brain starts getting triggered by this word