You had me 20 seconds into this video - INSTANT FOLLOW! 🔥 MUCH LOVE FROM NEW ORLEANS AND THANK YOU❤
Thanks a lot for this! I was looking for a video that explained this exact topic and you did it in such a simple and efficient way. Kudos!❤
Thank You. Very Useful and Very Timely!
I'm loving ollama.. it was a breeze to run a 7B model locally on my humble laptop.. If you can show us how to fine-tune a model locally and then use it with ollama, that would be awesome..
That's on my list of things to figure out! Do you have any particular thing you'd like to fine tune for?
@@learndatawithmark I have some data where the input is bullet points of facts and the output is a coherent paragraph built from those bullet points. There are tons of tutorials, but it's chaotic - each one uses a different base model, quantization, prompt template, and output format.. So it would be great to create a model which can be run with ollama.. If you are interested, I can share the data..
@@AlperYilmaz1 Hi, I'm having the same problem. Have you found a way to do it?
How do you do this on Windows?
Thank you very much! After reading the documentation and spending 30 minutes asking GPT-4 (like an idiot) how to do it, I was confused. It looks like I did something wrong when writing the path. Your video is clear and easy to understand.
Glad it helped :)
And what was your fix? It's really bothering me.
@@boysrcute Scripts and tools often interpret backslashes as escape characters, which leads to parsing errors. To keep paths correctly formatted, use double backslashes (\\) or replace them with forward slashes (/). You'll want to fix your Modelfile path to use forward slashes.
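For example, a Modelfile line like this (the path here is hypothetical) avoids the escaping problem:
FROM C:/Users/you/models/mistral-7b.Q4_K_M.gguf
whereas FROM C:\Users\you\models\mistral-7b.Q4_K_M.gguf may get misparsed.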
Thanks for the video!
What about the prompt template in the Modelfile?
I didn't do anything with that, but there are a lot more options for refining it now than there were when I created the video. You can see all the options here - github.com/ollama/ollama/blob/main/docs/modelfile.md
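As a rough sketch (the file name and template below are illustrative - check your model's card for the right template format), a Modelfile with a prompt template looks like:
FROM ./mistral-7b-instruct.Q4_K_M.gguf
TEMPLATE """[INST] {{ .Prompt }} [/INST]"""
PARAMETER temperature 0.7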
Great vid bro, you're very underrated
Thanks, appreciate it!
thanks a lot, ollama!!
Great video, Thank you!
Pro tip - use pretty much anything except Ollama, because Ollama demands you use its own version of GGUF files. Every other tool of this kind uses normal GGUF files you can download yourself. Don't get trapped inside Ollama.
You know you can use any model from HF in Ollama, right?
@@EcoTekTermika Only if you mess around creating model files for it and don't mind it being given a SHA hash as a name... which is my point: it's just an HF GGUF, but with extra steps that stop you from using the models for anything else.
Yeah, it's frustrating - it would be way better if you could use the HF models directly and they kept any metadata they create in a separate file.
The main reason I end up using Ollama a lot of the time is that I can't find a reliable place to get quantised versions of the models. I used to download them from TheBloke, but he stopped doing them around January!
Thank you for sharing this information 🙂
Is it possible to use Ollama's SHA256:... files as GGUF/bin files in, e.g., LM Studio or AutoGen, or are these files useless outside Ollama because of the hashing applied to them (if that's what's going on)?
That is a good question and I don't know the answer right now. I need to take a look at the Ollama code to see exactly what those files contain!
Thanks. What about LFS models? (Many models don't have GGUF model files.) Am I missing something here?
I'm not sure what an LFS model is. Also, you don't have to use Ollama to run models - there's always Hugging Face's transformers library, which works with all their models too.
Is there no way to run a .gguf file that I already have downloaded? If not, I guess I'll have to stick with LM Studio & TG WebUI.
Nope, AFAIK you can't run GGUF files directly - you always have to convert them to Ollama's format. Other tools that run GGUF files directly are llama.cpp and llamafile, in case you haven't heard of them!
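The conversion itself is only a couple of commands - roughly this, with a hypothetical file name:
echo "FROM ./my-model.Q4_K_M.gguf" > Modelfile
ollama create my-model -f Modelfile
ollama run my-model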
Wouldn't it be nice to make a model that fuses them all into one master AI?
What is the asitop alternative for Linux?
asitop describes itself as the alternative to nvtop, so perhaps nvtop itself does what you want on Linux? Its description: "A Python-based nvtop-inspired command line tool for Apple Silicon (aka M1) Macs."
There must be a newer, easier method by now to run GGUF models in Ollama or in Open WebUI. Please update this method. 🎉❤
As far as I know, this is still the way to run GGUF models with Ollama. I wish you could use GGUF files directly, it would be so much easier!
I haven't used Open WebUI, I'll take a look at that.
If you want command line tools that can run GGUF files directly, take a look at llamafile or llama.cpp
github.com/ggerganov/llama.cpp
github.com/Mozilla-Ocho/llamafile
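For example, running a GGUF file with llama.cpp looks roughly like this (the model path is hypothetical, and the binary name has changed between versions):
./main -m ./models/mistral-7b.Q4_K_M.gguf -p "Why is the sky blue?" -n 128
A llamafile is self-contained, so you just make it executable and run it (file name hypothetical):
chmod +x mistral.llamafile && ./mistral.llamafile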
@@learndatawithmark Thanks for answering. Indeed, my interest is in running them on Ollama because of the new Open WebUI, which is the most marvelous thing invented. Open WebUI is a frontend for Ollama that presents a ChatGPT-like interface: conversation history, completely hands-free conversations with LLMs (you talk and it listens and speaks back), input of your texts and PDFs, RAG, real-time internet access for the LLMs, image uploads, multimodality - it's fantastic. You definitely need to test it. The problem is that it's based on Ollama. I use LM Studio with dozens of Hugging Face models, and I love them. I would like to use these models in Open WebUI, but they are in GGUF format - that's why I found your video, to learn how to use GGUF models in Ollama. 🙏👍💥
I don't understand. I've downloaded Ollama and run the first model by typing: ollama run dolphin-mixtral:latest
But it was too slow. I also don't understand all the parts you went through - you used Hugging Face, but I just want to install and run the model. I don't know anything about that Poetry / Hugging Face part.
If you only want to use one of the models from Ollama's built-in library, you don't need to do any of the stuff in this video - you can do what you said. But keep in mind that dolphin-mixtral is one of the biggest models, so it will be slower than the others. Perhaps try dolphin-mistral to see if that gives better performance.
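For example:
ollama run dolphin-mistral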
@@learndatawithmark I will try to figure out how to do all this, thanks.
Why all the complication instead of just using curl to get the GGUF from the website you were already on?
Good question! I have used cURL a few times, but the instructions suggested using the CLI tool. I haven't actually looked at the code to see whether it does anything differently from cURL.
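If you did want the cURL route, the pattern is roughly this (the repo and file name are just examples):
curl -L -o mistral-7b-v0.1.Q4_K_M.gguf https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/resolve/main/mistral-7b-v0.1.Q4_K_M.gguf
The -L matters because Hugging Face redirects downloads to a CDN.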
A clear example that shows that most nerds lack the skills and the willingness to provide simple solutions for non-nerds, the majority of us humans.
It takes less than a minute to install LM Studio and an AI model and have your code checked, a role played, questions answered, etc., in private. Why do guys like you so often ignore regular users and only focus on fellow nerds?
Hey - Thanks for your feedback. You're right - LM Studio is an easier approach, but I didn't know that it supported GGUF until I read your post.
This video was also more about showing how you can use Ollama to run models even if the Ollama folks haven't already added it as a library.
To be fair they do now seem to add new models so quickly that you rarely have that situation.
no offense but you sound like jacksepticeye
I have no idea who that is! Should I be offended?!
wtf poetry? noooooo why?
What should I use instead?!
@@learndatawithmark Anaconda is my standard, but maybe I should learn Poetry. I think I'm just frustrated that so many tutorials assume familiarity with so many tech-stack options.
When I try this, I do get the newly added model listed, but when I run it, it fails with an error: Error: Post "127.0.0.1:11434/api/generate": EOF
I have this problem with the safetensors-converted-to-GGUF model that I imported into Ollama. Other, much larger models like 34B LLaVA run fine. For the conversion to GGUF I used the ruSauron/to-gguf-bat method on GitHub. Any ideas where this went wrong?
Thanks
Is there anything in the Ollama log file? ~/.ollama/logs/server.log
It might also be worth seeing whether you can use the safetensors model directly. I showed how to do that here - th-cam.com/video/DSLwboFJJK4/w-d-xo.html
Equally it might just be that the model isn't supported by llama.cpp, which is the underlying library that Ollama uses to run inference on the LLMs.
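If you're on Linux and Ollama runs as a systemd service, the log may be in journald instead - try:
journalctl -u ollama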
@@learndatawithmark Thanks for the response. I did see your other video on using safetensors in Ollama but ran into a roadblock right at the start 😊. I posted a message on that video too, with a question on where to find the "template" info, but for some reason the message doesn't appear. The template isn't mentioned on the model page and I am at my wits' end trying to find it. I skipped the template info, and needless to say it didn't work.
I just retried posting a message on the other video - hopefully it registers this time. :))
@@learndatawithmark I don't see a logs folder in .ollama. All that's in there is: history, id_ed25519, id_ed25519.pub
I'm looking in the root folder. Is it elsewhere?
Thanks
I tried another conversion method using llama.cpp and on the last step this is what I got:
INFO:hf-to-gguf:Loading model: GOT-OCR2_0
ERROR:hf-to-gguf:Model GOTQwenForCausalLM is not supported
You were right - perhaps not every safetensors model can be converted to GGUF.
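(For context, the llama.cpp conversion step that produces those log lines is roughly the following - the output file name is hypothetical:
python convert_hf_to_gguf.py ./GOT-OCR2_0 --outfile got-ocr2.gguf
The error comes from that script checking the model's architecture against its list of supported ones.)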