This is one of those videos where I can say "out of all the videos on this subject, you just straight-out provide the real documentation and know-how". From the modelfile, prompt template, explanation, etc. Thank you.
Thank you for taking the time to comment! I look forward to making more videos
@@decoder-sh I'm not that great with using terminals. I'm on PowerShell and I'm a little confused about how to get to the part where the modelfile is made
Thanks a lot! I love your carefully prepared, very quick and succinct, yet complete style! This one was a bit over-paced compared to your other two videos so far, but just a tiny bit. It also would have been nice to see the rename at the end. Keeping it succinct and to the point as you do is the big value in your videos.
Thank you for watching my videos, as well as for your feedback! You’re the second commenter to mention this was a bit too fast, so I will do my best to correct that in the next one :)
6:04 talks about quantization (how you can, using a docker command, quantize your GGUF file to a smaller size/bit format)... well, I think they added that feature on huggingface recently. When I went to the model mentioned in the video (well, TheBloke's version of it), there's a button that displays next to the [Train] button, a [Use this model] button, and when you click on it, you now have the choice "Ollama"... pretty nice! Thanks for this video, it was extremely helpful.
That sounds cool, I'll have to try that out!
There are a lot of models which do not have that option.
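For models without that button, the docker-based quantize flow mentioned at 6:04 looked roughly like this at the time (the image name and flags here are from memory and may have changed since, so treat this as a sketch rather than gospel):

docker run --rm -v .:/model ollama/quantize -q q4_0 /model

That mounts the current directory, containing your unquantized model, into the container and writes out a q4_0-quantized GGUF alongside it.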
Great simple explanations, and so useful
Cool. Looking forward to local RAG when that's ready.
Great straightforward, informative video, keep it up man... currently trying dolphin-mixtral
This helped me a lot. Great quality and good way of explaining everything. Thank you so much
Unfortunately the docker run command to quantize your own model fails. I've had a heck of a time getting anything ollama convert / quantize-related to work :(
Nothing superfluous. Excellent presentation of the material. Looking forward to more interesting videos.
As a newbie to this, you kinda jumped 10 steps here at 3:05. Also, I'm on Windows and have no experience using Linux. Any documentation on how to do this on Windows?
Good call, sorry about that!
Step 1: Create a new file called "Modelfile" (the name isn't important, you can call it whatever you want)
Step 2: Edit the modelfile (which is what I'm doing at 3:05). If you're not familiar with what a modelfile is or how it works, check out my older video for a refresher th-cam.com/video/xa8pTD16SnM/w-d-xo.html
You can view all of the code I wrote here decoder.sh/videos/importing-open-source-models-to-ollama
I don't have any videos for Windows unfortunately, but I believe the ollama CLI is the same for all operating systems
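To make step 1 concrete, a minimal modelfile can be as short as this; the filename and the Mistral-style template below are placeholders, so swap in whatever your model actually expects:

FROM ./my-model.Q4_K_M.gguf
TEMPLATE """[INST] {{ .Prompt }} [/INST]"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"

Then build and run it with 'ollama create my-model -f Modelfile' followed by 'ollama run my-model'.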
Why are LLM models so big (25GB)? For example, isn't the model (BLOOM, Meta's Llama 2, Guanaco 65B and 33B, dolphin-2.5-mixtral-8x7b, etc.) just the algorithm that is used to learn your data?
And if the training data is another 25GB, what is the resulting size if you wanted to run your new AI offline on a new PC? 50GB? And what do the 33B and 8x7b mean?
For example, everyone says that ChatGPT4 has 220 billion parameters and is a 16-way mixture model with eight sets of weights?
So a model, from a zoomed-out perspective, has two components: the model architecture (llama, mistral, mixtral...), which describes the steps and connections that transform an input to an output, and the weights, which are the result of training the model.
Another way to think about this is that the model is like a blueprint that tells us which parts of a building go where, how many doors there are, what the plumbing looks like. A blueprint itself takes up no space and weighs nothing. But the building materials, the weights in our model, are what physically occupy the space. Here's a more literal explanation of weights: datascience.stackexchange.com/questions/120764/how-does-an-llm-parameter-relate-to-a-weight-in-a-neural-network
For fast math on how much disk space a model uses, try this calculation: # of parameters * (4 bits if quantized, 32 if not) / (8 bits in one byte).
So the Phi model has 2.7B parameters and is about 1.6GB. Math: 2.7 * 1e9 * 4 (all of ollama's models are quantized afaik) / 8 = 1.35GB. Then every model uses some extra space for config files, etc.
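Here's that back-of-the-envelope math as a tiny Python sketch (the parameter counts are just illustrative):

def model_size_gb(num_params, bits_per_param):
    # weights only; real model files add a little overhead for config/tokenizer
    return num_params * bits_per_param / 8 / 1e9

print(model_size_gb(2.7e9, 4))   # Phi, 4-bit quantized: ~1.35 GB
print(model_size_gb(7e9, 4))     # a 7B model, 4-bit: ~3.5 GB
print(model_size_gb(7e9, 32))    # same 7B unquantized at fp32: ~28 GB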
FileNotFoundError: spm tokenizer.model not found.
I can only see the bin file. Where is the gguf file?
Thank you for this video
I have one question: to generate a gguf, do I need any special hardware, or can I just generate it from Google Colab?
thanks again for this video ❤
I believe this process does require a GPU, but you should have access to one on Colab
Your video is amazing. I never thought transferring these big models into GGUF was this simple. You just unlocked a lot of possibilities. Thank you so very much! Sadly you don't have many videos posted. Hope you do more videos. I wonder if Docker is the only way to transfer models to GGUF.
You can also use llama.cpp (which ollama is basically a fancy wrapper for) to do the conversion to gguf!
@@decoder-sh Does it work for tokenizer.json file? Docker seems to only work with the .model one
@@bruno10505 Unfortunately I'm not sure about that
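For reference, the llama.cpp route is roughly the following (the script names have moved around between versions, with older trees using convert.py and newer ones convert_hf_to_gguf.py, so check the repo you cloned):

python convert.py ./path-to-hf-model --outfile model-f16.gguf
./quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M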
Great content, thanks, you definitely deserve more subscribers! Can you show us how to let the models have access to local data and learn from it in a future video?
Thanks for watching! Yes I do intend to do a whole series on interacting with documents in the near future :)
Hopefully you guys can make a video about fine-tuning or long chaining to make the model more adaptable to our personal needs
Absolutely! I’m already putting together a script for fine tuning now :)
Hi! What do I do when I've installed ollama with the sh script on Linux, after cloning the repo?
Great video!! Please make instructions for how to run models with the MllamaForConditionalGen architecture.
Thanks for the video. You kind of skip over the modelfile for the huggingface-converted file at the end; how do you determine the prompt template to use?
Very interesting and useful. I’m interested in the format GGUF, so maybe you can describe that in more detail. I wish Ollama was available for Windows OS.
Fwiw I run it on windows via docker. I don’t have an nvidia GPU though, so it’s pretty slow. Agree that a native install experience would be nice
That’s a good idea, I feel like it’s a common enough format to warrant a deep dive or at least a closer look
GGUF is the new format for llama.cpp model files
I would love to see a similar tutorial for Windows as I am running Ollama with the openWebUI front end in Windows on an Intel Arc GPU.
Thanks! I'm getting up to speed on all this info. I was wondering where to find the LLM models that 'ollama run' didn't know about.
Good explanation! Is there a list of model architectures that are supported by Ollama?
I wasn't able to find one - ollama is llama.cpp under the hood, and the closest thing I was able to find was their list of supported models. Anything that's a finetune of these models should work! github.com/ggerganov/llama.cpp?tab=readme-ov-file#description
@@decoder-sh I see, thanks a lot! I'm gonna try some of them out.
Wow, thank you for this! :D
How would you import non-quantized models?
How do I run this on Windows, where the files are safetensors? Where do I create the modelfile? I have multiple models in different directories of oobabooga/text-generation-webui, and I have to use them in ollama.
Awesome video. You covered every bit of it. Can you make a video on the Agentkit codebase with ollama?
I will look into it! Thanks for the suggestion
When I want to import an embeddings model, is the modelfile different from that for the chat LLM models?
If a model doesn't have information about supported prompt templates & parameters, where do I get those?
Great explanation and video format. Do you know how to use models pulled with ollama (i.e. $ ollama pull dolphin-mixtral) as gguf files? Is there a way to convert those to .gguf? Thanks!
After poking around the ollama repo, it does appear that models are stored as ggufs
github.com/ollama/ollama/blob/main/server/images.go#L696
github.com/ollama/ollama/blob/main/server/images.go#L401
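In practice, the weight blobs live under the ollama models directory; the path below is the default on macOS/Linux, and Windows or a custom OLLAMA_MODELS setting will differ:

ls ~/.ollama/models/blobs
# the sha256-named files are the layers; the largest one is typically the gguf weights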
I would like to ingest several of my own documents and perhaps add them into an existing gguf. I'm not sure of the best way to add documents to make them searchable while using a Windows version of Ollama and Docker. Any tips would be great, thanks. I want to avoid the one-at-a-time concept and the need to use the local interface; ideally it would be great to dump the files into a directory and run the ingester.
I have downloaded ollama and stored it on my computer, but cannot open it. Why? How do I deal with this?
Check you've got it installed OK using a command-line command like 'ollama list'; you should see a message saying the model list is empty. Then run 'ollama serve'
💥 That's wonderful. I'm not a programmer and don't know Python, but I could install Open WebUI, and it only has Ollama models, and I love those Hugging Face GGUF models. So I need a way to run them on Open WebUI. Thanks! ❤❤❤
Sorry for bothering again. I'm using the ollama api in Python to create a chat request with 1 message, but I found that if I create another request, the context from the same request appears to have changed. I'm trying to parse the output from the first request, make some decisions on it, then ask another question, but in the context of the 1st message. I tried using generate instead of chat, but it seems that it doesn't support the images list parameter.
What do you mean by the context? For the chat endpoint, you'll need to append the LLM response to the list of messages you're sending in your second request. See here for more info: github.com/ollama/ollama/blob/main/docs/api.md#chat-request-with-history
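A minimal sketch of that pattern with the ollama Python client (the model name is just an example):

import ollama

messages = [{'role': 'user', 'content': 'Why is the sky blue?'}]
first = ollama.chat(model='llama2', messages=messages)

# carry the assistant's reply forward so the second request keeps the context
messages.append(first['message'])
messages.append({'role': 'user', 'content': 'Summarize that in one sentence.'})
second = ollama.chat(model='llama2', messages=messages)
print(second['message']['content'])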
What’s your system configuration? Ram and M1?
32gb ram, M1 chip
Thanks for the video. Why don't you use LM Studio instead of Ollama?
Tbh I haven’t tried it yet! One of the videos I’d like to do in the near future is comparing different ways of running local models, with or without a UI
I'm fiddling with the llama2 model and I find it impossible to get it to produce short descriptions from big ones :/ It keeps shoving out huge chunks of text no matter what I tell it. Is there a hack to reduce the word output count somehow?
Maybe try modifying the system prompt to include something like "Your responses should be as concise as possible, no longer than 2 sentences."
I've also found that adding examples to the system prompt helps a lot, eg "here's an example exchange: 'user: count to 3; assistant: 1 2 3' "
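As a sketch, you could bake both of those in per request with the Python client; if you also want a hard cap, the num_predict option limits the number of generated tokens (the values here are illustrative):

import ollama

response = ollama.chat(
    model='llama2',
    messages=[
        {'role': 'system', 'content': 'Be as concise as possible, no longer than 2 sentences.'},
        {'role': 'user', 'content': 'Describe the history of the internet.'},
    ],
    options={'num_predict': 100},  # hard limit on generated tokens
)
print(response['message']['content'])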
Hey, can you please let me know about the Hollyland mic? It would be great if you could share the link.
Hey yeah! It's the Lark Max, I've really been enjoying using it a.co/d/0RBC5XQ
I have some confusion: how do I write a modelfile for every LLM I import into ollama? I need a tutorial on the various parameters, template, and other things in the model file.
This is a great idea, I'll add this to my list! I would be happy to walk through how to fully customize an ollama modelfile
@@decoder-sh thanks waiting for it
@@drmetroyt did we get it yet lol
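In the meantime, here's a sketch of the most common modelfile directives in one place (the values are purely illustrative, and the template must match whatever format your model was trained on):

FROM llama2
SYSTEM """You are a concise, helpful assistant."""
TEMPLATE """[INST] {{ .System }} {{ .Prompt }} [/INST]"""
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
PARAMETER stop "[INST]"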
Thanks a lot! I only have one question regarding creating the ollama model based on the GGUF: it worked perfectly with the suggested template, but the second option does not. Why is that? And can you provide the modelfile used for the second method, please? Modelfile.txt:
FROM "CapybaraHermes-2.5-Mistral-7B"
PARAMETER stop ""
PARAMETER stop ""
TEMPLATE """
system
{{ .System }}
user
{{ .Prompt}}
assistant
"""
Do you find it a better replacement for ChatGPT? (specifically GPT-4)
GPT-4 is currently the gold standard for LLMs by quality. In fact, a lot of models are trained on data generated by GPT-4; that should tell us how good people think it is. But while GPT-4 is very good at most things, we can train small models that we're able to run locally to be good at specific things. I'll be doing a video on this process, called fine-tuning, in the near future
Really nice video!
Thank you very much!
Great video man! Keep up the good work!
I have a question: my docker isn't working on Windows due to some WSL issue I think, but I've got Ollama running without docker and was wondering if it's still possible to quantize a model with Ollama?
Please reply if you found a way to run it on Windows
@@parthwagh3607 Sorry, I haven't. As in, Ollama works fine on Windows, but importing open-source models doesn't
@@excido7107 What if we have downloaded the models for oobabooga and want to use them in ollama?
How to do it in Windows?
Please reply if you found a way to run it on Windows
Nice one! thx for sharing!
My pleasure!
Hey man!!! Thankx
You're the man
You're the one who wakes the rooster up
You don't wear a watch, you decide what time it is
When you misspell a word, the dictionary updates
You install Windows, and Microsoft agrees to your terms
When you found the lamp, you gave the genie three wishes
When you were born, you slapped the doctor
The revolver sleeps under your pillow
You ask the police for their documents
When you turned 18, your parents moved out
Ghosts gather around a campfire to tell stories about you
hugs for brazil
Wow no one has ever written me lore before! I hope to live up to your impression of me 🫡
@@decoder-sh No need to try hard
you already saved my life from an Indian villain who was holding me for more than 6 hours in a suicidal tutorial
When you come to Brazil, you already have a house to stay in
Then it sounds like it's time to take this show on the road 😎
@@decoder-sh 😎😎😎😎
Rewatching this for the non-GGUF repo section. That would’ve been tricky without you
Glad to be of use!
How to do it in Windows?
Please reply if you found a way to run it on Windows
You could use Windows Subsystem for Linux if you want to use Linux on Windows; otherwise it should be very similar. You can install Ollama on Windows, you can use git on Windows, and you have the command-line terminal
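As a rough sketch of the WSL route (assuming a recent Windows 10/11 where WSL is available):

wsl --install
# then, inside the Linux shell:
curl -fsSL https://ollama.com/install.sh | sh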
Just awesome !
Thanks for watching!
But why isn't there a Windows version?
Easier
There is now! ollama.com/download/windows
super useful. thanks !
Thanks for watching!
Make a video using a model that analyzes tables and generates new processed tables, like csv or excel!
Could this be possible, my friend?
I built a model the way you showed me, but the model's response has nothing to do with my question.
I've noticed this sometimes happens if an unexpected character appears in the modelfile. For example, my text editor sometimes converts " into ”, which is a different character. If that happens, then I get the same issue as you.
Great video. New sub
Love to hear it, thanks for watching!
Thank you very, very much
Great video, but you go over some of the steps really quickly. Slow down on the showing part; this is why we are here, to learn.
Thank you for the feedback! I’m still refining my pacing, I’ll do my best to improve that in the next one
See if changing the speed in settings helps
I disagree, the pacing of this video was perfect 👌. Thanks so much for cutting out the fluff, showing the important parts but keeping things moving.
I think he was referring to the "showing part", meaning when we are seeing the actions in the terminal. I did have to back up so I could look for more than the 1/10th of a second one part was on screen. :) All said, great video, and helpful too!
@@ejh237 Noted for my next video! I think I'm going to start doing pop-outs of any commands that I run that stick around until the next command. That way you can see the command even while you're watching the output of that command go by.
Next up, contribute your configs back to ollama so others don’t have to do these steps over again.
That’s a great idea!
great
OllamAF
I'm a newbie and it was hard to grasp what you have done. I believe only an expert in this field could follow it; while watching this video I had to work hard to imagine the intermediate steps between the steps shown in the video. The video is interesting but not useful to me.
Hey, thanks for your comment. I'd like to make my content friendly for beginners that have a basic ability to use the terminal. Which concepts in particular gave you trouble? I hope to use your feedback to improve my future videos
@@decoder-sh Thank you for your reply. At 3:07 I didn't understand what the model file is, what its extension is, where to create it, and where not to create it. Is copying the GGUF file to any folder okay? Will making the model file in any location be acceptable? There were so many questions at that point which led me to stop watching 😅😅
@@AI-PhotographyGeek Oh I see! I have another video that goes into much more detail about model files; please let me know if this clarifies things for you
th-cam.com/video/xa8pTD16SnM/w-d-xo.html
@@decoder-sh Definitely, I will refer to that video, but in the future, please capture such steps. There will be a lot of new visitors, and they will be watching your video for the first time; if they feel that they need to watch your other videos just to understand any particular video, then it will be very hard for them to follow you. I hope you grow more on this journey! 😊 I'm not expecting you to explain everything again in detail, but just showing it in the video would help a lot.
@@AI-PhotographyGeek That's a very good idea! I'll be more explicit about prerequisite knowledge and where to find it. Thank you again for the feedback 🤝