This is one of those videos where I can say "out of all the videos on this subject, you just straight-out provide the real documentation and know-how". From the modelfile, prompt template, explanation, etc. Thank you.
Thank you for taking the time to comment! I look forward to making more videos
@@decoder-sh I'm not that great with using terminals. I'm on PowerShell and I'm a little confused about how to get to the part where the modelfile is made
Thanks a lot! I love your carefully prepared, very quick and succinct, yet complete style! This one was a bit over-paced compared to your other two videos so far, but just a tiny bit. It also would have been nice to see the rename at the end. Keeping it succinct and to the point as you do is the big value in your videos.
Thank you for watching my videos, as well as for your feedback! You’re the second commenter to mention this was a bit too fast, so I will do my best to correct that in the next one :)
6:04 talks about quantization (how you can, using a docker command, quantize your GGUF file to a smaller size/bit format)... well, I think they added that feature on huggingface recently. When I went to the model mentioned in the video (well, TheBloke's version of it), there's a button that displays next to the [Train] button, a [Use this model] button, and when you click on it, you now have the choice "Ollama"... pretty nice! Thanks for this video, it was extremely helpful.
That sounds cool, I'll have to try that out!
There are a lot of models which do not have that option.
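For models without that button, the docker-based quantize flow mentioned at 6:04 looked roughly like this at the time (the image name and flags here are from memory and may have changed since, so treat this as a sketch rather than gospel):

docker run --rm -v .:/model ollama/quantize -q q4_0 /model

That mounts the current directory, containing your unquantized model, into the container and writes out a q4_0-quantized GGUF alongside it.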
Great simple explanations, and so useful
Cool. Looking forward to local RAG when that's ready.
Great straightforward, informative video, keep it up man... currently trying dolphin-mixtral
This helped me a lot. Great quality and good way of explaining everything. Thank you so much
Unfortunately the docker run command to quantize your own model fails. I've had a heck of a time getting anything ollama convert / quantize-related to work :(
Nothing superfluous. Excellent presentation of the material. Looking forward to more interesting videos.
As a newbie to this, you kinda jumped 10 steps here at 3:05. Also, I'm on Windows and have no experience using Linux. Any documentation on how to do this on Windows?
Good call, sorry about that!
Step 1: Create a new file called "Modelfile" (the name isn't important, you can call it whatever you want)
Step 2: Edit the modelfile (which is what I'm doing at 3:05). If you're not familiar with what a modelfile is or how it works, check out my older video for a refresher th-cam.com/video/xa8pTD16SnM/w-d-xo.html
You can view all of the code I wrote here decoder.sh/videos/importing-open-source-models-to-ollama
I don't have any videos for Windows unfortunately, but I believe the ollama CLI is the same for all operating systems
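To make step 1 concrete, a minimal modelfile can be as short as this; the filename and the Mistral-style template below are placeholders, so swap in whatever your model actually expects:

FROM ./my-model.Q4_K_M.gguf
TEMPLATE """[INST] {{ .Prompt }} [/INST]"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"

Then build and run it with 'ollama create my-model -f Modelfile' followed by 'ollama run my-model'.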
Why are LLM models so big (25GB)? For example, isn't the model (BLOOM, Meta's Llama 2, Guanaco 65B and 33B, dolphin-2.5-mixtral-8x7b, etc.) just the algorithm that is used to learn your data?
And if the training data is another 25GB, what is the resulting size if you wanted to run your new AI offline on a new PC? 50GB? And what do the 33B and 8x7b mean?
For example, everyone says that ChatGPT4 has 220 billion parameters and is a 16-way mixture model with eight sets of weights?
So a model, from a zoomed-out perspective, has two components: the model architecture (llama, mistral, mixtral...), which describes the steps and connections that transform an input to an output, and the weights, which are the result of training the model.
Another way to think about this is that the model is like a blueprint that tells us which parts of a building go where, how many doors there are, what the plumbing looks like. A blueprint itself takes up no space and weighs nothing. But the building materials, the weights in our model, are what physically occupy the space. Here's a more literal explanation of weights: datascience.stackexchange.com/questions/120764/how-does-an-llm-parameter-relate-to-a-weight-in-a-neural-network
For fast math on how much disk space a model uses, try this calculation: # of parameters * (4 bits if quantized, 32 if not) / (8 bits in one byte).
So the Phi model has 2.7B parameters and is about 1.6GB. Math: 2.7 * 1e9 * 4 (all of ollama's models are quantized afaik) / 8 = 1.35GB. Then every model uses some extra space for config files, etc.
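Here's that back-of-the-envelope math as a tiny Python sketch (the parameter counts are just illustrative):

def model_size_gb(num_params, bits_per_param):
    # weights only; real model files add a little overhead for config/tokenizer
    return num_params * bits_per_param / 8 / 1e9

print(model_size_gb(2.7e9, 4))   # Phi, 4-bit quantized: ~1.35 GB
print(model_size_gb(7e9, 4))     # a 7B model, 4-bit: ~3.5 GB
print(model_size_gb(7e9, 32))    # same 7B unquantized at fp32: ~28 GB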
FileNotFoundError: spm tokenizer.model not found.
I can only see the bin file. Where is the gguf file?
Thank you for this video
I have one question: to generate a gguf, do I need any special hardware, or can I just generate it from Google Colab?
thanks again for this video ❤
I believe this process does require a GPU, but you should have access to one on Colab
Your video is amazing. I never thought transferring these big models into GGUF was this simple. You just unlocked a lot of possibilities. Thank you so very much! Sadly you don't have many videos posted. Hope you do more videos. I wonder if Docker is the only way to transfer models to GGUF.
You can also use llama.cpp (which ollama is basically a fancy wrapper for) to do the conversion to gguf!
@@decoder-sh Does it work for tokenizer.json file? Docker seems to only work with the .model one
@@bruno10505 Unfortunately I'm not sure about that
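For reference, the llama.cpp route is roughly the following (the script names have moved around between versions, with older trees using convert.py and newer ones convert_hf_to_gguf.py, so check the repo you cloned):

python convert.py ./path-to-hf-model --outfile model-f16.gguf
./quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M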
Great content, thanks, you definitely deserve more subscribers! Can you show us how to let the models have access to local data and learn from it in a future video?
Thanks for watching! Yes I do intend to do a whole series on interacting with documents in the near future :)
Hopefully you guys can make a video about fine-tuning or long chaining to make the model more adaptable to our personal needs
Absolutely! I’m already putting together a script for fine tuning now :)
Hi! What do I do when I've installed ollama with the sh script on Linux, after cloning the repo?
Great video!! Please make instructions for how to run models with the MllamaForConditionalGen architecture.
Thanks for the video. You kind of skip over the modelfile for the huggingface-converted file at the end; how do you determine the prompt template to use?
Very interesting and useful. I’m interested in the format GGUF, so maybe you can describe that in more detail. I wish Ollama was available for Windows OS.
Fwiw I run it on windows via docker. I don’t have an nvidia GPU though, so it’s pretty slow. Agree that a native install experience would be nice
That’s a good idea, I feel like it’s a common enough format to warrant a deep dive or at least a closer look
GGUF is the new format for llama.cpp model files
I would love to see a similar tutorial for Windows as I am running Ollama with the openWebUI front end in Windows on an Intel Arc GPU.
Thanks! I'm getting up to speed on all this info. I was wondering where to find the LLM models that 'ollama run' didn't know about.
Good explanation! Is there a list of model architectures that are supported by Ollama?
I wasn't able to find one - ollama is llama.cpp under the hood, and the closest thing I was able to find was their list of supported models. Anything that's a finetune of these models should work! github.com/ggerganov/llama.cpp?tab=readme-ov-file#description
@@decoder-sh I see, thanks a lot! I'm gonna try some of them out.
Wow, thank you for this! :D
How would you import non-quantized models?
How do I run this on Windows, where the files are safetensors? Where do I create the modelfile? I have multiple models in different directories of oobabooga/text-generation-webui, and I have to use them in ollama.
Awesome video. You covered every bit of it. Can you make a video on the Agentkit codebase with ollama?
I will look into it! Thanks for the suggestion
When I want to import an embeddings model, is the modelfile different from that for the chat LLM models?
If a model doesn't have information about supported prompt templates & parameters, where do I get those?
Great explanation and video format. Do you know how to use models pulled with ollama (i.e. $ ollama pull dolphin-mixtral) as gguf files? Is there a way to convert those to .gguf? Thanks!
After poking around the ollama repo, it does appear that models are stored as ggufs
github.com/ollama/ollama/blob/main/server/images.go#L696
github.com/ollama/ollama/blob/main/server/images.go#L401
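In practice, the weight blobs live under the ollama models directory; the path below is the default on macOS/Linux, and Windows or a custom OLLAMA_MODELS setting will differ:

ls ~/.ollama/models/blobs
# the sha256-named files are the layers; the largest one is typically the gguf weights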
I would like to ingest several of my own documents and perhaps add them into an existing gguf. I'm not sure of the best way to add documents to make them searchable while using a Windows version of Ollama and Docker. Any tips would be great, thanks. I want to avoid the one-at-a-time concept and the need to use the local interface; ideally it would be great to dump the files into a directory and run the ingester.
I have downloaded ollama and stored it on my computer, but cannot open it. Why? How do I deal with this?
Check you've got it installed OK using a command-line command like 'ollama list'; you should see a message saying the model list is empty. Then run 'ollama serve'
💥 That's wonderful. I'm not a programmer and don't know Python, but I could install Open WebUI, and it only has Ollama models, and I love those Hugging Face GGUF models. So I need a way to run them on Open WebUI. Thanks! ❤❤❤
Sorry for bothering again. I'm using the ollama api in Python to create a chat request with 1 message, but I found that if I create another request, the context from the same request appears to have changed. I'm trying to parse the output from the first request, make some decisions on it, then ask another question, but in the context of the 1st message. I tried using generate instead of chat, but it seems that it doesn't support the images list parameter.
What do you mean by the context? For the chat endpoint, you'll need to append the LLM response to the list of messages you're sending in your second request. See here for more info: github.com/ollama/ollama/blob/main/docs/api.md#chat-request-with-history
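A minimal sketch of that pattern with the ollama Python client (the model name is just an example):

import ollama

messages = [{'role': 'user', 'content': 'Why is the sky blue?'}]
first = ollama.chat(model='llama2', messages=messages)

# carry the assistant's reply forward so the second request keeps the context
messages.append(first['message'])
messages.append({'role': 'user', 'content': 'Summarize that in one sentence.'})
second = ollama.chat(model='llama2', messages=messages)
print(second['message']['content'])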
What’s your system configuration? Ram and M1?
32gb ram, M1 chip
Thanks for the video. Why don't you use LM Studio instead of Ollama?
Tbh I haven’t tried it yet! One of the videos I’d like to do in the near future is comparing different ways of running local models, with or without a UI
I'm fiddling with the llama2 model and I find it impossible to get it to produce short descriptions from big ones :/ It keeps shoving out huge chunks of text no matter what I tell it. Is there a hack to reduce the word output count somehow?
Maybe try modifying the system prompt to include something like "Your responses should be as concise as possible, no longer than 2 sentences."
I've also found that adding examples to the system prompt helps a lot, eg "here's an example exchange: 'user: count to 3; assistant: 1 2 3' "
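As a sketch, you could bake both of those in per request with the Python client; if you also want a hard cap, the num_predict option limits the number of generated tokens (the values here are illustrative):

import ollama

response = ollama.chat(
    model='llama2',
    messages=[
        {'role': 'system', 'content': 'Be as concise as possible, no longer than 2 sentences.'},
        {'role': 'user', 'content': 'Describe the history of the internet.'},
    ],
    options={'num_predict': 100},  # hard limit on generated tokens
)
print(response['message']['content'])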
Hey, can you please let me know about the Hollyland mic? It would be great if you could share the link.
Hey yeah! It's the Lark Max, I've really been enjoying using it a.co/d/0RBC5XQ
I have some confusion: how do I write a modelfile for every LLM I import into ollama? I need a tutorial on the various parameters, template, and other things in the model file.
This is a great idea, I'll add this to my list! I would be happy to walk through how to fully customize an ollama modelfile
@@decoder-sh thanks waiting for it
@@drmetroyt did we get it yet lol
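In the meantime, here's a sketch of the most common modelfile directives in one place (the values are purely illustrative, and the template must match whatever format your model was trained on):

FROM llama2
SYSTEM """You are a concise, helpful assistant."""
TEMPLATE """[INST] {{ .System }} {{ .Prompt }} [/INST]"""
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
PARAMETER stop "[INST]"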
Thanks a lot! I only have one question regarding creating the ollama model based on the GGUF: it worked perfectly with the suggested template, but the second option does not. Why is that? And can you provide the modelfile used for the second method, please? Modelfile.txt:
FROM "CapybaraHermes-2.5-Mistral-7B"
PARAMETER stop ""
PARAMETER stop ""
TEMPLATE """
system
{{ .System }}
user
{{ .Prompt}}
assistant
"""
Do you find it a better replacement for ChatGPT? (specifically GPT-4)
GPT-4 is currently the gold standard for LLMs by quality. In fact, a lot of models are trained on data generated by GPT-4; that should tell us how good people think it is. But while GPT-4 is very good at most things, we can train small models that we're able to run locally to be good at specific things. I'll be doing a video on this process, called fine-tuning, in the near future
Really nice video!
Thank you very much!
Great video man! Keep up the good work!
I have a question: my docker isn't working on Windows due to some WSL issue I think, but I've got Ollama running without docker and was wondering if it's still possible to quantize a model with Ollama?
Please reply if you found a way to run it on Windows
@@parthwagh3607 Sorry, I haven't. As in, Ollama works fine on Windows, but importing open-source models doesn't
@@excido7107 What if we have downloaded the models for oobabooga and want to use them in ollama?
How to do it in Windows?
Please reply if you found a way to run it on Windows
Nice one! thx for sharing!
My pleasure!
Hey man!!! Thankx
You're the man
You're the one who wakes the rooster up
You don't wear a watch, you decide what time it is
When you misspell a word, the dictionary updates
You install Windows, and Microsoft agrees to your terms
When you found the lamp, you gave the genie three wishes
When you were born, you slapped the doctor
The revolver sleeps under your pillow
You ask the police for their documents
When you turned 18, your parents moved out
Ghosts gather around a campfire to tell stories about you
hugs for brazil
Wow no one has ever written me lore before! I hope to live up to your impression of me 🫡
@@decoder-sh No need to try hard
you already saved my life from an Indian villain who was holding me for more than 6 hours in a suicidal tutorial
When you come to Brazil, you already have a house to stay in
Then it sounds like it's time to take this show on the road 😎
@@decoder-sh 😎😎😎😎
Rewatching this for the non-GGUF repo section. That would’ve been tricky without you
Glad to be of use!
How to do it in Windows?
Please reply if you found a way to run it on Windows
You could use Windows Subsystem for Linux if you want to use Linux on Windows; otherwise it should be very similar. You can install Ollama on Windows, you can use git on Windows, and you have the command-line terminal
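As a rough sketch of the WSL route (assuming a recent Windows 10/11 where WSL is available):

wsl --install
# then, inside the Linux shell:
curl -fsSL https://ollama.com/install.sh | sh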
Just awesome !
Thanks for watching!
But why isn't there a Windows version?
Easier
There is now! ollama.com/download/windows
super useful. thanks !
Thanks for watching!
Make a video using a model that analyzes tables and generates new processed tables, like csv or excel!
Could this be possible, my friend?
I built a model the way you showed me, but the model's response has nothing to do with my question.
I've noticed this sometimes happens if an unexpected character appears in the modelfile. For example, my text editor sometimes converts " into ”, which is a different character. If that happens, then I get the same issue as you.
Great video. New sub
Love to hear it, thanks for watching!
Thank you very, very much
Great video, but you go over some of the steps really quickly. Slow down on the showing part; this is why we are here, to learn.
Thank you for the feedback! I’m still refining my pacing, I’ll do my best to improve that in the next one
See if changing the speed in settings helps
I disagree, the pacing of this video was perfect 👌. Thanks so much for cutting out the fluff, showing the important parts but keeping things moving.
I think he was referring to the "showing part", meaning when we are seeing the actions in the terminal. I did have to back up so I could look for more than the 1/10th of a second one part was on screen. :) All said, great video, and helpful too!
@@ejh237 Noted for my next video! I think I'm going to start doing pop-outs of any commands that I run that stick around until the next command. That way you can see the command even while you're watching the output of that command go by.
Next up, contribute your configs back to ollama so others don’t have to do these steps over again.
That’s a great idea!
great
OllamAF
I'm a newbie and it was hard to grasp what you have done. I believe only an expert in this field could follow it; while watching this video I had to work hard to imagine the intermediate steps between the steps shown in the video. The video is interesting but not useful to me.
Hey, thanks for your comment. I'd like to make my content friendly for beginners that have a basic ability to use the terminal. Which concepts in particular gave you trouble? I hope to use your feedback to improve my future videos
@@decoder-sh Thank you for your reply. At 3:07 I didn't understand what the model file is, what its extension is, where to create it, and where not to create it. Is copying the GGUF file to any folder okay? Will making the model file in any location be acceptable? There were so many questions at that point which led me to stop watching 😅😅
@@AI-PhotographyGeek Oh I see! I have another video that goes into much more detail about model files; please let me know if this clarifies things for you
th-cam.com/video/xa8pTD16SnM/w-d-xo.html
@@decoder-sh Definitely, I will refer to that video, but in the future, please capture such steps. There will be a lot of new visitors, and they will be watching your video for the first time; if they feel that they need to watch your other videos just to understand any particular video, then it will be very hard for them to follow you. I hope you grow more on this journey! 😊 I'm not expecting you to explain everything again in detail, but just showing it in the video would help a lot.
@@AI-PhotographyGeek That's a very good idea! I'll be more explicit about prerequisite knowledge and where to find it. Thank you again for the feedback 🤝