Oh wow! I completely misunderstood what "instruct" models are for and was avoiding them when, in fact, that's what I need. Thank you!
Thanks, Matt. Your Ollama course is great because it's easy to follow and addresses problems from your unique point of view. Always upvoted!
Regarding naming, what you call "source" models are also referred to as "foundation" or "pretrained" models, as far as I know. You also draw a good distinction between chat-fine-tuned models (sometimes called chat-completion models) and instruct-fine-tuned models (sometimes called text-completion models).
In general, custom fine-tuning a model involves taking a source model and refining it with custom data. The current version of Ollama doesn't support that itself, even though you've rightly dedicated one or two videos to creating a custom fine-tuned model by training an original source model.
Regarding multimodal models, as you mentioned, Ollama includes some vision LLMs (image input) like LLaVA and others, I believe. You correctly pointed out that multimodal could also involve audio input (and output), which seems feasible at the moment (I'll need to double-check, for example with the newly released Mistral Pixtral, once it's available on Ollama). BTW, I think video processing with Ollama is also of great interest, so it might be worth exploring this topic in future videos.
Just my two cents. Thanks again!
Once more, I learned something. I've asked that question before but never gotten a satisfactory answer. Thanks, Matt.
Glad to help!
Thank you for keeping on caring for poor learners like me. Thanks, Matt.
Excellent, thanks from Chile.
Awesome course, thank you! My only request would be a mention of the quant suffixes like Q4, K_M, etc.
Thanks for the explanation 😊
you rock! Thank you!
Thank you, Matt. Great info.
Nice, well done.
Would you regard NER models as examples of fine-tuned models?
Thanks
Thanks!
Thanks Matt, great video and series! Why don’t LLMs always produce good embeddings? And why do embedding models sometimes underperform in RAG applications? I’ve tested many models, but only five have consistently provided accurate embeddings for paper abstracts, verified by clustering and ground truth.
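In case it helps anyone reproduce this kind of check, here is a rough sketch of scoring paper abstracts with Ollama's embeddings endpoint and cosine similarity; the model name and the sample abstracts below are just placeholders, not the five models I mentioned:

```python
# Rough sketch: compare embeddings from a local Ollama server.
# The model name and abstracts are placeholders, not recommendations.
import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Request an embedding vector for `text` from a local Ollama server."""
    resp = requests.post(OLLAMA_URL, json={"model": model, "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors, computed without numpy."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

abstract_a = "We study retrieval-augmented generation for scientific papers."
abstract_b = "A method for grounding language models in retrieved documents."
abstract_c = "A field guide to alpine wildflowers of the Andes."

va, vb, vc = embed(abstract_a), embed(abstract_b), embed(abstract_c)
print("related pair:  ", round(cosine(va, vb), 3))  # should score higher
print("unrelated pair:", round(cosine(va, vc), 3))  # should score lower
```

If the "related" pair doesn't clearly score above the "unrelated" pair across a few samples, that embedding model is probably a poor fit for the abstracts.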
How can I get fast and accurate answers at the same time using Ollama?
Hi sir, your videos are great and very informative, and I really like them, but could you please explain some of the concepts sitting in front of a PC and show them practically? I am really confused about which model to download; the benchmarks show good results, but when I actually use the models they are worse. There are also different quantisations like q4, q6, q8, fp16, K_S, K_M, etc., which are difficult to understand. Thanks for reading the comment.
There is another video in the course that shows the quants.
Question - with Ollama now supporting the GGUF file format from Hugging Face, can you run the video models locally with Ollama? Have not tried it yet...
Ollama has supported the GGUF file format since the day it was created. There was GGML before, but GGUF is the only format Ollama can work with. You have been able to download any GGUF from Hugging Face and use it in Ollama for at least a year, so I'm not sure what you are asking. Which video model?
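For reference, the basic workflow looks roughly like this: point a Modelfile at a GGUF you downloaded from Hugging Face, create a named model from it, and run it. The file and model names below are placeholders, and it assumes the ollama CLI is installed:

```python
# Rough sketch: register a locally downloaded GGUF with Ollama.
# Assumes the `ollama` CLI is on PATH and the .gguf file exists in the
# current directory; the file and model names are placeholders.
import subprocess
from pathlib import Path

gguf_path = Path("downloaded-model.Q4_K_M.gguf")  # any GGUF pulled from Hugging Face

# Write a Modelfile whose FROM line points at the local GGUF file.
Path("Modelfile").write_text(f"FROM ./{gguf_path.name}\n")

# Create a named model from that Modelfile, then send it a single prompt.
subprocess.run(["ollama", "create", "my-hf-model", "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "run", "my-hf-model", "Say hello in one sentence."], check=True)
```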
@@technovangelist Thanks for the response. Around 5:40 you mentioned that Ollama does not support video yet. Is that an Ollama restriction or the model that was downloaded?
Apologies for being unclear.
Got it. It's mostly a restriction of the available models. There aren't any that do a good job of reviewing a video and are in a format that llama.cpp or Ollama can support. The models I have seen that can do that only handle a second or two of video, and that would need a lot of memory.
@@technovangelist Ahh - thank you very much! Awesome series - keep up the good work. Learning in leaps and bounds.
I always wonder why there isn't a model that only chats and can be trained on specific information (like an FAQ help desk or a company-internal bot), and of course a small one that answers properly without hallucinations.
@@tecnopadre There are so-called noun-phrase collisions, which seem to be a big part of hallucinations, even in RAG systems. Basically the problem is not inaccurate data but reference nouns that are ambiguous. There are some very interesting articles to google, and also some work on eliminating them. Basically it can be corrected with the right prompting.
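As a rough sketch of what that kind of prompting can look like against a local Ollama model (the model name, context, and wording here are just assumptions, not a verified fix):

```python
# Rough sketch: pass retrieved context explicitly and instruct the model to
# flag ambiguous noun phrases instead of guessing. Model name and context
# are placeholders; assumes a local Ollama server is running.
import requests

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

SYSTEM = (
    "Answer strictly from the provided context. "
    "If a name or noun phrase in the question could refer to more than one "
    "thing in the context, say the reference is ambiguous and ask which one is meant."
)

# Context deliberately contains two different "Reset button"s.
context = (
    "Acme Support Portal: the Reset button on the router restores factory settings. "
    "The Reset button in the web dashboard only clears the current session."
)
question = "What does the Reset button do?"

resp = requests.post(
    OLLAMA_CHAT_URL,
    json={
        "model": "llama3.1",  # placeholder model name
        "stream": False,
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    },
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

With a prompt like that, a well-behaved model should ask whether you mean the router button or the dashboard button rather than picking one and hallucinating details.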