Installing Ollama to Customize My Own LLM

  • Published on 20 Jun 2024
  • Ollama is the easiest tool to get started running LLMs on your own hardware. In my first video, I explore how to use Ollama to download popular models like Phi and Mistral, chat with them directly in the terminal, use the API to respond to HTTP requests, and finally customize our own model based on Phi to be more fun to talk to.
    Watch my other Ollama videos - • Get Started with Ollama
    Links:
    Code from video - decoder.sh/videos/installing-...
    Ollama - ollama.ai
    Phi Model - ollama.ai/library/phi
    More great LLM content - / @matthew_berman
    Timestamps:
    00:00 - Intro
    00:29 - What is Ollama?
    00:41 - Installation
    00:53 - Using Ollama CLI
    02:06 - Chatting with Phi
    02:41 - Ollama API
    04:36 - Inspecting Phi's Modelfile
    06:27 - Creating our own modelfile
    07:34 - Creating the model
    08:25 - Running our new model
    08:48 - Closing words
  • Science & Technology
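    For quick reference, a condensed sketch of the workflow the video walks through (the model name, prompt, and file names below are illustrative rather than the exact ones used on screen):

      # Pull a model and chat with it in the terminal
      ollama pull phi
      ollama run phi                 # interactive chat; type /bye to exit

      # Query the local HTTP API (Ollama listens on port 11434 by default)
      curl http://localhost:11434/api/generate -d '{
        "model": "phi",
        "prompt": "Why is the sky blue?",
        "stream": false
      }'

      # Customize a model: write a Modelfile, create it, run it
      printf '%s\n' \
        'FROM phi' \
        'PARAMETER temperature 1' \
        'SYSTEM """You are a cheerful pirate. Answer every question in pirate speak."""' \
        > Modelfile
      ollama create phi-pirate -f Modelfile
      ollama run phi-pirate "Introduce yourself"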

Comments • 152

  • @hashmetric
    @hashmetric 4 months ago +45

    Perfect. Thank you. Great format. Don't change a thing. Please don't become another channel that exists only to tell us "this changes everything," anything about earning any amount of dollars as a YouTuber, or about using GPT to create mass amounts of crap that will also make us money, or a channel that tells us about a new model or paper every day. We don't need any more of that. Congrats on the first video. More please.

    • @decoder-sh
      @decoder-sh 4 months ago +18

      Not trying to monetize my channel nor lure people in with clickbait titles that the video doesn't pay off 👍 I'm new to content creation so I do intend to explore and experiment with a few things, but please hold me accountable if I ever jump the shark

    • @hashmetric
      @hashmetric 4 months ago

      @@decoder-sh but not through Twitter 🤗

  • @proterotype
    @proterotype 4 months ago +14

    God, every once in a while you stumble across the perfect YouTube channel for what you want. This is that channel. Props to you for making difficult things seem easy

    • @decoder-sh
      @decoder-sh 4 months ago +3

      Thanks for the kind words, I'm looking forward to making more videos! Stick around, "I was gonna make espresso" 😂

  • @rs832
    @rs832 4 months ago +9

    It's helpful videos like this that make an instant subscribe and a plunge down the rabbit hole of your content an immediate no-brainer.
    Clear. ✅
    Concise. ✅
    Complete. ✅
    Thanks for providing quality content & for not skipping over the details.

    • @decoder-sh
      @decoder-sh 4 months ago +2

      It's my absolute pleasure to make these videos, thank you for watching!

  • @fontende
    @fontende 4 months ago +5

    Algorithm lifts you up in my recommendation waves, congratulation.

  • @RetiredVet
    @RetiredVet 4 months ago +2

    In 9 minutes, you gave the best introduction to ollama I have seen. The other videos I have watched were helpful, but you show features such as inspecting and creating models in a short, clearly understood way that not only tells me how to use ollama but also teaches me things about LLMs I never knew.
    I am retired and looking into AI for fun. In the 60s, my science fair project was a neural network. My father, an engineer, was fascinated with AI and introduced me to the concept. Unfortunately, Marvin Minsky and Seymour Papert wrote Perceptrons and the field slowed down, and I moved on.
    You have a gift for explaining technical concepts. I've enjoyed all three of the current ones and look forward to the next.

    • @decoder-sh
      @decoder-sh 4 months ago

      Thank you for your kind words. I wonder what it must’ve been like to study neural networks in the 60s, only a couple of decades after Von Neumann first conceived of feasible computers. You must’ve been breathing rarified air as even today most people don’t know what a neural network is.
      I read Minsky’s Society of Mind and use it as the basis for my own model of consciousness.
      Thanks again for your comment, and I look forward to making more videos for you soon.

  • @ChrisBrogan
    @ChrisBrogan 4 months ago +2

    Really grateful for this. I just downloaded ollama 20 minutes ago, and your 9 minutes has made me a lot smarter. I haven't touched a command line in about a decade.

    • @decoder-sh
      @decoder-sh 4 months ago

      Thanks for watching, I'm heartened to hear you had a good experience! Welcome back to the matrix 😎

  • @MarkSze
    @MarkSze 4 months ago +1

    Easy to follow and succinct, thanks!

  • @brunogaliati3999
    @brunogaliati3999 4 months ago

    Very cool and simple tutorial. Keep making videos!

  • @vpd825
    @vpd825 4 months ago +2

    Thank you for not wasting my time 🙏🏼 I feel I've gotten far more value per minute watching this than from a lot of those other popular channels that started out the same but degraded in content quality and drifted from their initial principles as time went by.

    • @decoder-sh
      @decoder-sh 4 months ago +1

      I appreciate you watching, please continue to keep me honest!

  • @BradSearle4CP
    @BradSearle4CP 4 months ago +3

    Good format and style! Very clear. Looking forward to deeper dives!

    • @decoder-sh
      @decoder-sh 4 months ago +1

      Plenty more to come, thanks for watching!

  • @bernard2735
    @bernard2735 4 months ago

    Thank you. I enjoyed your tutorial - well presented and paced and helpful content. Liked and subscribed and looking forward to seeing more.

  • @aimademerich
    @aimademerich 4 months ago +3

    Wow, you are the only person I have seen cover anything remotely close to this: how to actually use ollama beyond the obvious step of downloading models. You actually open the hood, thank you!!

    • @decoder-sh
      @decoder-sh 4 months ago

      Glad you found it useful!

  • @yuedeng-wu2231
    @yuedeng-wu2231 5 months ago

    amazing tutorial. very clear and helpful. Thank you!

  • @elcio-dalosto
    @elcio-dalosto 4 months ago +2

    Just commenting to boost the engagement of your channel. What great content in such a short video. Thank you! I'm playing with ollama and loving it.

    • @decoder-sh
      @decoder-sh 4 months ago

      You’re my hero

  • @user-jz2ou2qv2w
    @user-jz2ou2qv2w 2 months ago

    This is so clean. Great idea and very nice presentation. Funny thing is that my friend and I were talking about creating this a week ago. Lol.

  • @grahaml6072
    @grahaml6072 4 months ago

    Great job on your first video. Very clear and succinct.

    • @decoder-sh
      @decoder-sh 4 months ago

      Glad you enjoyed it!

  • @sebastianarias9790
    @sebastianarias9790 2 months ago +1

    Great educational content! The simplicity of your process and your explanation makes your channel stand out. Stay true!

    • @decoder-sh
      @decoder-sh 2 months ago +1

      I will! ✊ Thanks for tuning in

  • @JaySeeSix
    @JaySeeSix 4 months ago

    Logical, clean, appropriately thorough, and not annoying like so many others. A+. Thank you. Subscribed :)

    • @decoder-sh
      @decoder-sh 4 months ago

      Thanks for subscribing! Plenty more coming soon 🫡

  • @TheColdharbour
    @TheColdharbour 4 months ago

    Super!! Total beginner here & Really enjoyed following this and it all worked because of your careful explanation! Looking forward to working through the next ones!

    • @decoder-sh
      @decoder-sh 4 months ago +1

      Thanks for watching! I look forward to sharing more videos soon

  • @jimlynch9390
    @jimlynch9390 4 months ago

    Very good for your first! I don't have a GPU so I keep trying various things to see if I can find something I can use. This has helped, thanks.

    • @decoder-sh
      @decoder-sh 4 months ago

      Thanks for watching! There are a good number of small LLMs like Phi, and even smaller ones, which should be able to run inference on just a CPU. Good luck!

  • @jagadeeshk6652
    @jagadeeshk6652 4 months ago

    Great video, thanks for sharing 🎉

  • @kenchang3456
    @kenchang3456 4 months ago

    Congrats, great first video.

    • @decoder-sh
      @decoder-sh 4 months ago

      Thank you! Looking forward to making plenty more

  • @Bearistotle_
    @Bearistotle_ 5 months ago

    Great tutorial! Saved for future reference.

  • @ipv6tf2
    @ipv6tf2 a month ago

    missed opportunity to name it `phi-rate`
    love this tutorial! thank you

    • @decoder-sh
      @decoder-sh a month ago +1

      Oh man you’re so right!

  • @randomrouting
    @randomrouting 4 months ago

    This was great, clear and to the point. Thanks!

    • @decoder-sh
      @decoder-sh 4 months ago

      Glad you enjoyed it!

  • @mjackstewart
    @mjackstewart 2 months ago

    Great job, hoss! I’ve always wanted to know more about Ollama, and you gave me enough information to be dangerous! Thankya, matey!

    • @decoder-sh
      @decoder-sh 2 months ago +1

      Thank you kindly, be sure to use the power responsibly!

  • @computerscientist9980
    @computerscientist9980 4 months ago

    Keep Making Videos! SUBSCRIBEDDD!!!

  • @proterotype
    @proterotype 4 months ago

    Finally today, after building and setting up a new machine, it was time for me to get off the sidelines and download Ollama and my first model. I had curated some videos from different creators into a playlist. When I went to choose one to guide me through the Ollama setup, yours was the easy choice. For what it’s worth.

    • @decoder-sh
      @decoder-sh 4 months ago +1

      It's worth a whole lot, I'm happy to hear that you find my videos helpful 🙏

  • @user-jo3kt2hv9f
    @user-jo3kt2hv9f 4 months ago

    Perfect. Simple, crisp on topics. Thanks

    • @decoder-sh
      @decoder-sh 4 months ago

      Thanks for watching!

  • @justpassingbylearning
    @justpassingbylearning 3 months ago

    Easily the best channel. thank you for your time and input.

    • @decoder-sh
      @decoder-sh 3 months ago +1

      Thank you for watching!

    • @justpassingbylearning
      @justpassingbylearning 3 months ago

      Of course! Will be there for what you put out next! I was just telling someone how I found someone who teaches this so easily and articulates in such an understandable way

  • @sh0ndy
    @sh0ndy 2 months ago

    No way this is your 1st video?? Nice mate, this was awesome. I'm subscribing.

    • @decoder-sh
      @decoder-sh 2 months ago +1

      Thanks for subscribing! Many more on the way :)

  • @stoicnash
    @stoicnash a month ago

    Thank you!

  • @user-un6my9sl8g
    @user-un6my9sl8g 4 months ago

    Great, thanks.

  • @RustemYeleussinov
    @RustemYeleussinov 4 months ago +1

    Thank you for the awesome video! I wish you'd go deeper into "fine-tuning" models while keeping it simple for non-technical folks, as you do in all your videos. I've seen other videos where people explain how to "fine-tune" a model using a custom dataset in Python, but then no one talks about how to use such a model in Ollama. I wish you could make a video showing the process end-to-end.

    • @decoder-sh
      @decoder-sh 4 months ago

      Thanks for watching! I do plan on making a video on proper fine-tuning, but in the mean time, please watch this other video of mine on how to use outside models in Ollama! Hugging Face is a great source of fine-tuned models. th-cam.com/video/fnvZJU5Fj3Q/w-d-xo.html

  • @neuralgarden
    @neuralgarden 4 months ago

    amazing video

  • @statikk666
    @statikk666 2 months ago +1

    Thanks mate, subbed

    • @decoder-sh
      @decoder-sh 2 months ago

      Cheers!

  • @eointolster
    @eointolster 4 months ago

    Well done man

    • @decoder-sh
      @decoder-sh 4 months ago

      Thank you!

  • @GeorgeDonnelly
    @GeorgeDonnelly 4 months ago

    Subscribed! Thanks!

    • @decoder-sh
      @decoder-sh 4 months ago +1

      Thank you! More videos coming soon

  • @prashlovessamosa
    @prashlovessamosa 5 months ago

    thanks man

  • @theubiquitousanomaly5112
    @theubiquitousanomaly5112 4 months ago

    Dude you’re the best.

    • @decoder-sh
      @decoder-sh 4 months ago

      Thanks for watching, dude 🤙🏻

  • @AI-PhotographyGeek
    @AI-PhotographyGeek 4 months ago

    Great, easy to understand! 😊 Please continue making such videos, otherwise I may Unsubscribe.😅 😜

    • @decoder-sh
      @decoder-sh 4 months ago +1

      Don't worry, I intend to! Thanks for watching

  • @baheth3elmy16
    @baheth3elmy16 4 months ago

    I am glad I found your channel; I continually search for quality AI channels and don't find a lot around. Thanks for the video and I hope your channel picks up fast. Great content! As for Ollama, I am just not seeing what the hype is about it. I mean, how and why is it different?

    • @decoder-sh
      @decoder-sh 4 months ago

      Thanks for watching all of my videos (so far)! Who are some of your favorite creators in the space?
      As a service, ollama runs LLMs. I agree it's not very differentiated. But it's easy to install, easy to use, and it's got a cute mascot. What's not to like?

    • @baheth3elmy16
      @baheth3elmy16 4 months ago

      @@decoder-sh Nothing not to like about it. I guess I prefer more polished GUIs. For example, everyone praises Comfy, and I just find it intimidating compared to A1111; I hate spiders and their webs, and Comfy is a spider web

  • @AIFuzz59
    @AIFuzz59 3 months ago

    Is it possible to create a model from scratch? I mean, start with a blank model and train it on text we provide?

  • @philiptwayne
    @philiptwayne 4 months ago

    Nice video. In a future video, setting the seed programmatically would be helpful. I'm finding that smaller models lose track when using seed 0, and it seems to me that `create` is the only way of changing it at the moment. Cheers and well done 👍

    • @decoder-sh
      @decoder-sh 4 months ago

      Good call, setting a temperature of 0 should make smaller models more reliable!

  • @lsmpascal
    @lsmpascal 3 months ago

    I was waiting for this kind of video.
    Thank you so much.
    So, if I understand correctly, we can create assistants from any model this way, no?

    • @decoder-sh
      @decoder-sh 3 months ago

      Yes, you could use different system prompts to tell models to "specialize" in different things! Another common technique is to use an entirely different model that was trained on specialized data as different assistants. For example, some models are trained to specialize in math, others in medicine, others in function calling - you could route a task to a different model based on their specialty.
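      To make that concrete, a rough sketch (the base model, names, and prompts are made up for illustration) of deriving two "specialist" assistants from the same base model with different system prompts:

        # Two assistants that differ only in their SYSTEM prompt
        printf '%s\n' 'FROM phi' \
          'SYSTEM """You are a careful math tutor. Show your working step by step."""' \
          > Modelfile.math
        printf '%s\n' 'FROM phi' \
          'SYSTEM """You are a medical terminology assistant. Define terms plainly."""' \
          > Modelfile.medical

        ollama create math-helper -f Modelfile.math
        ollama create medical-helper -f Modelfile.medical

        # Route a task to whichever "specialist" fits it
        ollama run math-helper "Factor x^2 - 5x + 6"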

  • @JimLloyd1
    @JimLloyd1 4 months ago

    Good first vid. In case this gives you any ideas for future videos, I am currently trying to build something that is probably fairly simple, but awkward for me because my front-end experience is weak. I want to make a basic RAG system with a clean chat interface that is a front end for ollama. I would prefer Svelte but could switch to another framework. As a first step, I just want to store every request/response exchange (user request, assistant response) in ChromaDB. I plan to ingest documents into the DB, but the first goal is just to do something like automatically pruning the conversation history to the top N most semantically relevant exchanges. The simple use case here is that I want to be able to carry on one long conversation over various topics. When I change the topic back to something discussed before, it should be able to automatically bring the prior conversation into the context.

    • @decoder-sh
      @decoder-sh 4 months ago

      This sounds like a really cool project! How far have you gotten so far? I plan to do several videos on increasingly complex RAG techniques, which will include conversation history and embedding / retrieval. In the mean time, you might consider a low-code UI tool like Streamlit llm-examples.streamlit.app/

  • @originialSAVAGEmind
    @originialSAVAGEmind 3 months ago +1

    @decoder I followed your tutorial exactly. I am on Windows, which I know is new; however, when I try to create the new model from the model file I get "Error: no FROM line for the model was specified". Any thoughts on how to fix this?? I edited the modelfile in Notepad in case this is the issue.

  • @robertdolovcak9860
    @robertdolovcak9860 4 months ago

    Thank you. I enjoyed your tutorial. One question, is there a way to see Ollama's speed of inference (tokens/sec)?

    • @decoder-sh
      @decoder-sh 4 months ago

      Thanks for watching. Yes you can use the `--verbose` flag in the terminal to see inference speed. eg `ollama run --verbose phi`

  • @marinetradeapp
    @marinetradeapp 29 days ago +1

    Great video. Arrr! How can we pull data into an agent from a webhook, have the agent do a simple task, and then send the result back out via a webhook? This would make a great video.

  • @Ucodia
    @Ucodia 4 months ago

    Great video, thank you! I used it to customize dolphin-mixtral to specialize it for my coding needs and combined it with Ollama WebUI, which I highly recommend. What I am still wondering is how I can augment the existing dataset with my own code dataset; I could not figure this out so far.

    • @decoder-sh
      @decoder-sh 4 months ago +2

      Thanks for sharing! In a future video I intend to talk about fine tuning, which sounds relevant to what you’re looking for

  • @dusk2dawn2
    @dusk2dawn2 2 months ago +1

    Nice! Is it possible to use these huge models from an external harddisk?

    • @decoder-sh
      @decoder-sh 2 months ago +1

      It is, but you’ll pay the price every time they’re loaded into memory.

  • @AntoninKral
    @AntoninKral 4 months ago +1

    I would recommend changing FROM to point to the name, not the hash (like FROM phi). It makes your life way easier when pulling new versions.

    • @decoder-sh
      @decoder-sh 4 months ago

      Hi there, could you tell me more about this? If "phi" points to the hash and not the name, then what name should be used? I would like to make my life easier 🙏

    • @AntoninKral
      @AntoninKral 4 months ago

      @@decoder-sh Let's assume that you fetch the "phi" model with hash hash1. You create your derived model using hash1. Later on, you fetch an updated "phi" with hash2. Your derived model will still be using the old weights from hash1.
      Furthermore, if you use names in your model files, they will be portable. If you take a closer look at your modelfile, it points to an actual file on disk. So if you send the modelfile to someone else / upload it to another computer, it will not work. Whereas if you use something like 'FROM phi:latest', ollama will happily fetch the underlying model for you.
      Same stuff as container images.
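      A small sketch of the difference described above (the blob path and digest are placeholders for whatever `ollama show --modelfile phi` prints on your machine):

        # Pinned to a specific blob on this machine: not portable, and it keeps
        # using the old weights even after you `ollama pull phi` again
        printf '%s\n' \
          'FROM /path/to/.ollama/models/blobs/sha256-<digest-of-the-phi-weights>' \
          'SYSTEM """You are a helpful assistant."""' \
          > Modelfile.pinned

        # Portable: references the model by name, so ollama resolves (and, if
        # needed, fetches) the underlying weights wherever the Modelfile is used
        printf '%s\n' \
          'FROM phi:latest' \
          'SYSTEM """You are a helpful assistant."""' \
          > Modelfile.portable
        ollama create my-phi -f Modelfile.portable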

  • @mernik5599
    @mernik5599 2 months ago +1

    Is it possible to enable internet access for ollama models? After following your tutorials I was able to do the ollama and web UI setup very easily! Just wondering if there are solutions already developed that allow function calling and internet access when interacting with models through the web UI

    • @decoder-sh
      @decoder-sh 2 months ago

      This would be achieved through tools and function-calling! I plan to do a video on exactly this very soon, but in the mean time, here are some docs you could look at python.langchain.com/docs/modules/model_io/chat/function_calling/

  • @danielallison3540
    @danielallison3540 a month ago +1

    How far can you go with the model file? If I wanted to take an existing model and make it an expert in some documents I have would piping those docs to the SYSTEM prompt on the model file be the way to go?

    • @decoder-sh
      @decoder-sh a month ago

      Depending on how large your model's context window is, and how many documents you have, that is one way to do it! If all of your documents can fit into the context window, then you don't need a whole RAG pipeline.
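      As a rough illustration of that approach (paths and names are hypothetical, and it only works while the documents stay well within the context window):

        # Embed small reference documents directly in a derived model's system prompt
        DOCS="$(cat docs/*.txt)"
        printf 'FROM phi\nSYSTEM """Answer questions using only the following documents:\n%s\n"""\n' "$DOCS" > Modelfile
        ollama create doc-expert -f Modelfile
        ollama run doc-expert "What do these documents say about pricing?"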

  • @gokudomatic
    @gokudomatic 4 months ago

    Very nice!
    But how do you do that using docker instead of a direct local install of ollama?

    • @decoder-sh
      @decoder-sh 4 months ago +1

      Assuming you already have the ollama docker image installed and running (hub.docker.com/r/ollama/ollama)...
      Then you can just attach to the container's shell with `docker exec -it container_name bash`.
      From here, use (and install if necessary) an editor like vim or nano to create and edit your custom ModelFile, then use ollama to create the model as usual.
      Ollama will move your modelfile into the attached volume so that it will be persisted between restarts 👍
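      Put together as one hedged sketch (the container name "ollama" and the choice of editor are assumptions; the official image is Ubuntu-based, so apt-get should be available):

        # Attach to the running container
        docker exec -it ollama bash

        # Inside the container: get an editor, write the Modelfile, create the model
        apt-get update && apt-get install -y nano
        nano Modelfile                      # add FROM / PARAMETER / SYSTEM lines here
        ollama create my-model -f Modelfile
        ollama run my-model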

  • @ArunJayapal
    @ArunJayapal 4 months ago

    Good work. 👍
    About the phi model: can it run on a laptop inside a VirtualBox VM? The host machine has 2 CPUs and 6 GB RAM.

    • @decoder-sh
      @decoder-sh 4 months ago

      Thanks for watching! It will probably be a little slow if it only has access to cpu, but I think it should at least run. Try it and report back 🫡

    • @ArunJayapal
      @ArunJayapal 4 months ago

      @@decoder-sh it does run. But out of curiosity what configuration did you use for the video?

    • @decoder-sh
      @decoder-sh 4 months ago

      @@ArunJayapal I'm running it on an M1 macbook pro, which has no issues with small models. I don't know what the largest model I can run is, but I know it's at least 34B

  • @PiotrMarkiewicz
    @PiotrMarkiewicz 4 months ago

    Is there any way to add information to a model? Like a training update?

    • @decoder-sh
      @decoder-sh 4 months ago

      There is! I plan on doing several videos on different ways to add information to models - the two main ways to do this are with fine tuning, and retrieval augmented generation (RAG)

  • @johnefan
    @johnefan 4 months ago

    Great video, love the format. Is there a way to contact you?

    • @decoder-sh
      @decoder-sh 4 months ago +1

      Hey thanks! I’m still setting up my domain and contact stuff (content is king), but for the time being you can send me a DM on Twitter if that works for you x.com/decoder_sh

    • @johnefan
      @johnefan 4 months ago

      @@decoder-sh Great, thanks. Started following you on Twitter, looks like your DMs are not open

    • @decoder-sh
      @decoder-sh 4 months ago

      Hey I wanted to follow up and let you know I created a quick site and contact form! decoder.sh/ (https coming as soon as DNS propagates, sorry)

  • @MacProUser99876
    @MacProUser99876 4 months ago

    Can you please show multimodal models like LLaVA?

    • @decoder-sh
      @decoder-sh 4 months ago

      I'd love to! What would you like to see about them?

  • @kamleshpaul414
    @kamleshpaul414 4 months ago

    Can we use ollama to pull our own model from Hugging Face?

    • @decoder-sh
      @decoder-sh 4 months ago +2

      Yes in fact one of my upcoming videos will walk through how to do that!

    • @kamleshpaul414
      @kamleshpaul414 4 months ago +1

      @@decoder-sh Thank you so much

    • @decoder-sh
      @decoder-sh 4 months ago +1

      This one's for you! th-cam.com/video/fnvZJU5Fj3Q/w-d-xo.html

  • @kachunchau4945
    @kachunchau4945 4 months ago +1

    Hi, your work will be helpful for my experiment, a classification task with a model in ollama. But I found two different APIs when I wrote requests: one is /api/generate, and the other is /api/chat. Could you tell me the difference? And how do I set up the "role" in the modelfile? Thanks in advance

    • @decoder-sh
      @decoder-sh 4 months ago +1

      Hi, that's a great question! The difference is subtle; both the generate and chat endpoints are telling the LLM to predict the next series of tokens under the hood.
      The generate endpoint accepts one prompt and gives one response, so any context needs to be provided within that prompt. The chat endpoint accepts a series of messages as well as a prompt - but what's really happening is ollama concatenates these messages into one big string and then passes that whole chat history string as context to the model. So to summarize, the chat endpoint does exactly the same thing as the generate endpoint; it just does the work of passing the message history as context into your prompt for you.
      For your last question, ollama only recognizes three "roles" for messages: system, user, and assistant. System comes from your modelfile system prompt. User is anything you type. Assistant is anything your model responds with.
      Do you think it's worth me doing a video to expand on this?

    • @decoder-sh
      @decoder-sh 4 months ago +1

      Here are the relevant code snippets btw - check them out if you read Go, or have your LLM give you a tldr :)
      Concatenate chat messages into a single prompt:
      github.com/ollama/ollama/blob/a643823f86ebe1d2af39d85581670737508efb48/server/images.go#L147
      In the chat endpoint handler, pass the aforementioned prompt to the llm predict method:
      github.com/ollama/ollama/blob/a643823f86ebe1d2af39d85581670737508efb48/server/routes.go#L1122
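      For anyone comparing the two endpoints described above, a minimal curl sketch (port 11434 is the default; "phi" is just an example model):

        # /api/generate: one prompt in, one response out; any context has to live inside the prompt
        curl http://localhost:11434/api/generate -d '{
          "model": "phi",
          "prompt": "Why is the sky blue?",
          "stream": false
        }'

        # /api/chat: a list of role-tagged messages; ollama folds the history into the model's context for you
        curl http://localhost:11434/api/chat -d '{
          "model": "phi",
          "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Why is the sky blue?"},
            {"role": "assistant", "content": "Because of Rayleigh scattering."},
            {"role": "user", "content": "Explain that in one sentence."}
          ],
          "stream": false
        }'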

    • @kachunchau4945
      @kachunchau4945 4 months ago

      @@decoder-sh Thank you very much for your detailed answer. When I was reading the developer documentation for ChatGPT, it has a similar role setup, which helped me understand the same thing in Ollama very well, but the endpoint similar to /api/generate in ChatGPT is already marked LEGACY. As for the difference between the two APIs, I've watched a lot of videos online and they all lack answers and examples for this.
      1. For /api/generate, my understanding is that it's a single request, but I'm curious how to make the response controllable, for example restricting it to a certain set of labels (classification questions). Is that set through the TEMPLATE of the modelfile? How would that be written?
      2. For /api/chat, according to your explanation, do the messages need to include the previous questions and answers before this prompt? If so, should I set up a loop to keep appending the questions and answers from the previous messages?
      3. Since I'm not a YouTuber, I don't have the intuition to judge whether it's worth making another video or not. But as far as I can see, no one on YT has explained in depth how templates are written in the modelfile, just the SYSTEM section, without explaining its impact or effect. And of course there's the difference between the two APIs I talked about earlier and how the chat API is used. I think it would be helpful for developers who want to build servers in the cloud!

    • @decoder-sh
      @decoder-sh 4 months ago

      @@kachunchau4945 Yes, you're correct, you would use the system prompt to instruct the model how to respond to you. I recommend also giving it an example exchange so it understands the format. I wrote a system prompt for a simple classification task which you can adapt to your use case. I quickly tested this and it works even with small models.
      """
      You are a professional classifier, your job is to be given names and classify them as one of the following categories: Male, Female, Unknown. If you are unsure, respond with "Unknown". Respond only with the classification and nothing else.
      Here is an example exchange:
      user: Mark
      assistant: Male
      user: Jessica
      assistant: Female
      user: Xorbi
      assistant: Unknown
      """
      The above is your system prompt, and your user prompt would be the thing you want to classify.
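      One hedged way to wire that up end-to-end (the model name is made up; the base model's default template is simply reused):

        # Bake the classification instructions into a derived model's SYSTEM prompt
        printf '%s\n' \
          'FROM phi' \
          'SYSTEM """You are a professional classifier. Given a name, respond with exactly one of: Male, Female, Unknown. If you are unsure, respond with Unknown. Respond with the classification and nothing else."""' \
          > Modelfile
        ollama create name-classifier -f Modelfile

        # Then the prompt is just the thing to classify
        curl http://localhost:11434/api/generate -d '{
          "model": "name-classifier",
          "prompt": "Jessica",
          "stream": false
        }'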

    • @kachunchau4945
      @kachunchau4945 4 months ago

      @@decoder-sh Thank you so much, that is very helpful for me. I will try it later. But in addition to SYSTEM, do I need to write a template?

  • @android69_
    @android69_ 2 months ago +1

    how do you load your own model, not from the website?

    • @decoder-sh
      @decoder-sh 2 months ago

      I've got the answer right here :) th-cam.com/video/fnvZJU5Fj3Q/w-d-xo.html

  • @harshith24
    @harshith24 2 months ago

    If I run the command `ollama run phi`, will the phi model get installed on my C drive???

    • @decoder-sh
      @decoder-sh 2 months ago

      It will! Ollama pulls a hash of the latest version of the model. If you don't have that model downloaded, or if you have an older version downloaded, ollama will download the latest model and save it to your disk.
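      A quick way to check what was downloaded and where (the Windows location is an assumption based on the usual per-user .ollama folder; the macOS/Linux default is shown first):

        ollama list                    # models you have locally, with sizes
        du -sh ~/.ollama/models        # default storage location on macOS/Linux
        # On Windows, models land under %USERPROFILE%\.ollama\models by default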

  • @Chrosam
    @Chrosam 4 months ago

    If you ask it a follow-up question, it has already forgotten what you're talking about.
    How do we keep context?

    • @decoder-sh
      @decoder-sh 4 months ago

      Thanks for watching!
      It could be a number of things:
      - small models sometimes lose track of what they’re talking about, big models usually do better
      - some models are optimized for chatting, others are not
      - you may have history disabled in ollama (though I don’t think that’s the default). From the ollama cli, type “/set history”

  • @harishraju4321
    @harishraju4321 2 months ago

    Is this considered 'fine-tuning' an LLM?

    • @decoder-sh
      @decoder-sh 2 months ago

      Definitely not! This is basically just using a system prompt to steer the behavior of the model. Fine tuning involves retraining part of the model on new data - I intend to do a video about that soon though :)

  • @lsmpascal
    @lsmpascal 3 months ago

    Can I suggest a video which I think will be useful for a lot of people: how to optimise a server to run a model using ollama.
    I'm currently trying to do so. The goal is to have Mistral running on a Vultr instance for < 300€/month.
    But I'm failing. Ollama is there, Mistral too, but the performance is terrible.
    I guess I’m not the only guy searching for this kind of thing.

    • @decoder-sh
      @decoder-sh 3 months ago

      Ollama is not designed to handle multiple users (I'm guessing that's your use case for a $450/mo server?), for that I would look into something like vLLM, LMDeploy, or HF's text-generation-inference. With that said, I plan to do a video on cloud deploys to support multiple concurrent requests in the future!

    • @lsmpascal
      @lsmpascal 3 months ago

      I'm looking forward to watching this one, because I'm currently totally lost.
      Ah, one last thing: I love the way your videos are made - a clean but unobtrusive style and interesting content. Keep it this way!
      Thank you very much. @@decoder-sh

  • @optalgin2371
    @optalgin2371 2 months ago

    What's the difference between copying a model and creating from a model?

    • @decoder-sh
      @decoder-sh 2 months ago

      Interesting question... It seems that in both cases (`ollama cp baseModel modelCopy` and `ollama create myModel -f modelfile` where modelfile uses "FROM baseModel:latest"), a new manifest file is created, but no new model blobs are created. This means that both actions are storage-efficient. You can verify this yourself by using `du` to print the directory size of `~/.ollama/models` before and after each of those actions.
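      The check described above, as a small sketch (the model names are arbitrary):

        du -sh ~/.ollama/models                  # size before
        ollama cp phi phi-copy                   # copy: new manifest, same blobs
        du -sh ~/.ollama/models                  # roughly unchanged

        printf '%s\n' 'FROM phi:latest' > Modelfile
        ollama create phi-derived -f Modelfile   # create: also reuses the existing blobs
        du -sh ~/.ollama/models                  # still roughly unchanged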

  • @deepjyotibaishya7576
    @deepjyotibaishya7576 24 days ago

    How do I train it with my own dataset?

  • @nicolawirz7938
    @nicolawirz7938 19 days ago

    Why does your terminal look like this on Mac?

  • @daveys
    @daveys 4 months ago

    Phi is too hallucinatory for my liking, but unfortunately mixtral is too large and intense for my crappy old laptop. One thing is for certain: LLMs are power-hungry beasts!

    • @decoder-sh
      @decoder-sh 4 months ago

      That’s fair, I’ve found starling-lm to be a strong light model, and some flavor of mistral (eg dolphin-mistral) for 7B

    • @daveys
      @daveys 4 months ago

      @@decoder-sh - Mixtral ground my old laptop (4th gen 4-core i5 with onboard graphics and 8GB RAM) to a halt…still ran, but one word every 1-2 mins wasn't a great user experience. Phi was quicker, but like talking to a maths professor on acid.

    • @decoder-sh
      @decoder-sh 4 months ago +1

      @@daveys I mean honestly, that sounds like a fun way to spend a Sunday afternoon. Yeah I wouldn't expect mixtral to do well on consumer hardware, especially integrated graphics. I'd experiment with a 7b model first and see if it behaves more like a literature professor on mushrooms, then maybe try a 34b model if you still get reasonable wpm.

    • @daveys
      @daveys 4 months ago

      @@decoder-sh - enjoyable if you were the professor but not waiting for the LLM to answer a question!! I knew local AI would be bad on that machine, to be honest I was surprised it ran at all, but I’ll stick to ChatGPT at the moment and wait until I upgrade my laptop before I start messing with any more LLM stuff.

  • @lucasbarroso2776
    @lucasbarroso2776 27 days ago

    I would love to see a video on model files, specifically how to train a model to do a specialized task. I am trying to use Llama 2 to consolidate facts in articles.
    "Do these facts mean the same thing?
    Fact 1: Starbucks's stock went down by 13%
    Fact 2: Starbucks has a new boba tea flavour"
    Response: {isSame: false}

  • @federicoloffredo1656
    @federicoloffredo1656 4 months ago

    Hi, what about Windows users?

    • @decoder-sh
      @decoder-sh 4 months ago

      Unfortunately Windows is not supported natively, but you can still install ollama on Linux (in Windows) via WSL. Probably suboptimal though

    • @decoder-sh
      @decoder-sh 4 months ago

      Looks like it’s coming soon! x.com/alexreibman/status/1757333894804975847?s=46

  • @marsrocket
    @marsrocket 3 months ago

    Excellent video, although I think you could raise the lower skill level you're targeting. Nobody who is going to install and use Ollama on their own doesn't know what > means.

    • @decoder-sh
      @decoder-sh 3 months ago

      I’m getting that impression, too! I’m going to try to make future videos a bit faster and more focused on doing the thing than explaining the language. Will probably continue explaining tools and logic.

  • @VertegrezNox
    @VertegrezNox 4 months ago

    Nothing about this involved customization. Clickbait channel

    • @decoder-sh
      @decoder-sh 4 months ago +1

      Full fine tuning video coming in a couple weeks, this is a video for beginners 🫡

  • @matbeedotcom
    @matbeedotcom 4 months ago

    You edited the system prompt....

    • @decoder-sh
      @decoder-sh 4 months ago +1

      Yes and fine tuning is coming too! Thanks for watching