How to Run Any LLM using Cloud GPUs and Ollama with Runpod.io
- Published Jul 14, 2024
- Hello, and welcome to my video on how to run a server on runpod.io. The reason is that not everybody has a computer with the hardware needed to run a local LLM. Renting a much more powerful server this way is a cheap alternative.
Don't forget to sign up for the newsletter below to get updates on AI, what I'm working on, and struggles I've dealt with (which you may have too!):
=========================================================
📰 Newsletter Sign-up: bit.ly/tylerreed
=========================================================
🙋‍♂️ My GitHub: github.com/tylerprogramming/ai
🙋‍♂️ 31 Day Challenge: github.com/tylerprogramming/3...
🥧 PyCharm Download: www.jetbrains.com/pycharm/dow...
🐍 Anaconda Download: www.anaconda.com/download
🦙 Ollama Download: ollama.com/
🤖 LM Studio Download: lmstudio.ai/
📖 Chapters:
00:00 Intro
00:19 What is runpod.io?
01:11 How to setup
03:26 Install Ollama
05:40 Run Example
06:26 Outro
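For reference, the "Install Ollama" step from the chapters above boils down to running Ollama's standard Linux install script inside the pod's terminal. A minimal sketch (the model name is just an example; pick whatever fits your GPU):

```shell
# Inside the Runpod pod's web terminal (Linux):
curl -fsSL https://ollama.com/install.sh | sh   # official Ollama install script

# Start the Ollama server in the background, then pull and run a model.
ollama serve &
ollama run llama3    # example model; any model your GPU can fit works
```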
💬 If you have any issues, let me know in the comments and I will help you out!
This is the sauce! Thank you! 🙏🏾
Thank you 🙌
Thanks!
You are welcome!
Since you can run a Python file there in Runpod, I'm assuming you can also serve a Gradio UI from there? Kinda like in your YouTube service video. I really appreciate all of your hard work on your channel. One of my favorite agent-centric channels.
Yes you should be able to do that for sure
Yes you absolutely should be able to do this! Thank you I appreciate it 👍
How do Runpod serverless and pods differ in this use case, e.g. in costs? How can we minimize our costs, e.g. by stopping the pod after usage?
Well, the idea is that if you don't have a local machine that can run models well (if at all), then depending on the model you need, you can 'rent' a cheap server on this platform. The one in my example costs $0.79 per hour while it's up and running; if I stop it, it costs $0.0006 per hour. So the cost of holding onto it until you want to run it again, without actually TERMINATING it, is minimal.
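To make the math concrete, here is a rough sketch using the two rates from my example pod ($0.79/hr running, $0.0006/hr stopped); your actual rates will depend on the GPU you pick:

```python
# Rough Runpod cost sketch. Rates are from my example pod and are
# illustrative only; actual pricing depends on the GPU type.
RUNNING_RATE = 0.79    # $/hour while the pod is running
STOPPED_RATE = 0.0006  # $/hour while the pod is stopped (storage only)

def monthly_cost(hours_running_per_day: float, days: int = 30) -> float:
    """Cost of running N hours/day and keeping the pod stopped otherwise."""
    running = hours_running_per_day * days * RUNNING_RATE
    stopped = (24 - hours_running_per_day) * days * STOPPED_RATE
    return round(running + stopped, 2)

print(monthly_cost(2))   # 2 hours of use per day  -> 47.8
print(monthly_cost(0))   # never started, just held -> 0.43
```

So parking a stopped pod for a month costs well under a dollar, which is why stopping (rather than terminating) after each session is the easy way to keep costs down.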
I will look into scheduling the servers (if it's possible), so that, like in AWS, you can have one run for a certain amount of time per day.
Hi, what is the difference between this method and using vLLM, which I saw in the Data Centric video on Runpod? Which way is better?
Is it possible to host the server here, or is Runpod just used for fine-tuning and training models?
You can absolutely host a server here!
Just found out how and got it working: apparently you need to host it on port 80, but I didn't select that option when I made the GPU pod.
Ah gotcha I’m glad you got it figured out 👍
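For anyone hitting the same issue: Ollama listens on 127.0.0.1:11434 by default, so to reach it from outside the pod you need to bind it to a port you exposed when creating the pod (port 80 in the comment above). A minimal sketch, assuming a Linux pod:

```shell
# Bind Ollama to all interfaces on port 80. The port must match an
# exposed HTTP port you selected when creating the pod.
export OLLAMA_HOST=0.0.0.0:80
ollama serve
```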
Is it possible to use a model on the server and pass its output to the local Ollama, to use it in any software locally?
Yeah, so if you had an API to retrieve something from the runpod.io LLM and then bring it back locally for anything, then absolutely. You would just need the URL for the Runpod to send the request to. Hope that made sense. I do plan on having a video where we build something more 'production' ready.
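To sketch what that could look like: a small example that calls a remote Ollama server through Ollama's REST API (`/api/generate`). The base URL below is a placeholder for however your pod exposes Ollama's port, and `llama3` is just an example model name:

```python
import requests  # pip install requests

# Placeholder: replace with however your pod exposes Ollama's port
# (e.g. Runpod's proxy URL or the pod's public IP and exposed port).
RUNPOD_URL = "https://your-pod-endpoint.example.com"

def build_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(base_url: str, model: str, prompt: str) -> str:
    """POST a prompt to a remote Ollama server and return its response text."""
    resp = requests.post(f"{base_url}/api/generate",
                         json=build_payload(model, prompt), timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask(RUNPOD_URL, "llama3", "Why is the sky blue?"))
```

Any local software that can make an HTTP request can then use the remote model the same way it would use a local Ollama instance.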