Thank you, very useful. If you get an error when you paste in the Docker command (when using the Windows version), make sure you run through all the Docker install steps and restart if needed (you will need to register).
Thank you for adding more context for people who may hit issues with Docker
Finally I found an easy tutorial, thank you!
You're welcome!
Great video. Very clear. Thank you!
Glad it was helpful!
Amazing and very clear step-by-step instructions! I was able to replicate the work on my computer
Thank you so much for this excellent tutorial!
You're welcome! Glad to know that it worked for you
Can you use a voice interface with the offline models?
Would love to understand what's required.
Yes, please look at th-cam.com/video/RELQNYa4qNc/w-d-xo.html
I need to make something like this, but one that can answer general questions, like the ones you presented, plus specific questions drawn from my own database (totally offline). Can you guide me on how I can make it?
Really wonderful, thank you!
I have a question: can I do custom training on some of my own documents over llama?
Do you mind sharing your use case? You may not need to train a model; you could instead use a RAG technique over existing models. Thanks
good video.
Can we run Solar 10.7B uncensored in just the same way?
Yes, Ollama supports it. Refer to ollama.com/library/solar:10.7b. You need to pull the model first with the command "ollama pull solar:10.7b" before you can select it inside Open WebUI. Hope this helps
Can these models access the internet and scrape stuff? Does anyone know one that can do it? Thanks
Having spent a lot of time building a speech-to-text and text-to-action system on the RPi5 to work in conjunction with a toy robot car, working programmatically, I am not sure whether what you are suggesting will fit my objectives, but I would value your advice in this respect. I have also looked at using an API with the various GPTs, but they are not free. I am a tight-wad, so even if the cost might be small, I still can't bring myself to risk a large bill. So, can the system you suggest be used programmatically? The code enters the request and acts on the reply without human intervention. Is the process in the video also free? I look forward to your reply.
Yes, this system works offline and does not cost a single dollar (other than your internet cost for downloading models locally, and electricity costs for running the server locally).
Hope this helps
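For the programmatic side of your question: Ollama (which the video runs behind Open WebUI) also listens on a local REST API, so your code can send a request and act on the reply with no human in the loop. Below is a minimal sketch in Python, assuming the Ollama server is running on its default port 11434 and that a model such as llama3 has already been pulled; the prompt is only a placeholder.

import json
import urllib.request

# Ollama's default local endpoint; nothing leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_local_model(prompt, model="llama3"):
    # stream=False returns the whole reply as a single JSON object.
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example: robot-car code could turn a spoken command into an action string.
print(ask_local_model("Rewrite 'go forward two meters' as a one-line JSON action."))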
Hello, can we change the model name "llama3:latest"? Thanks
Hello, as per ollama.com/library/llama3, you need to use "ollama pull llama3:latest". This should work
@bonsaiilabs Thanks for your timely reply. I may have asked the question in the wrong way. I should ask: can we delete this "llama3:latest" or just leave it blank? I don't want the user to know what's behind it. Thanks again.
Even if I'm connected to the internet, will the model still not use the internet?
Most local LLMs do not use the internet on their own. Using some OSS tools, you can create a RAG-based system that fetches webpages and gives the text to a local LLM. Hope this helps
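To make that pattern concrete, here is a rough sketch in Python (not a full RAG pipeline): download a page, strip it to plain text, and hand it to the locally running model over Ollama's API. It assumes the Ollama server is running locally with llama3 pulled; the URL and the crude tag-stripping are placeholders for a real scraper and parser.

import json
import re
import urllib.request

def fetch_page_text(url):
    # Download the page and crudely strip HTML tags; a real system would use a proper parser.
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    return re.sub(r"<[^>]+>", " ", html)

def summarize_locally(text, model="llama3"):
    # Only the local model sees the text; no external LLM API is involved.
    payload = json.dumps({"model": model,
                          "prompt": "Summarize the following page:\n" + text[:4000],
                          "stream": False}).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(summarize_locally(fetch_page_text("https://example.com")))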
What's a good machine you can recommend if I want to load Llama 3 70B?
Hello, this link might help answer your question:
stackoverflow.com/a/78390633
An excerpt is here:
A 70B model uses approximately 140 GB of RAM (each parameter is a 2-byte floating-point number). If you want to run at full precision, I think you can do it with llama.cpp and a Mac that has 192 GB of unified memory, though the speed will not be that great (maybe a couple of tokens per second). If you run with 8-bit quantization, the RAM requirement drops by half and speed also improves.
I hope this helps
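To make the arithmetic in that excerpt explicit, here is a quick check in Python under the same assumptions (2 bytes per parameter at full fp16 precision, roughly 1 byte per parameter at 8-bit quantization):

params = 70e9                      # 70B parameters
print(params * 2 / 1e9, "GB")      # ~140 GB at 2 bytes per parameter (fp16)
print(params * 1 / 1e9, "GB")      # ~70 GB with 8-bit quantization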
Super cool stuff
Thank you very much!
Is it possible to upload PDF files and ask for summarization?
Yes, it is definitely possible. Stay tuned and we will share a video about that soon
@ricardoribeiro3281, the video is almost finished and will be out in the next few days. Make sure you subscribe and click the bell icon so that you get the notification once it is available. Thanks
Thank you for this. I would like to upload/ingest files into PrivateGPT; is that possible?
Thank you for your contribution. We just discovered PrivateGPT and will follow up with a video soon! Be sure to subscribe, if you haven't already, so that you get the notification when the new video is live. Thank you again!
What if I installed a model by pasting the command into the Terminal, but then I no longer want that model? If I want to try a different model in Ollama, how do I uninstall the first model I pulled in the Terminal and replace it with a new one?
Based on your OS, you can remove the downloaded models by deleting Ollama's models folder:
macOS: go to ~/Library/Application Support/Ollama/models
Linux: navigate to /usr/share/ollama/.ollama/models (note: you may need admin privileges to delete files in this directory)
Windows: delete the entire folder at C:\Users\<username>\.ollama\models
After removing the folder, run the command ollama list in your terminal to check what is still installed.
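Another option, if the ollama CLI is still installed, is to remove a single model and pull its replacement from the command line. A minimal sketch, assuming a recent Ollama version whose CLI supports the rm subcommand; the model names are placeholders for whatever you pulled:

import subprocess

# Remove the model you no longer want, then pull its replacement.
subprocess.run(["ollama", "rm", "llama3:latest"], check=True)   # old model (placeholder name)
subprocess.run(["ollama", "pull", "mistral"], check=True)       # new model (placeholder name)
subprocess.run(["ollama", "list"], check=True)                  # confirm what is installed now

You can equally run the same ollama commands by hand in the terminal.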
Can we train this with our own personal data, and if so, how?
Any open model can be fine-tuned. We will make videos in the future to demonstrate this use case. Thanks for asking
Thanks for the video, very informative. I want to know how to train a model on my own data, like PDF and Word files, to run on a local machine.
Thanks in advance
If you want to extract information from your own data, you can do so by using Retrieval-Augmented Generation (RAG) techniques with existing LLM models. Unless you need to train a model for a very specific use case, I believe RAG would be enough.
Do you mind sharing your use case?
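In case it helps to picture the RAG idea, here is a deliberately tiny sketch in Python over local .txt files, assuming the files sit in a my_docs folder, the Ollama server is running, and llama3 is pulled. A real setup would use an embedding model and a vector store instead of the keyword-overlap retrieval used here.

import json
import pathlib
import urllib.request

def load_chunks(folder="my_docs", size=1000):
    # Read every .txt file in the folder and cut it into fixed-size chunks.
    chunks = []
    for path in pathlib.Path(folder).glob("*.txt"):
        text = path.read_text(errors="ignore")
        chunks += [text[i:i + size] for i in range(0, len(text), size)]
    return chunks

def retrieve(question, chunks, top_k=3):
    # Toy retrieval: rank chunks by how many of the question's words they contain.
    words = set(question.lower().split())
    return sorted(chunks, key=lambda c: len(words & set(c.lower().split())), reverse=True)[:top_k]

def answer(question, model="llama3"):
    context = "\n---\n".join(retrieve(question, load_chunks()))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(answer("What do my notes say about payment terms?"))

PDF and Word files would first need a text-extraction step before the chunking above.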
Can it be trained? Or will it answer from its data only?
You would need to fine-tune the model for your own use case. The base models can be fine-tuned, but as-is they will answer only from what they were trained on. Hope that helps.
In short, is this self-hosting an AI? Thanks in advance
Yes
Can a MacBook Air M1 handle this model?
Honestly, I do not know, since I do not know your machine configuration. Why not try it out? Then you will know.
Can it run without a GPU?
I believe it can, but the inference might be slow. I would encourage you to try it out and let me know how things go for you!
"Ollama is popular library for running LLMs on both CPUs and GPUs". I found this reference on skypilot.readthedocs.io/en/latest/gallery/frameworks/ollama.html.
Hope that helps!