Thank you and everyone contributing to this!
thanks for sharing this! I think "private AI" is the future and this project definitely makes it easier for people to run their own local models. cloning now 😁
When I installed and ran it, I got this error: "File pydantic/main.py:341 in pydantic.main.BaseModel.__init__
ValidationError: 1 validation error for LLMChain
llm
none is not an allowed value (type=type_error.none.not_allowed)" Any idea how to fix it?
Thank you for sharing your knowledge. It is greatly appreciated
Just in time! Great for Sunday; I’ll blame you if my wife yells at me for not watching TV with her 😂😂😂
🤣🤣
Your wife should never yell at you. She should respect you for your curious mind and vice versa.
@@zorayanuthar9289 I wish my wife were as reasonable as the one you describe 🥹🥹🥹
Very useful :)
Much appreciate your hard work on the project and the videos.
Thank you, I'm learning so much from you. I had two questions on scalability. 1) If you had simultaneous queries on the API, how does localGPT handle them? Will it queue the requests or run them in parallel, albeit slower? 2) I noticed that searches sometimes take upwards of 30 seconds on a V100 GPU using a 7B Llama 2 model. Are there any ways to optimize or accelerate the inference/retrieval speed? Thanks!!
Thanks :) To answer your questions: 1) Right now it will queue them, but that can be improved. 2) There are a few things that can be done to improve the speed. One possibility is to use a different embedding model and experiment with different LLMs.
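(For anyone who wants to experiment with that, a minimal sketch of swapping the embedding model, assuming constants.py exposes an EMBEDDING_MODEL_NAME setting as the repo's README describes; the MiniLM alternative is just one example of a lighter model.)

```python
# constants.py - embedding model selection (sketch, setting name assumed)
EMBEDDING_MODEL_NAME = "hkunlp/instructor-large"  # default: accurate but heavy
# EMBEDDING_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"  # lighter and faster, less accurate
```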
What is the best way to expose a REST API for other applications to call?
Thanks for sharing, but there is a problem with this model. I’m not sure if it’s a bug or normal behavior: if I ask the same question repeatedly, the answering time increases exponentially. Is this caused by reading in the conversation history every time?
Mmm... it looks like the old approach, but adding the web API is great! Anyway, I found that any sentence-transformer model can serve as the instruction embedding model for the project. But hkunlp/instructor-xl still remains the most accurate instruction embedding model.
Great as always. We need a UI for associates so they can just run a query and aren't able to reset or add to the local knowledge base.
Thanks :) It's already there, will be covering it in a future video!
@@engineerprompt Getting an error on the last line of code before running localGPT; all the dependencies and the env are good, can't seem to figure out where the bug is. Also, wanted to touch base on the consultancy thing we discussed. Finally got an update on it.
@@ckgonzales16 what is the error? would love to connect again, let's schedule some time.
Is there a more generalized api wrapper? Is this specifically for documents?
At the moment this is specific to the documents you ingest, but I will add a generalized API that you can use to talk to the model itself.
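(For illustration only: a hypothetical sketch of what such a generalized endpoint could look like next to the existing document routes in run_localGPT_API.py. The /api/generate route and the llm object are assumptions, not part of the current repo.)

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/api/generate", methods=["POST"])
def generate():
    # Send the raw prompt straight to the loaded model, bypassing the retriever.
    prompt = request.form.get("user_prompt", "")
    answer = llm(prompt)  # `llm` = whatever load_model() returned (assumed name)
    return jsonify({"prompt": prompt, "answer": answer})
```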
Can you please tell us the RAM, CPU, and hard disk requirements to run localGPT? I'm getting answers after 40 minutes for basic questions, and I have 12 GB of RAM. I even tried the GPUs on Google Colab, but the answers are still very slow, arriving after about 40 minutes.
@engineerprompt
Why not H2OGPT? It has more capabilities, plus a GPU usage option. If you have a consumer-grade GPU (RTX 3060), it is at least 10 times faster.
It remains to be seen whether H2O is as offline and private as they first suggest. Also, I do not want to run Java on my machine. H2O is very much a corporate-controlled model. We will see whether its offline function is anything but bait as time goes on.
How do I deploy this on GCP or AWS and get a website URL instead of localhost?
Were you able to get the solution for this?
@Prompt Engineering I have an issue while running the local_api file: the DB vanishes automatically, and even though the source documents are present, it says no documents were found. Please help, guys.
I have the same problem. FileNotFoundError: No files were found inside SOURCE_DOCUMENTS, please put a starter file inside before starting the API!
I can't complete the requirements.txt install because chroma-hnswlib requires MSVC++ 14.0 or above to build its wheels. I installed the Visual Studio Build Tools and everything, but still nothing. Maybe it is a Python version compatibility issue?
Thanks for putting together a step-by-step tutorial. Very helpful! All your videos are amazing. I was looking for an exact solution to query local confidential documents.
Two quick questions: how do I switch to the 13B model? And how do I train the model on a custom database schema and SQL queries? I tried it with a schema document, but the SQL queries it returned were not at all useful. A similar scenario with the ChatGPT API returned good results.
Could you show us how to use it in google colab?
Yeah, I will make a video on it.
@@engineerprompt thanks, it would be awesome. I look forward to it 🤩
Sir, please create a Google Colab script to run this for low-end PC users.... You are my dream teacher....
This is such a huge thing and you're not getting enough attention for it! I'm getting the UI to run on port 5111, but I am running into the issue of the initial python run_localGPT_API.py run showing 'FileNotFoundError: No files were found inside SOURCE_DOCUMENTS, please put a starter file inside before starting the API!' even though the constitution PDF is already there. Please advise!
Hi! I see at 4:51 in your video that you have the following list of processor architecture features: AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0. Those are the same as on my Orange Pi 5 (ARM RK3588S 8-core 64-bit processor). NEON is a SIMD architecture extension for ARM processors, and ARM_FMA, similar to x86 FMA, represents Fused Multiply-Add support for ARM processors. Are you using an Orange Pi 5 too? I'm trying to use the NPU of the RK3588S for localGPT.
Great Stuff! Thanks
Thanks for this amazing video!!! Can you suggest a high-performance AWS EC2 instance where we can host this app? Any suggestions for running this app in parallel?
Thanks for the video and useful information.
The LocalGPT project uses models defined in the constants.py file as MODEL_ID and MODEL_BASE. Where are these models stored?
Also, a question about fine-tuning with autotrain: can you please tell me where the data is stored when I use the command "autotrain ... --data_path 'timdettmers/openassistant-guanaco' ..."? I've triggered this command from my user home folder but don't see any files downloaded.
When you run that, it will create a new folder called "models" and the models will be downloaded to that folder.
For autotrain, it should also download the model to that folder.
@@engineerprompt many thanks for the details 🙂
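(Side note: the Hugging Face libraries also keep a cache under ~/.cache/huggingface by default, or wherever HF_HOME points, which is where autotrain datasets typically land. A quick way to inspect what has been pulled, assuming a recent huggingface_hub:)

```python
from huggingface_hub import scan_cache_dir

# Print every repo (models and datasets) cached locally and its size on disk.
for repo in scan_cache_dir().repos:
    print(repo.repo_type, repo.repo_id, repo.size_on_disk_str)
```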
I just need to use LocalGPT from the CLI, pointing some shortcuts at my real document folders and ingesting all of them. Is that possible?
Yes, that's possible. You will need to provide the folder name as a command-line argument. Look at constants.py to understand how it is set in the code.
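(For reference, a minimal sketch of the relevant settings, assuming constants.py defines them roughly like this; pointing SOURCE_DIRECTORY at, or symlinking it to, your real document folders lets ingest.py pick everything up in one pass.)

```python
import os

# constants.py (sketch) - where localGPT looks for documents and stores the vector DB
ROOT_DIRECTORY = os.path.dirname(os.path.realpath(__file__))
SOURCE_DIRECTORY = f"{ROOT_DIRECTORY}/SOURCE_DOCUMENTS"  # change this to your docs folder
PERSIST_DIRECTORY = f"{ROOT_DIRECTORY}/DB"               # where the vector store is written
```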
Hello there, I'm trying to launch the application but I have a problem 😢. When I run "pip install -r requirements.txt" in the Anaconda prompt, I get the error "ERROR: Could not find a version that satisfies the requirement autoawq (from versions: none)". So after many attempts I tried to install autoAWQ from source (git clone of the repository) and launch it. Then I got a new error: "ERROR: Could not find a version that satisfies the requirement torch (from versions: none)". Has anyone encountered this error?
You are working great ❤❤
Can you complete it by packaging it as an app, e.g. on Render?
I'm running this on a 1070 and it takes about 5 minutes to answer a question. How much power would I need to get an answer in 30 seconds to 1 minute? Is that possible?
Why aren't you building a Dockerfile or Docker images?
Thank you
Thank you for the video, really appreciate your effort in putting together the UI layer. I have a question: running run_localGPT_API.py does not start the API console. The following has been the status in my VS Code terminal for about an hour.
Mac-Studio lgpt1 % python run_localGPT_API.py --device_type mps
load INSTRUCTOR_Transformer
max_seq_length 512
Am I doing anything wrong? Appreciate your response.
Hi, always using only that constitution document gives a misleading picture of the output quality. Why don't you use a math or law document to test the output?
Tried it with ISO standards - the output is bad.
@@JustVincentD how can we improve it?
@@arturorendon145 I think chunking and embedding must get better; also, saving more metadata like page numbers would be nice.
I have not looked at the LangChain implementations (which are used). I'm just thinking of something like using different chunk sizes in sequence. The embeddings remind me a lot of locality-sensitive hash algorithms, so maybe some tricks could be copied from there.
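(A minimal sketch of that multi-chunk-size idea using the LangChain pieces localGPT already relies on; the PDF path is just an example, and PyPDFLoader keeps the page number in each chunk's metadata so it could be surfaced with the answer.)

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

docs = PyPDFLoader("SOURCE_DOCUMENTS/constitution.pdf").load()

# Two passes with different chunk sizes: small chunks for precise matches,
# larger chunks for more surrounding context; both go into the same vector store.
small = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50).split_documents(docs)
large = RecursiveCharacterTextSplitter(chunk_size=1200, chunk_overlap=150).split_documents(docs)
chunks = small + large

print(chunks[0].metadata)  # e.g. {'source': 'SOURCE_DOCUMENTS/constitution.pdf', 'page': 0}
```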
Thanks for the video. Can you please make a video on fine-tuning a Llama 2 model on PDF documents?
Maybe this will help - th-cam.com/video/lbFmceo4D5E/w-d-xo.html It's not for fine tuning, but will give you a start on doing Q&A with your docs.
@@paulhanson6387 Thanks a lot, Paul.
I used a T4 CUDA 16 GB GPU. It also takes 3-4 minutes to answer my question, but the answers about my file content are very precise for a document of 4-5 pages. Is taking 2-4 minutes to get an answer normal under these conditions?
That's on the longer side; you probably need access to a better GPU. Check out RunPod.
@@engineerprompt Thank you so much for your great work and for sharing it with us. I am glad the answers about my files are great, even though it takes a little longer to get them. I will test more files and try different models. I also need to modify the prompt to make the answers more concise. I will check out RunPod as you mentioned. Thank you.
Hi, I would like to help make the project better. How can I help? I've found some bugs and some code that could be nicer.
How would this work with OneDrive?
You will have to give it access to read files from your drive. Other than that, it will probably work without many changes.
Hello again, thanks for your video. I followed the instructions and the ingest.py script works fine, but when I try running run_localGPT_API or run_localGPT I get the following error: pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for LLMChain
llm
none is not an allowed value (type=type_error.none.not_allowed).
I have text-generation-web-ui working with the TheBloke_WizardCoder-15B-1.0-GPTQ model. I think it works because in Pinokio it probably runs as a Docker container.
Did you pull the latest changes to the repo?
@@engineerprompt Good morning, and thanks for your prompt reply. I did a git pull and some files were updated. Still getting the error.
2023-09-17 08:42:34,983 - INFO - load_models.py:38 - Using Llamacpp for GGUF/GGML quantized models
Traceback (most recent call last):
File "pydantic\main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for LLMChain
llm
none is not an allowed value (type=type_error.none.not_allowed)
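(For anyone else hitting this: the ValidationError means LLMChain was handed llm=None, i.e. the model never loaded. A hypothetical guard around the spot where the chain is built - load_model, MODEL_ID and MODEL_BASENAME are the repo's names as I understand them:)

```python
# Sketch: fail loudly if the model loader came back empty instead of letting
# LLMChain raise the opaque "none is not an allowed value" error.
llm = load_model(device_type, model_id=MODEL_ID, model_basename=MODEL_BASENAME)
if llm is None:
    raise RuntimeError(
        "load_model returned None - check MODEL_ID/MODEL_BASENAME in constants.py "
        "and that the quantized GGUF/GGML file actually downloaded."
    )
```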
And if anyone knows how to get this running on Apple Silicon, like an M1, please post any advice!
I run it on M2, follow the steps listed on the repo
@@engineerprompt Sorry, I should have been clearer, got distracted. I meant using the GPU and not the CPU. I'll check the repo for those instructions, but I don't remember seeing them.
It looks like privateGPT is checking torch.cuda.is_available(), and I'm using Apple Silicon with MPS. In my case torch.backends.mps.is_available() is True.
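(A self-contained sketch of the kind of device check that would cover both cases, assuming a reasonably recent PyTorch build:)

```python
import torch

# Prefer CUDA on NVIDIA GPUs, then MPS on Apple Silicon, otherwise fall back to CPU.
if torch.cuda.is_available():
    device_type = "cuda"
elif torch.backends.mps.is_available():
    device_type = "mps"
else:
    device_type = "cpu"

print(f"Running on: {device_type}")
```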
Great progress. Perhaps making a Docker image would be the next step to simplify the DevOps of this setup.
What if I have thousands of PDFs that I want to ask questions about?
It will still work; the response time might be a bit slower, but it will work.
Is it fast?
You never tell people what you mean by a powerful machine, or what the minimum system requirements are to run these models, because they aren't running on our laptops anyway. An educational video about how to set this up on a cloud server would also make your tutorial complete. Otherwise, all these videos just show off your knowledge, and ordinary people like us can't implement it. I hope you understand this feedback.
Without internet access, running python run_... throws an error.
This is cool.
Are you making mistakes in the previews on purpose to get more comments?
are you making pointless comments to get more comments? if you found an error, share a solution. otherwise, you are just whining.
Lol, released 9 sec ago
Is it free?
Yes
This is begging to be containerized.
Sure, let's basically black-box an open thing because you don't want to use conda.
It's constantly showing errors; I'm unable to ask a single question :(
openai is not open and localgpt is not local, thanks for nothing
what? It is local though...
It's not OpenAI.
@@photon2724 local means cut of the internet and have it run normally
@@mokiloke Look again. I don't see where he says anything about OpenAI .. where do you see that?
I found the following article to share, "OpenAI, a non-profit artificial intelligence (AI) research company, has announced its closure due to lack of funding from wealthy patrons. To continue its work and attract capital, OpenAI plans to create a new for-profit-related company. This decision is due to the need to invest millions of dollars in cloud computing and attract AI experts to remain competitive. High salaries for AI experts have also contributed to OpenAI's unviability. Although they will continue to be available, OpenAI tools may be affected by this change. Alternatives such as Scikit-learn, Pandas, Azure ML, and OpenCV are presented for Machine Learning projects."
Does it support the "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3" model?