This is one of the rare channels where I've been happy to see a new video notification for a long time. I'm already waiting for the next one. I feel so lucky that you post videos so often and cover the latest information so well. We don't want to lose you, keep it up bro
Thank you!
Thank you 🙏
The first second I hear an Indian accent, I know I'm about to get purely useful information from this video, that's for sure.
Thanks for the video, though one thing: when you talk about the "previous video" at the beginning, I'd consider providing the card link that shows in the corner, because I'm not familiar with it and had to dig into your feed to find it. Not a big issue, but still, just a tip to consider.
Can you share the notebook link?
Is there a way to load multiple repos with the solution you showed? I have around 10 repos, and I need the model to answer based on all the different codebases.
Hi Prompt Engineer. Amazing stuff as always. I just have a question about embeddings in Chroma DB in general: where can I find the different embedding models, and how can I speed up the embedding process? For, say, 100 documents, I think the app would crash. It doesn't seem scalable. Or am I getting something wrong here?
you can use something like this:

from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

embeddings = SentenceTransformerEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2", model_kwargs={"device": "cuda"}, encode_kwargs={"normalize_embeddings": False})
db = Chroma.from_documents(chunks, embeddings)
How do I ask questions that hit the same file? For example, if I want to know how many different constructors a certain class has. I tried everything, and it always came back with a lower number than the actual one.
Thank you for your videos. I would appreciate it if you published the notebook.
How can we restrict the answers to the given context? No matter which codebase I create embeddings for, the LLM gives responses based on its pretrained data.
You will need to provide a system prompt. Check out my latest video on localGPT; there is an example prompt.
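For anyone looking for a concrete starting point, here is a minimal sketch of a context-restricting prompt (the wording is my own, not the one from the video; llm and retriever are assumed to exist from the earlier setup):

from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

template = """Use ONLY the following pieces of context to answer the question.
If the answer is not in the context, say "I don't know". Do not use your pretrained knowledge.

{context}

Question: {question}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])
# plug the prompt into the retrieval chain
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, chain_type_kwargs={"prompt": prompt})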
Nice video!
I have a question: what would be the best and fastest small model to use on an old CPU, like a 6th-generation i5, for example?
I've heard of Orca Mini 3B and Falcon 1B; what do you think?
Hi, thanks for the tutorial, but I am facing a RateLimitError while executing this particular line of code: db = Chroma.from_documents(texts, OpenAIEmbeddings(disallowed_special=())) for both the Code Llama and GPT-4 LLMs. How can I solve this error?
use different embeddings, like embeddings = SentenceTransformerEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2", model_kwargs={"device": "cuda"}, encode_kwargs={"normalize_embeddings": False})
Thank you for the video. I get 0 documents loaded when integrating the code into my Python script. Does this work on Windows?
What's the difference between those two prompts you defined, the system-prompt one and the simple one?
Thank you for this video. Could you please share the notebook or a Git link for it?
How do I generate code documentation with Code Llama? The input will be a class or method in Java.
Thanks for the amazing tutorial. I'm not able to get BLAS=1; I created a ticket for it. Looking forward to hearing from you.
Please do a video on setting up GPU support on a Windows machine. I've been trying for days to use my GPU and can't get it to work. I've installed CMake, the CUDA developer tools, etc. Nothing works. I have two GPUs on my laptop.
Can we run the Llama 2 model on a Windows machine with a GPU and our own documents? Please point me to the video that covers this. I mostly get an error while downloading the model from Meta via the Hugging Face API.
This is really great!
Thank you 🙏
All the examples are on a Python codebase. I'm trying to make it work for a C# codebase, but it doesn't work.
Thanks! If we don't want to use OpenAI embeddings, which open-source one do you recommend for Python code in this case? Will FAISS be fine?
For embeddings, I like to use the Instructor embeddings. FAISS will be fine as well.
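For reference, a minimal sketch of that combination (instructor-large is the model mentioned elsewhere in this thread; texts is assumed to be the list of split documents):

from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import FAISS

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = FAISS.from_documents(texts, embeddings)  # texts: the chunked code documents
retriever = db.as_retriever(search_kwargs={"k": 8})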
Thanks! @engineerprompt
Are the models on Hugging Face the same as those released by Meta?
It works for a Python codebase. I tried it on a Java codebase, but it doesn't work. Can you advise on this?
When I try the template you describe in the video, I get a strange answer
when I print the output from:
# Docs
question = "{Current Question}"
docs = retriever.get_relevant_documents(question)
print(docs)
[Document(page_content='ans = qa.run(\'Based on the stg_jaffle_shop.yml file generate sample csv data for each table with minimum 100 rows. The sample data should be with foreign key constraint.\')
# \'Write a test case for the database connection using unittest.\'
# \'Write a test case for the code in connection.py using unittest.\'
# \'Based on the .yml files generate sample csv data for jaffle_shop_customers table with 100 rows\''), ...]
I see that these lines are comments, but the answer from the model is the following:
"This request is not clear to me. Please provide more details so we can help you better. Do you want us to generate sample CSV files or do you want to create a test case using unittest?"
It looks like the LLM doesn't pick out the right question.
Can I use the codellama-7b-hf model with LangChain?
Amazing mate!
Thank you! Cheers!
Can you please give an example that loads the 34B model instead of the 13B? For some reason, I can't get it to work with 34B.
I will have a look at it today. Are you trying the GGUF format?
Can I use a GGUF model? My GPU doesn't have enough VRAM.
An OpenAI API key is required here as well. Is there any way to access Hugging Face models without an API key, or with some free key?
Yes, I should have highlighted this more: the OpenAI API key is required for the embedding model. You can replace it with any other open-source embedding model and this should work.
Hi, thanks for your response. I have just started learning LangChain, and for the time being I'm not able to purchase a ChatGPT API key, but I want to develop some apps using Hugging Face models, like an 'Ask the doc' app that reads a PDF and answers questions. I will buy ChatGPT access if my app passes testing. Best regards,
I didn't see the notebook attachment; will you provide it? And can this be hooked up to a UI?
The link to the documentation, where the code is, is in the description. It should work with a UI as well.
Is it possible to use your localGPT project with Code Llama (and ingest code into ChromaDB)?
Yes, but you will need to change the document loader part of the code. You will also need to make changes to the splitter part; the rest will work just fine.
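For anyone attempting this, a rough sketch of the two parts that would need to change, using LangChain's code-aware loader and splitter (the repo path, suffixes, and chunk sizes are placeholders, adjust for your language):

from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter

# loader part: parse source files instead of plain text
loader = GenericLoader.from_filesystem(
    "path/to/repo", glob="**/*", suffixes=[".py"],
    parser=LanguageParser(language=Language.PYTHON, parser_threshold=500),
)
documents = loader.load()

# splitter part: split along language syntax boundaries
splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=2000, chunk_overlap=200,
)
texts = splitter.split_documents(documents)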
Do we have to use LlamaCpp for RAG?
Not really; it's just for running the GGML/GGUF models.
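For context, a minimal sketch of loading a GGUF model through LangChain's LlamaCpp wrapper (the model path is a placeholder):

from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="path/to/codellama-13b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,
    n_gpu_layers=1,  # >0 offloads layers to the GPU (CUDA or Metal) if llama-cpp-python was built with support
    temperature=0.0,
)
print(llm("Write a docstring for a function that reverses a string."))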
@engineerprompt Hm, yeah, I couldn't get RAG to work properly with my org's source code. I even tried it with LlamaCpp and the llm = pipeline() -> llm("my prompt") approach.
Is it possible to run this on Google Colab?
Yes, you probably want to use the 7B model.
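For what it's worth, a rough Colab setup sketch (the repo and file names follow TheBloke's usual naming convention and are assumptions):

# !pip install llama-cpp-python huggingface_hub
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/CodeLlama-7B-Instruct-GGUF",  # assumed repo name
    filename="codellama-7b-instruct.Q4_K_M.gguf",   # assumed file name
)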
How is that interacting with the codebase? That's just a generic "how to init a React app" answer that any GPT can produce without knowing anything about the codebase.
Thanks brother
Could you do a video about Cursor? So far it seems very useful to me, and better than Copilot.
What is the difference between this and Copilot?
@parasetamol6261 It's easier and more integrated compared to Tabnine/Copilot.
Can you list system specs, rather than just stating that you used a GPU?
I have an M2 Max with 96GB.
Can you share the Colab code?
Yes, I will put it together in a Colab and share it.
@engineerprompt Where can I find your GitHub repo, sir?
Again, a very useful video. Thank you very much!!
I'm new to this area and it's not clear to me yet... I have a concern about the embedding function/model: if you use OpenAIEmbeddings, should I worry about privacy in cases involving sensitive data?
I have already tried an open-source embedding model, HuggingFaceInstructEmbeddings with
model_name = "hkunlp/instructor-large"
but when I try to load it into Chroma, it gives me the following error message: chromadb.errors.InvalidDimensionException: Embedding dimension 384 does not match collection dimensionality 1536
Yes, you are sharing your data with OpenAI.
In regards to that error: simply delete your vector store and then rerun the embedding computation; this should work. Basically, you have an existing vector store whose embeddings have a different dimension, and you want to recompute everything from scratch.
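A minimal sketch of that fix, assuming Chroma was persisted to a local "db" directory (the path is a placeholder, and texts is the list of split documents):

import shutil
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

shutil.rmtree("db", ignore_errors=True)  # drop the old store built with 1536-dim OpenAI embeddings

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma.from_documents(texts, embeddings, persist_directory="db")  # rebuilt with the new model's dimensionality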
@engineerprompt It works! Thank you a lot again!! You saved me from a headache!!
Nice vid, but you can't just use any generic embedding model; you'd need something that works well for code, and that's tough. OpenAI's is probably still one of the best. Also, anyone who uses GPT-4 for code and then tries 13B CodeLlama, or even 34B, is going to be sad. It's about as bad as GPT-3.5, mostly worse; kind of useless for any higher-level reasoning. And if you're using it for Python, why not use the Python-specific CodeLlama model that was optimized exactly for Python use?
Why are you saying "GPU" in relation to the M2? It's an ARM CPU, and the llama.cpp project just uses very bespoke optimizations for that CPU, not the GPU.
The M2 has an integrated GPU.
@erikjohnson9112 And the whole thing has a power supply and an LCD screen. But what does that have to do with inference?
It looks like there's no CUDA on M-series chips, so you can't really use the GPU that way. I'm not really sure what the author is using, but I don't think the Mac can use its GPU here.
My comments get deleted; not sure what is happening.
Please add the Colab file.
I actually wanted to talk to my code to ask why it's so bad.
hello
In docs = retriever.get_relevant_documents(question), I'm not sure what this "retriever" is. I'm getting NameError: name 'retriever' is not defined.
I had to include the code from the previous video to make it work. Thanks
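For anyone else hitting that NameError: the retriever comes from the vector store built earlier; presumably something like this (db is the Chroma/FAISS store from the ingestion step):

retriever = db.as_retriever(search_type="mmr", search_kwargs={"k": 8})
docs = retriever.get_relevant_documents(question)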
This is a very insightful tutorial. I have applied it, but I have a doubt: how did you create the retriever object for Llama?
retriever.get_relevant_documents(questions)