Do check out my other videos on building RAG apps using Ollama.
Ollama RAG using LangChain -> th-cam.com/video/aUBFTDLXGE0/w-d-xo.html
Ollama RAG -> th-cam.com/video/et_EREAsIQE/w-d-xo.html
Learn Ollama -> th-cam.com/video/vfm_Pmxe-z8/w-d-xo.html
Great work. My only issue with the whole process, whether screen scraping or reading a local document, is that you still need the internet for this part of the code:

from llama_index.core import Settings, VectorStoreIndex
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.llm = Ollama(model="llama2", request_timeout=600)
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()  # optional: similarity_top_k, streaming=True
query_engine
For some reason, the embedding model does not get saved locally, so an internet connection is required for every run. Have you had any luck installing bge-small-en-v1.5 locally?
To be a little more clear on step 4 (Build Complete RAG Pipeline): it requires a Hugging Face connection. My goal is to be completely sandboxed so that the embedding model is local as well. There are some instructions on Hugging Face, but none of them seem to work with this approach, so I would find it very interesting if you wanted to experiment and see if you can find a way around it.
Thanks for your comment.
So as far as I understand, you want a RAG app that runs entirely with local LLMs. In my current tutorial, Hugging Face requires an internet connection.
When I created this tutorial, LlamaIndex did not have a way to use an "Ollama" model for generating embeddings, hence I had to use Hugging Face. But recently, they have introduced embedding creation using "Ollama" as well. Here's a link to it:
* docs.llamaindex.ai/en/stable/examples/embeddings/ollama_embedding/
You can replace the code that uses Hugging Face for generating embeddings with Ollama.
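As a minimal sketch of that swap (assuming the llama-index-embeddings-ollama package is installed and an embedding model such as nomic-embed-text has already been pulled with Ollama; the model name here is just an illustration):

from llama_index.core import Settings
from llama_index.embeddings.ollama import OllamaEmbedding

# Use an embedding model served locally by Ollama instead of downloading
# one from Hugging Face, so the whole pipeline can run offline.
Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",       # pulled beforehand with: ollama pull nomic-embed-text
    base_url="http://localhost:11434",   # default local Ollama endpoint
)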
Other than this, I have another tutorial on building a RAG app using "LangChain" where I have used "Ollama" for both answering queries and generating embeddings. No Hugging Face. Feel free to check it if that interests you:
* th-cam.com/video/aUBFTDLXGE0/w-d-xo.html
@CoderzColumn Very exciting developments. I really appreciate the value you've added to the community by demonstrating a complicated process so straightforwardly!!!
Thanks for your comment !! Really appreciate it !!!
Thank you for sharing the step-by-step guide. Really appreciate your efforts.
Glad it was helpful! Thanks for taking the time to comment!!!
There is a lot of content around on this topic, but this is one of the genuinely useful ones.
Thanks for the feedback. Really appreciate it.
Thank you for posting such a great video! You have done an excellent job explaining the steps necessary to build a RAG pipeline with LlamaIndex.
I do have one question. When I set top_k to a large number to retrieve more documents, the retriever returns duplicates: retriever = index.as_retriever(similarity_top_k=50)
For example, if I only fetch 1 link, the retriever stores 7 docs... all of the docs are identical. Would you have any suggestions for removing the duplicates?
Thank you!
Thanks for your message. The retriever generally divides a single doc into multiple Nodes, so there can be overlapping content between them. There are ways to control how docs are handled, whether to store them as they are or split them into paragraphs first. The "from_documents()" function has a parameter named "transformations" where you specify how docs should be split. Feel free to check the link below from the LlamaIndex docs where it is explained:
* docs.llamaindex.ai/en/stable/understanding/loading/loading/
You can also try a different loader that does not create overlapping docs. Check whether the node_ids of all the retrieved docs are the same or not; it might just be overlapping content.
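As a rough illustration of the "transformations" approach (a sketch only; the SentenceSplitter choice and chunk sizes are assumed values you should tune for your data):

from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Control how documents are split into Nodes before indexing:
# non-overlapping chunks make near-identical results at high top_k less likely.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=0)
index = VectorStoreIndex.from_documents(docs, transformations=[splitter])  # docs loaded earlier with your reader
retriever = index.as_retriever(similarity_top_k=50)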
Hi... I am getting a connection timeout error on the last line when I ask a question using query_engine.query().
Can you help me with this?
I would suggest increasing the request_timeout parameter value. I suspect the model is taking more than 600 seconds on your system to generate a response, which is why you might be getting a connection timeout error. Try setting that parameter to 1000-1200 seconds and see if that works.
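For example, a small sketch reusing the same llama2 setup from the video, only with a larger timeout:

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

# Give the local model up to 20 minutes before the request times out.
Settings.llm = Ollama(model="llama2", request_timeout=1200)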
Thank you for the guidance... It worked!