Better RAG: Hybrid Search in Chat with Documents | BM25 and Ensemble

  • Published Jul 25, 2024
  • Learn Advanced RAG concepts to take your chat-with-documents setup to the next level with Hybrid Search. We will look at the BM25 algorithm along with the ensemble retriever. The implementation will be in LangChain.
    🦾 Discord: / discord
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    |🔴 Patreon: / promptengineering
    💼Consulting: calendly.com/engineerprompt/c...
    📧 Business Contact: engineerprompt@gmail.com
    Become Member: tinyurl.com/y5h28s6h
    💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
    LINKS:
    How to properly chunk documents: • LangChain: How to Prop...
    Chat with PDF: • Custom GUI App for Ope...
    Chat with PDF file: • ChatGPT for YOUR OWN P...
    Google Colab: tinyurl.com/33wc8sav
    TIMESTAMPS:
    [00:00] Introduction to Advanced RAG Pipelines
    [00:11] Understanding the Basics of RAG Pipelines
    [01:49] Improving RAG Pipelines with Hybrid Search
    [02:55] Code Example: Implementing Hybrid Search
    [05:08] Loading and Processing the PDF File
    [06:24] Creating Embeddings and Vector Store
    [08:46] Setting Up the Retrievers
    [12:52] Running the Model and Analyzing the Output
    All Interesting Videos:
    Everything LangChain: • LangChain
    Everything LLM: • Large Language Models
    Everything Midjourney: • MidJourney Tutorials
    AI Image Generation: • AI Image Generation Tu...
  • Science & Technology
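As a rough, self-contained illustration of the keyword half of the pipeline (a sketch of Okapi BM25 in plain Python, not the video's LangChain code), each document is scored by term frequency, inverse document frequency, and a document-length penalty:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    df = Counter()  # document frequency of each term
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "BM25 is a keyword based ranking function",
    "Dense embeddings capture semantic similarity",
    "Hybrid search combines keyword and semantic retrieval",
]
print(bm25_scores("keyword search", docs))
```

Unlike dense embeddings, BM25 only rewards exact term overlap, which is why combining it with semantic search in an ensemble tends to beat either alone.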

Comments • 61

  • @engineerprompt
    @engineerprompt  months ago

    If you are interested in learning more about how to build robust RAG applications, check out this course: prompt-s-site.thinkific.com/courses/rag

    • @awakenwithoutcoffee
      @awakenwithoutcoffee months ago

      Hi there! Personally, I find the price too steep for only 2 hours of content, but maybe you can convince us with a preview! Cheers

  • @TomanswerAi
    @TomanswerAi 5 months ago +2

    Excellent video I’ve been needing this. Very slick way to combine the responses from semantic and keyword search.

  • @poloceccati
    @poloceccati 5 months ago +1

    Very nice idea with this 'code display window' in your video:
    now the code is much easier to read, and much easier to follow step by step. Thanks.

  • @paulmiller591
    @paulmiller591 5 months ago +3

    Fantastic Video and very timely. Thanks for the advice. I have made some massive progress because of it.

    • @engineerprompt
      @engineerprompt  5 months ago

      Glad it was helpful and thank you for your support 🙏

  • @MikewasG
    @MikewasG 5 months ago +4

    This video is really helpful to me! Thanks a lot!

  • @andaldana
    @andaldana 5 months ago

    Great stuff! Thanks!

  • @micbab-vg2mu
    @micbab-vg2mu 5 months ago +1

    Thank you for the video:)

  • @aneerpa8384
    @aneerpa8384 5 months ago

    Really helpful, thank you ❤

  • @KOTAGIRISIVAKUMAR
    @KOTAGIRISIVAKUMAR 5 months ago

    Great effort and good content 😇😇

  • @12351624
    @12351624 5 months ago +1

    Amazing video, thanks

  • @TheZEN2011
    @TheZEN2011 5 months ago

    I'll have to try this one. Great video!

  • @quickcinemarecap
    @quickcinemarecap 5 months ago +2

    00:01 Introduction to Advanced RAG series
    02:06 Hybrid search combines semantic and keyword-based search
    04:07 Setting up the necessary components for hybrid search in chat with documents
    06:13 Creating and using API token in Google Colab
    08:19 Creating Vector store and retrievers for hybrid search in chat with documents
    10:16 Using different retrievers for different types of documents
    12:11 Creating a prompt chat template for the model.
    14:12 Comparison of Orca and ChatGPT

  • @deixis6979
    @deixis6979 5 months ago

    hello! thanks for the video. I was wondering if we can use it on csv files instead of PDF? How would that affect the architecture?

  • @attilavass6935
    @attilavass6935 5 months ago +2

    It's great that the example code uses free LLM inference like Hugging Face (or OpenRouter)!

    • @morespinach9832
      @morespinach9832 5 months ago

      But can we host them locally? Working in an industry that can’t use public SaaS stuff.

  • @saqqara6361
    @saqqara6361 5 months ago +4

    Great - while you can persist the ChromaDB, is there a way to persist the BM25 retriever? Or do you have to chunk again every time the application starts?

    • @vikaskyatannawar8417
      @vikaskyatannawar8417 months ago

      You can fetch documents from DB and feed it.
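Besides re-fetching documents from the DB, another workaround (an assumption on my part, not something shown in the video) is to serialize the built index at ingestion time and reload it at startup, skipping re-chunking. A toy sketch with `pickle` and a hypothetical stand-in class:

```python
import os
import pickle
import tempfile

class KeywordIndex:
    """Toy stand-in for a BM25-style retriever built from chunked docs."""
    def __init__(self, docs):
        self.docs = docs
        # Tokenizing/indexing happens once here; this is the work
        # we want to avoid repeating on every application start.
        self.tokens = [d.lower().split() for d in docs]

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def load(path):
        with open(path, "rb") as f:
            return pickle.load(f)

path = os.path.join(tempfile.gettempdir(), "keyword_index.pkl")
KeywordIndex(["hybrid search basics", "bm25 ranking"]).save(path)  # at ingestion time
restored = KeywordIndex.load(path)                                 # at app startup
print(restored.docs)
```

A real BM25 retriever is likewise a plain in-memory Python object, so the same persist-and-reload idea should apply, with the usual caveats about unpickling only files you trust.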

  • @kenchang3456
    @kenchang3456 4 months ago

    Excellent video, it's helping me with my proof of concept. Thank you.

    • @engineerprompt
      @engineerprompt  4 months ago +1

      Glad to hear that!

    • @kenchang3456
      @kenchang3456 2 months ago

      @@engineerprompt I finally got my POC up and running to search for parts and materials using hybrid search, and it works really well. Thanks for doing this video.

    • @engineerprompt
      @engineerprompt  2 months ago +1

      @@kenchang3456 this is great news.

  • @SRV900
    @SRV900 4 days ago

    Hello! First of all, thank you very much for the video! Secondly, at minute 10:20 you mention that you are going to create a new video about obtaining the metadata of the chunks. Do you have that video? Again, thank you very much for the material.

  • @user-cq7iu4ws6q
    @user-cq7iu4ws6q 4 months ago

    Thanks! I have 500k documents. I want to compute the keyword retriever once and call it the same way I call the external index for the dense vector DB. Is there a way?

  • @lakshay510
    @lakshay510 5 months ago +3

    Hey, these videos are really helpful. What do you think about scalability? When the document count grows from a few to thousands, the performance of semantic search decreases. Also, have you tried Qdrant? It worked better than Chroma for me.

    • @engineerprompt
      @engineerprompt  5 months ago +2

      Scalability is potentially an issue. Will be making some content around it. In theory, retrieval speed will decrease as the number of documents increases by orders of magnitude, but in that case, approximate nearest-neighbor search will work. Haven't looked at Qdrant yet but it's on my list. Thanks for sharing.

  • @zYokiS
    @zYokiS 5 months ago +1

    Amazing video! How can you use this in a conversational chat engine? I have built conversational pipelines that use RAG, however how would I do this here while having different retrievers?

    • @engineerprompt
      @engineerprompt  5 months ago +1

      This should work out of the box, you will need to replace your current retriever with the ensemble one.
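To make "replace the retriever" concrete: an ensemble retriever just fuses the ranked results of its members, commonly with weighted Reciprocal Rank Fusion. A minimal sketch with hypothetical stub retrievers (plain functions standing in for the keyword and vector retrievers):

```python
def make_ensemble(retrievers, weights, c=60):
    """Return a retriever that fuses ranked results from several
    retrievers using weighted Reciprocal Rank Fusion (RRF)."""
    def retrieve(query, k=3):
        fused = {}
        for retriever, weight in zip(retrievers, weights):
            for rank, doc in enumerate(retriever(query)):
                # Earlier ranks contribute more; the weight sets each
                # retriever's influence on the final ordering.
                fused[doc] = fused.get(doc, 0.0) + weight / (c + rank + 1)
        return sorted(fused, key=fused.get, reverse=True)[:k]
    return retrieve

# Hypothetical stand-ins for a keyword (BM25) and a semantic retriever.
keyword_retriever = lambda q: ["doc_a", "doc_b"]
semantic_retriever = lambda q: ["doc_c", "doc_a"]

# In a conversational pipeline, call this wherever the single
# retriever used to be called; the call shape stays the same.
ensemble = make_ensemble([keyword_retriever, semantic_retriever],
                         weights=[0.5, 0.5])
print(ensemble("hybrid search"))  # doc_a first: both retrievers returned it
```

Because the fused result comes back first, a document that both retrievers agree on outranks one that only a single retriever surfaced, which is the behavior the ensemble is meant to add.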

  • @mrchongnoi
    @mrchongnoi 5 months ago +4

    How do you handle multiple unrelated documents when finding the answer for the user?

    • @parikshitrathode4578
      @parikshitrathode4578 5 months ago

      I have the same question: how do we handle multiple documents of similar types, say office policies for different companies?
      The similarity search will return all similar chunks (k=5) as context to the LLM, which may contain different answers depending on the company's policy. There is a lot of ambiguity here.
      Also, how do we handle tables in PDFs? When asked questions about them, the model doesn't provide correct answers.
      Can anyone help me out here?

    • @texasfossilguy
      @texasfossilguy 5 months ago +1

      One way would be to have an agent select a specific database based on the query, or have a variable for the user stating which company they work for. You would then have multiple databases, one for each company involved.
      This would also keep the databases smaller.
      Handling it in some way like that would speed up the search and response.

  • @chrismathew638
    @chrismathew638 5 months ago

    I'm using RAG for a coding model. Can anyone suggest a good retriever for this task? Thanks in advance!

  • @JanghyunBaek
    @JanghyunBaek 4 months ago

    @engineerprompt - Could you convert the notebook to LlamaIndex, if you don't mind?

  • @karanv293
    @karanv293 5 months ago

    I don't know which RAG setup to implement. Are there benchmarks out there for the best solution? My use case will be hundreds of LONG documents, even textbooks.

  • @rafaf6838
    @rafaf6838 5 months ago +1

    Thank you for sharing the guide. One question: how do I make the response longer? I have tried changing the max_length parameter, as you suggested in the video, but the response is always ~300 characters long.

    • @linuxmanju
      @linuxmanju 5 months ago +1

      It depends on the model too. Maybe your LLM doesn't support more than 300? Which model are you using, btw?

    • @engineerprompt
      @engineerprompt  5 months ago +1

      Which model are you trying? How long is your context?

    • @sarcastic.affirmations
      @sarcastic.affirmations 5 months ago +2

      @@engineerprompt I've experienced a similar issue; I'm using the zephyr-7b-beta model. Also, I don't want the AI to get answers from the internet; it should only respond if the context is available in the provided database. I tried prompting for that; it didn't help. Any tips?

    • @PallaviChauhan91
      @PallaviChauhan91 5 months ago

      @@sarcastic.affirmations did you get what you were trying to find?

  • @denb1568
    @denb1568 5 months ago

    Can you add this functionality to localGPT?

  • @PallaviChauhan91
    @PallaviChauhan91 5 months ago

    Hi, I have a question, hope you reply. If we give it a PDF with a bunch of video transcripts and ask it to formulate a creative article based on the info given, can it actually do tasks like that? Or is it just useful for finding relevant information in the source files?

    • @engineerprompt
      @engineerprompt  5 months ago +1

      RAG is good for finding the relevant information. For the use case you are describing, you will need to add everything in the context window of the LLM in order for it to look at the whole file. Hope this helps.

    • @PallaviChauhan91
      @PallaviChauhan91 5 months ago

      @@engineerprompt Can you point me to a good video/channel that focuses on accomplishing such things using local LLMs, or even ChatGPT-4?

  • @hassentangier3891
    @hassentangier3891 3 months ago

    Great! Do you have videos on using docx files?

    • @engineerprompt
      @engineerprompt  3 months ago

      Thanks. The same approach will work, but you will need a separate loader for it. Look into unstructured.io.

  • @abhinandansharma3983
    @abhinandansharma3983 3 months ago

    "Where can I find the PDF data?"

    • @engineerprompt
      @engineerprompt  3 months ago

      You will need to provide your own PDF files.

  • @googleyoutubechannel8554
    @googleyoutubechannel8554 5 months ago +2

    Wait, this doesn't seem like RAG at all? If I'm following, the LLM is not using embedding vectors at all in the actual llm inference step? It seems you're using a complex text->embedding->search engine step as a way to build a text search engine that just injects regular text into the context, but does not use embeddings directly added to the model? Couldn't you generate extra 'ad-hoc' search text you're just plopping into the context window in any number of methods, only one of them being using embeddings -> db -> text? And this method has none of the advantage of actually 'grafting on' embeddings directly to the model as you're using up the context window?

    • @s11-informationatyourservi44
      @s11-informationatyourservi44 5 months ago

      The whole point is to fix the broken part of RAG. The typical RAG implementation doesn't do too well with anything larger than a few docs.

  • @vamshi3676
    @vamshi3676 5 months ago +1

    The background is a little distracting; it's better to avoid the flashy one. I couldn't concentrate on your lecture. Please. Thank you.

  • @clinton2312
    @clinton2312 5 months ago +2

    I get KeyError: 0 when I run this:
    # Vector store with the selected embedding model
    vectorstore = Chroma.from_documents(chunks, embeddings)
    What am I doing wrong? I added my HF token with read access the first time, and then with write too...
    I would appreciate the help.
    Thanks for the video, though. It's amazing.

    • @goel323
      @goel323 4 months ago

      I am getting the same error
