Semi-structured RAG - LangChain using Mistral 7B, Qdrant FastEmbed on PDF text with tabular data
- Published on 14 Oct 2024
- If you would like to support me financially (this is entirely optional and voluntary), you can buy me a coffee here: www.buymeacoff...
Many documents contain a mixture of content types, including text and tables.
Semi-structured data can be challenging for conventional RAG for at least two reasons:
• Text splitting may break up tables, corrupting the data in retrieval
• Embedding tables may pose challenges for semantic similarity search
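To make the first point concrete, here is a minimal sketch of how a naive fixed-size splitter can cut a table row in half. The document text, the pipe-delimited table format, the chunk size, and the helper names (`split_fixed`, `is_table_row`, `broken_rows`) are all made up for illustration; this is not the splitter used in the video.

```python
# Toy document: one sentence followed by a small pipe-delimited table.
# The content is invented purely for illustration.
DOCUMENT = (
    "Quarterly revenue grew steadily over the year.\n"
    "| Quarter | Revenue |\n"
    "| Q1 | 100 |\n"
    "| Q2 | 120 |\n"
    "| Q3 | 150 |\n"
)


def split_fixed(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def is_table_row(line: str) -> bool:
    """In this toy format, a complete row starts and ends with a pipe."""
    return line.startswith("|") and line.endswith("|")


def broken_rows(chunks: list[str]) -> list[str]:
    """Collect line fragments that contain a pipe but are not complete rows."""
    fragments = []
    for chunk in chunks:
        for line in chunk.split("\n"):
            if "|" in line and not is_table_row(line):
                fragments.append(line)
    return fragments


# A chunk boundary that ignores structure lands in the middle of the header row.
chunks = split_fixed(DOCUMENT, chunk_size=60)
fragments = broken_rows(chunks)

for fragment in fragments:
    print("broken row fragment:", repr(fragment))
```

Each fragment ends up in a different chunk, so a retriever that returns only one of them hands the LLM an incomplete table. That is the motivation for extracting tables as whole elements before any splitting.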
This video shows how to perform RAG on documents with semi-structured data:
• We will use Unstructured to parse both text and tables from documents (PDFs).
• We will use the multi-vector retriever to store the raw tables and text alongside table summaries that are better suited for retrieval.
• We will use LCEL to implement the chains used.
We will use Mistral 7B Instruct as our LLM and Qdrant FastEmbed for our embeddings.
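The idea behind the multi-vector retriever can be sketched in plain Python: search over the summaries, but return the raw content they point to. This is only a toy model of the pattern, not LangChain's implementation. The class `ToyMultiVectorRetriever`, the bag-of-words `embed` function (a stand-in for FastEmbed), and the sample table and summaries are all hypothetical.

```python
import math
import uuid
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMultiVectorRetriever:
    """Index summaries for search, but hand back the raw content."""

    def __init__(self) -> None:
        self.index: list[tuple[Counter, str]] = []  # (summary vector, doc_id)
        self.docstore: dict[str, str] = {}          # doc_id -> raw table or text

    def add(self, raw: str, summary: str) -> str:
        doc_id = str(uuid.uuid4())
        self.docstore[doc_id] = raw                  # raw content is stored untouched
        self.index.append((embed(summary), doc_id))  # only the summary is embedded
        return doc_id

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(
            self.index, key=lambda item: cosine(query_vec, item[0]), reverse=True
        )
        return [self.docstore[doc_id] for _, doc_id in ranked[:k]]


# Hypothetical sample data: a raw HTML table and a text passage, each with a summary.
raw_table = "<table><tr><td>Q1</td><td>100</td></tr><tr><td>Q2</td><td>120</td></tr></table>"
retriever = ToyMultiVectorRetriever()
retriever.add(raw_table, summary="table of quarterly revenue figures for the year")
retriever.add(
    "The company was founded in a garage.",
    summary="history of how the company was founded",
)

result = retriever.retrieve("what was the quarterly revenue")
print(result[0])
```

The query matches the natural-language summary, yet the LLM receives the intact raw table. In the actual notebook, the vector index and embedding model would be Qdrant and FastEmbed rather than these stand-ins.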
Colab notebook:
colab.research...
github.com/lan...
huggingface.co...
qdrant.github....
unstructured.io/
Previous video on semi-structured RAG with OpenAI GPT-4: • Semi-structured RAG wi...
If you like such content, please subscribe to the channel here:
www.youtube.co...
Could you share the PDF file you worked on here?
Can you go into detail on what the extracted text and tables look like, especially the tables after extraction and before the table summaries are made?
Thanks
Please debug the Colab notebook to inspect them. The link is shared in the description of the video.
Hi sir, can I do the same in Amazon SageMaker or Amazon Bedrock?
You should be able to do it.
Sir, is this done on paid Colab? How can I do this on free Colab with a CPU? Is it even possible?
It should be possible if you use a quantized model. There are other libraries, like Ollama, with which you can run it locally on a CPU.
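A rough back-of-the-envelope calculation shows why quantization matters here. The sketch below estimates memory for the weights alone, assuming a round 7 billion parameters and ignoring activations, the KV cache, and runtime overhead. The helper `weight_memory_gb` is hypothetical and the numbers are approximations, not measurements.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS_7B = 7e9  # assumption: a round 7 billion parameters

fp16_gb = weight_memory_gb(PARAMS_7B, 16)  # 16-bit floating-point weights
int4_gb = weight_memory_gb(PARAMS_7B, 4)   # 4-bit quantized weights

print(f"fp16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{int4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the memory of the 16-bit weights, which is what makes a CPU-only machine a realistic option.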
Tables and text are covered; can we add image data here too?
I think if Unstructured can handle images, it should work.
@RitheshSreenivasan This video was very informative. Could you also try the airoboros-13B model someday, since it seems to perform better than other open-source LLMs, or maybe experiment with the Falcon LLM? Thanks.
Yes, we can experiment with different models.