Semi-structured RAG - LangChain using Mistral 7B, Qdrant FastEmbed on PDF text with tabular data

  • Published on 14 Oct 2024
  • If you would like to support me financially (this is entirely optional and voluntary), you can buy me a coffee here: www.buymeacoff...
    Many documents contain a mixture of content types, including text and tables.
    Semi-structured data can be challenging for conventional RAG for at least two reasons:
    • Text splitting may break up tables, corrupting the data in retrieval
    • Embedding tables may pose challenges for semantic similarity search
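The first problem can be seen in a small sketch. The document, the 60-character chunk size, and both splitter functions below are made up for illustration and are not taken from the video or from any library; the point is only that a structure-blind splitter cuts through a table, while a splitter that keeps table rows together does not.

```python
# Toy document: a paragraph followed by a small table (all values are invented).
DOCUMENT = """Quarterly results are summarised below.

| Quarter | Revenue | Costs |
| Q1      | 100     | 80    |
| Q2      | 120     | 90    |
| Q3      | 140     | 95    |
"""


def split_fixed(text: str, chunk_size: int) -> list[str]:
    """Naive splitter: cut every chunk_size characters, ignoring structure."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def split_table_aware(text: str) -> list[str]:
    """Keep consecutive table rows (lines starting with '|') in one chunk."""
    chunks, current, in_table = [], [], False
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        is_row = line.lstrip().startswith("|")
        # Flush the current chunk when we cross a text/table boundary.
        if current and is_row != in_table:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
        in_table = is_row
    if current:
        chunks.append("\n".join(current))
    return chunks


def has_whole_table(chunk: str) -> bool:
    """True only if the header and every data row landed in the same chunk."""
    return all(key in chunk for key in ("| Quarter", "| Q1", "| Q2", "| Q3"))


naive_chunks = split_fixed(DOCUMENT, 60)
aware_chunks = split_table_aware(DOCUMENT)

print("naive splitter keeps table intact:", any(map(has_whole_table, naive_chunks)))
print("table-aware splitter keeps table intact:", any(map(has_whole_table, aware_chunks)))
```

With the naive splitter, the header row and the data rows end up in different chunks, so a retrieved chunk may contain numbers without the column names that give them meaning.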
    This video shows how to perform RAG on documents with semi-structured data:
    • We will use Unstructured to parse both text and tables from documents (PDFs).
    • We will use the multi-vector retriever to store raw tables and text, along with table summaries that are better suited for retrieval.
    • We will use LCEL to implement the chains used.
    We will use Mistral 7B Instruct as our LLM and Qdrant FastEmbed for our embeddings.
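The multi-vector idea described above can be sketched without any external libraries. This is a toy stand-in, not LangChain's retriever or the notebook's code: `ToyMultiVectorRetriever`, `embed`, and `build_prompt` are hypothetical names, the bag-of-words `embed` function replaces FastEmbed, and the summaries are hand-written where the video would have an LLM generate them. What it illustrates is the core mechanism: search over summaries, then return the raw table or text that each summary points to.

```python
import math
import re
import uuid
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts (an assumption, NOT FastEmbed)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMultiVectorRetriever:
    """Indexes summaries for search, but returns the raw content they describe."""

    def __init__(self) -> None:
        self.vectors: list[tuple[Counter, str]] = []  # (summary vector, doc id)
        self.docstore: dict[str, str] = {}            # doc id -> raw table/text

    def add(self, summary: str, raw: str) -> str:
        doc_id = str(uuid.uuid4())
        self.vectors.append((embed(summary), doc_id))
        self.docstore[doc_id] = raw
        return doc_id

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(self.vectors, key=lambda pair: cosine(query_vec, pair[0]), reverse=True)
        return [self.docstore[doc_id] for _, doc_id in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the prompt that would be sent to the LLM (no model call here)."""
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Invented example data.
RAW_TABLE = "| Quarter | Revenue |\n| Q1 | 100 |\n| Q2 | 120 |"
RAW_TEXT = "The company opened two new offices during the year."

retriever = ToyMultiVectorRetriever()
retriever.add("Table of revenue for each quarter", RAW_TABLE)
retriever.add("Text about new offices opened by the company", RAW_TEXT)

question = "What was the revenue for each quarter?"
results = retriever.retrieve(question, k=1)
prompt = build_prompt(question, results)
print(prompt)
```

Searching over the summary rather than the raw table sidesteps the difficulty of embedding tabular text, while the LLM still sees the complete, unsplit table in its context.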
    Colab notebook:
    colab.research...
    github.com/lan...
    huggingface.co...
    qdrant.github....
    unstructured.io/
    Previous video on semi-structured RAG with OpenAI GPT-4: • Semi-structured RAG wi...
    If you like this kind of content, please subscribe to the channel here:
    www.youtube.co...

Comments • 11

  • @GAAD_Anoop_R (several months ago)

    Could you share the PDF file you worked on here?

  • @sagarchadha98 (5 months ago)

    Can you go into detail on what the extracted text and tables look like, especially the tables after extraction and before the summaries are made?
    Thanks

    • @RitheshSreenivasan (5 months ago)

      Please debug the Colab notebook to inspect this. The link is shared in the description of the video.

  • @techthunder4832 (3 months ago)

    Hi sir, can I do the same in Amazon SageMaker or Amazon Bedrock?

  • @rnronie38 (5 months ago)

    Sir, is this done on paid Colab? How can I do this on free Colab with a CPU? Is it even possible?

    • @RitheshSreenivasan (5 months ago)

      It should be possible if you use a quantized model. There are also tools such as Ollama that let you run it locally on a CPU.

  • @devanshgupta6064 (8 months ago)

    Besides tables and text, can we add image data here too?

    • @RitheshSreenivasan (8 months ago)

      I think if Unstructured can handle images, it should work.

    • @devanshgupta6064 (8 months ago)

      @RitheshSreenivasan This video was very informative. Could you also try the airoboros-13B model someday, since it seems to perform better than other open-source LLMs, or perhaps experiment with the Falcon LLM? Thanks.

    • @RitheshSreenivasan (8 months ago) +1

      Yes, we can experiment with different models.