LangChain - Parent-Document Retriever Deepdive with Custom PgVector Store

  • Published on Jul 8, 2024
  • In this video we take a deep dive into the Parent-Document Retriever. We not only use the LangChain docstore, but also create our own custom docstore. This is quite an advanced video, and probably the most advanced one you will find on this topic on TH-cam.
    Code: github.com/Coding-Crashkurse/...
    Timestamps:
    0:00 Introduction to the Parent-Document Retriever
    1:55 PD-Retriever with InMemory Store
    6:37 PD-Retriever with Postgres based Store
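
For orientation, here is a minimal sketch of the pattern the first chapter covers: a ParentDocumentRetriever that embeds small child chunks into a vector store while keeping the full parent chunks in the built-in InMemoryStore. This illustrates the standard LangChain setup, not the video's exact code; Chroma stands in for the vector store, which the second chapter swaps for PgVector.

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Small child chunks get embedded; full parent chunks live in the docstore.
vectorstore = Chroma(collection_name="children", embedding_function=OpenAIEmbeddings())

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=InMemoryStore(),
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=400),
    parent_splitter=RecursiveCharacterTextSplitter(chunk_size=2000),
)
retriever.add_documents(docs)  # docs: a list[Document] loaded beforehand
```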

Comments • 23

  • @jarekmor
    @jarekmor 2 days ago

    Hi! I like your videos and I have learned a lot from them. Your approach is really production-ready, and I am implementing some of your ideas in my PoC for one of my customers. There will be more stuff in the PoC - MS Sharepoint On-Premise integration, AD and LDAP authorization, Neo4J, Multivectorstore ret., etc. But your ideas were the foundation for my project. Thank you very much and keep going! :-)

  • @varruktalalle4090
    @varruktalalle4090 months ago +1

    Can you explain how to reload the pg-parentDocRetriever? E.g. first create the retriever as you showed, then reload the retriever in a different script.

  • @maxlgemeinderat9202
    @maxlgemeinderat9202 months ago +1

    Working on exactly this at the moment! My eval showed that ParentDocumentRetriever works best for my use case.
    What do you think of my idea of implementing a reranker (e.g. ColBERT) after retrieving the small chunks, and then only getting the parent chunks of the reranked child chunks? At the moment I am trying to implement this, but I think I have to change the MultiVectorRetriever class in LangChain. Or how would you add this to your solution (e.g. doing reranking with the LangChain CompressionRetriever)?
    I can't rerank the results at the end as usual, as the parent chunks will probably be too large for a reranking model with 512 max_tokens.
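
A rough sketch of the idea in the comment above (assumptions throughout, not code from the video): retrieve a generous number of child chunks, rerank them with a cross-encoder (sentence-transformers' CrossEncoder stands in for ColBERT here), and only then fetch the parents of the top-ranked children.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def parents_of_reranked_children(query, vectorstore, docstore, k=20, top_n=4):
    # 1. Retrieve small child chunks (safely under the reranker's 512-token cap).
    children = vectorstore.similarity_search(query, k=k)
    # 2. Score each (query, child) pair with the cross-encoder and sort.
    scores = reranker.predict([(query, c.page_content) for c in children])
    ranked = sorted(zip(scores, children), key=lambda pair: -pair[0])
    # 3. Collect parent ids in rank order, deduplicated ("doc_id" is the
    #    ParentDocumentRetriever's default id_key on child metadata).
    parent_ids = list(dict.fromkeys(c.metadata["doc_id"] for _, c in ranked))
    # 4. Return only the parents of the best children.
    return docstore.mget(parent_ids[:top_n])
```

This avoids changing the MultiVectorRetriever class itself; the trade-off is that you bypass the retriever interface, so a chain expecting a BaseRetriever would need a small wrapper.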

  • @M10n8
    @M10n8 months ago +1

    This can be extended nicely to the MultiVectorRetriever, which pairs well with the 'unstructured' library: you can build RAG over PDF files where unstructured extracts tables, images and text separately, ask the model to caption the images (base64 passed to OpenAI), summarize the tables (and, if you like, the text too), then store all of that and retrieve it with the MultiVectorRetriever using PGVector as the db ;-) Can I request a video? ++

    • @codingcrashcourses8533
      @codingcrashcourses8533  months ago +1

      The next videos will be about LangGraph, but maybe after that :)
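
A sketch of the pipeline @M10n8 describes above (an illustration with assumed names, not code from the video): 'unstructured' yields text/table/image elements, a hypothetical summarize_element() LLM call produces captions and summaries, the summaries are embedded, and the originals come back at query time.

```python
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_core.documents import Document

id_key = "doc_id"
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,   # e.g. PGVector, as suggested in the comment
    docstore=InMemoryStore(),  # holds the original elements
    id_key=id_key,
)

# raw_elements: Documents built from unstructured's text/table/image output;
# summarize_element(): hypothetical LLM captioning/summarization call.
doc_ids = [str(uuid.uuid4()) for _ in raw_elements]
summaries = [
    Document(page_content=summarize_element(el), metadata={id_key: doc_ids[i]})
    for i, el in enumerate(raw_elements)
]
retriever.vectorstore.add_documents(summaries)             # embed the summaries
retriever.docstore.mset(list(zip(doc_ids, raw_elements)))  # keep the originals
```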

  • @thawab85
    @thawab85 months ago

    You had a few videos on RAPTOR; it would be great if you could compare the indexing methods and which use cases each is recommended for.

  • @AngelWhite007
    @AngelWhite007 months ago +1

    Please make a video on creating a sidebar like ChatGPT using React and LangChain Python

    • @codingcrashcourses8533
      @codingcrashcourses8533  months ago +1

      Man, this is 90 percent front-end work; you will find better people to build this.

  • @Emmit-hv5pw
    @Emmit-hv5pw months ago

    Thanks!! Any plans for a tutorial on custom agents with memory, with custom tools to retrieve information from a SQL DB, a vector store (PDFs), and tool calling (real-time info), with eval on LangSmith in a real business-case environment?

    • @codingcrashcourses8533
      @codingcrashcourses8533  months ago

      Probably too difficult to cover all of that in one tutorial. Maybe an easier use case with RAG and memory.

  • @angelmoreno3383
    @angelmoreno3383 7 days ago

    That is a really interesting implementation! I wonder if this could help reduce the time of the retriever.add_documents operation; I'm trying to build a RAG over around 100 PDFs, and when testing the ParentDocument retriever this step takes far too long. Do you know any solution for this?

    • @codingcrashcourses8533
      @codingcrashcourses8533  7 days ago

      Hm, how do you preprocess your PDFs? How many chunks do you have at the end?

    • @angelmoreno3383
      @angelmoreno3383 7 days ago

      @@codingcrashcourses8533 In my vectorstore they are split with a chunk size of 800. Into my store I'm loading them using PyPDF loaders and a kv docstore

    • @angelmoreno3383
      @angelmoreno3383 7 days ago

      @@codingcrashcourses8533 I'm using the PyPDF loader and then storing them in a LocalFileStore using create_kv_docstore. At the end my docstore has around 350 chunks
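
One common mitigation for slow bulk ingestion (a general suggestion, not something from the video): call add_documents in batches, so each call embeds a bounded number of child chunks and progress is visible between batches.

```python
# Batched ingestion sketch: bounds the size of each embedding request and
# docstore write, and lets you log progress (batch_size is a tuning knob).
batch_size = 10
for i in range(0, len(docs), batch_size):
    retriever.add_documents(docs[i:i + batch_size])
    print(f"ingested {min(i + batch_size, len(docs))}/{len(docs)} documents")
```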

  • @yazanrisheh5127
    @yazanrisheh5127 months ago +1

    First

  • @andreypetrunin5702
    @andreypetrunin5702 months ago

    Markus, hi. Can you give me the code for this video? I want to adapt it to the Xata database.

    • @codingcrashcourses8533
      @codingcrashcourses8533  months ago

      I added the notebook

    • @andreypetrunin5702
      @andreypetrunin5702 months ago

      @@codingcrashcourses8533 Thank you!!!!

    • @andreypetrunin5702
      @andreypetrunin5702 12 days ago

      @@codingcrashcourses8533 The code only creates and saves the database, but how do I load it when I reuse it? If I missed it, I apologize.

    • @codingcrashcourses8533
      @codingcrashcourses8533  12 days ago

      @@andreypetrunin5702 You don't have to "reload" it when you use PgVector; the service runs permanently inside a container. The "get_relevant_documents" method already uses it
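
In code, reusing the persisted stores from a second script just means instantiating the same classes against the same connection and collection, with no add_documents call. A sketch, assuming the newer langchain_postgres package; the connection string, collection name, and the custom docstore class name (PostgresStore) are placeholders:

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector
from langchain_text_splitters import RecursiveCharacterTextSplitter

CONNECTION = "postgresql+psycopg://user:pass@localhost:5432/vectordb"  # assumed

# Points at the existing collection in Postgres; nothing is re-embedded.
vectorstore = PGVector(
    embeddings=OpenAIEmbeddings(),
    collection_name="parent_docs",  # must match the ingestion script
    connection=CONNECTION,
)
docstore = PostgresStore(CONNECTION)  # the video's custom docstore (name assumed)

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=400),
)
docs = retriever.get_relevant_documents("my query")  # reads straight from Postgres
```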

    • @andreypetrunin5702
      @andreypetrunin5702 12 days ago

      @@codingcrashcourses8533 I confused it with the local FAISS and Chroma databases. ))))