Enhancing RAG Pipelines with Metadata Enrichment Techniques

  • Published on 15 Sep 2024
  • Discover the untapped potential of metadata in transforming your Retrieval-Augmented Generation (RAG) pipeline in my latest tutorial. Join me as I explore the crucial role of metadata in enriching knowledge bases and enhancing accuracy in data processing.
    In this video, I delve into an effective orchestration strategy using Hugging Face's open-source models. I address the common pitfalls of handling unstructured data and dispel the myth that simple plug-and-play methods are sufficient for complex document management.
    This method combines similarity search with the strategic use of pre-processed keywords stored as document metadata, ensuring a more refined and accurate retrieval process.
    Whether it's a single document or a vast vector database, this tutorial provides essential insights and techniques to elevate your RAG pipeline.
    GitHub Repo: github.com/AIA...
    KeyBERT Here: github.com/Maa...
    Embeddings Model: huggingface.co...
    #rag #langchain #generativeai
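
The idea described above (keywords extracted ahead of time, stored as chunk metadata, then combined with similarity search) can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the frequency-based `extract_keywords` is only a stand-in for KeyBERT, the bag-of-words `cosine` is a stand-in for a real embeddings model, and all function names and sample chunks are hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "as", "for", "in", "is", "of", "on", "to", "with"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def extract_keywords(text, top_n=3):
    """Toy keyword extractor (most frequent tokens); a stand-in for KeyBERT."""
    return [word for word, _ in Counter(tokenize(text)).most_common(top_n)]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Pre-process each chunk once: store its vector and its keywords as metadata."""
    return [
        {"text": c, "vector": Counter(tokenize(c)), "metadata": {"keywords": extract_keywords(c)}}
        for c in chunks
    ]


def retrieve(index, query, top_k=2):
    """Filter by keyword metadata first, then rank the survivors by similarity."""
    query_tokens = tokenize(query)
    query_vector = Counter(query_tokens)
    matches = [e for e in index if set(query_tokens) & set(e["metadata"]["keywords"])]
    candidates = matches or index  # no keyword hit: fall back to plain similarity search
    ranked = sorted(candidates, key=lambda e: cosine(query_vector, e["vector"]), reverse=True)
    return ranked[:top_k]


# Hypothetical sample chunks, for illustration only.
chunks = [
    "Interest rates rose as the central bank raised interest rates.",
    "The vector database stores embeddings; the database serves embeddings for retrieval.",
    "Bank mergers increased as bank regulators approved more mergers.",
]
index = build_index(chunks)
for hit in retrieve(index, "embeddings database", top_k=1):
    print(hit["metadata"]["keywords"], "->", hit["text"])
```

In a real pipeline the keyword list would presumably come from KeyBERT and the vectors from the embeddings model, with the keyword filter applied as a metadata filter in the vector store; the fallback to the full index is one possible design choice for queries that match no stored keyword.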

Comments • 37

  • @ShaunPrince 9 months ago +6

    I love your work, and how you explain every step. The best part of these videos is watching you write from scratch and speak your thoughts out loud. Great channel, highly underrated.

    • @AIAnytime 9 months ago +1

      Wow, thank you!

  • @shameekm2146 9 months ago +1

    Thank you so much, bro. In my current sprint I was assigned the task of performing topic modelling using an n-gram approach to enhance RAG, because the documents I have to work with are of very low quality. And since the tool is already in the end-user validation phase, I am receiving bad feedback. This is where I started to think of applying this technique, and you helped me save a lot of time. Thank you once again :)

    • @akki_the_tecki 9 months ago

      Hi, Shameek. What's the actual pipeline you're using for that? I am also struggling with the same problem. There is a video named Investment-Banker-Chatbot (Chat with PDFs), and I copied all the code from that. Is that sufficient for hundreds of PDFs?

    • @shameekm2146 9 months ago

      Hi @akki_the_tecki, currently the pipeline I am using has BGE base for creating the embeddings. Then I use Llama-2 13B in 4-bit quantized mode to generate the answer from the context retrieved via BGE. This pipeline works well when the document quality is good and when there is no overlap of information across multiple documents. But when the information is spread across multiple paragraphs of a single document, the pipeline fails badly. Hence I am trying to implement the above approach and check.

    • @AIAnytime 9 months ago +1

      All the best

  • @Andromeda26_ 9 months ago

    Thank you! I have been waiting for something like KeyBERT for months. The mixture of LlamaIndex and KeyBERT will be dope. Never underestimate the power of metadata when you deal with millions and billions of chunks of data. Thanks again!

    • @AIAnytime 9 months ago +1

      Hope you enjoy it!

  • @ilaydelrey3122 9 months ago

    Please keep up this good work!

  • @user-iu4id3eh1x 9 months ago

    Wow really impressive technique 👏

    • @AIAnytime 9 months ago

      Thank you! Cheers!

  • @forexhunter2040 7 months ago

    How can one return metadata, e.g. page, title, and source document, whenever querying with an LLM?
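
One common way to approach the question above is to keep a metadata dict next to every chunk and return it together with the retrieved text (frameworks such as LangChain expose a `metadata` attribute on retrieved documents for this purpose). The sketch below is a framework-free illustration under assumptions: the `Chunk` class, the word-overlap `score`, and the file names are all hypothetical, and the scoring is only a stand-in for vector similarity.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of text plus the metadata captured when the document was loaded."""
    text: str
    metadata: dict = field(default_factory=dict)


def score(query, chunk):
    """Toy relevance score: number of shared lowercase words (stand-in for vector similarity)."""
    return len(set(query.lower().split()) & set(chunk.text.lower().split()))


def retrieve_with_sources(chunks, query, top_k=2):
    """Return the best-matching chunks together with their page, title, and source."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]
    return [
        {
            "text": c.text,
            "source": c.metadata.get("source"),
            "page": c.metadata.get("page"),
            "title": c.metadata.get("title"),
        }
        for c in ranked
    ]


def format_sources(hits):
    """Render the metadata as a citation list to show under the LLM's answer."""
    return "\n".join(f"- {h['title']} ({h['source']}, p. {h['page']})" for h in hits)


# Hypothetical documents, for illustration only.
chunks = [
    Chunk("Quarterly revenue growth was 12 percent.",
          {"source": "report.pdf", "page": 4, "title": "Annual Report"}),
    Chunk("The board approved a new dividend policy.",
          {"source": "minutes.pdf", "page": 2, "title": "Board Minutes"}),
]
hits = retrieve_with_sources(chunks, "quarterly revenue growth", top_k=1)
print(format_sources(hits))
```

The retrieved text would go into the LLM prompt as context, while the metadata is passed through unchanged and appended to the answer, so the model never has to reproduce page numbers or titles itself.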

  • @shivamroy1775 9 months ago

    Great video again. Very informative

    • @AIAnytime 9 months ago

      Thanks

  • @sneharoy3566 9 months ago

    Great detailed information 👏

    • @AIAnytime 9 months ago

      Glad it was helpful!

  • @riyaz8072 9 months ago

    Can you also compare the results with one of your old methods whenever you do improvement videos like this, please?
    Thanks! Love your work.

    • @AIAnytime 9 months ago +1

      Will do, and I'll push it to the GitHub repo.

  • @mcmarvin7843 9 months ago

    Keep up the good work ❤

    • @AIAnytime 9 months ago

      Thanks, you too!

  • @akki_the_tecki 9 months ago

    Worthyyy 38 min 🔥. But Sonu, tell me this: can I use both embedding models, like BGE and KeyBERT?
    For example, I am preparing a project based on your Investment-Banker-Chatbot (Chat with PDFs). Can I use KeyBERT for that? Please explain that in VS Code.

    • @AIAnytime 9 months ago

      Yes, definitely

  • @jrfcs18 9 months ago

    Great video. I would appreciate a video on how to use the metadata keyword search technique to feed the resulting related content to the LLM. Thanks for all you do.

    • @AIAnytime 9 months ago

      That's part of the upcoming video.
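
Until that follow-up is available, the request in this thread (using the keyword metadata to pick related content and hand it to the LLM) could look roughly like the sketch below. It is an assumption-laden illustration, not the channel's implementation: the chunk layout, `select_by_keywords`, `build_prompt`, and the `max_chars` budget are all hypothetical names, and the final call to an actual LLM is left out.

```python
def select_by_keywords(chunks, query_keywords):
    """Keep the chunks whose stored keywords overlap the query keywords (case-insensitive)."""
    wanted = {k.lower() for k in query_keywords}
    return [c for c in chunks if wanted & {k.lower() for k in c["keywords"]}]


def build_prompt(question, chunks, max_chars=1000):
    """Join the selected chunks into a context block, stopping at a rough size budget."""
    parts = []
    used = 0
    for chunk in chunks:
        if used + len(chunk["text"]) > max_chars:
            break
        parts.append(chunk["text"])
        used += len(chunk["text"])
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical chunks with keywords already stored as metadata.
chunks = [
    {"text": "KeyBERT extracts keywords from each chunk.", "keywords": ["keybert", "keywords"]},
    {"text": "Llama-2 generates the final answer.", "keywords": ["llama", "answer"]},
]
selected = select_by_keywords(chunks, ["KeyBERT"])
prompt = build_prompt("What does KeyBERT do?", selected)
print(prompt)
```

The resulting `prompt` string is what would be sent to the model; the character budget is a crude substitute for counting tokens against the model's context window.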

  • @kyunglee1924 8 months ago

    part2 out?

  • @kingfunny4821 9 months ago

    Has there been a previous video explaining how to create a chatbot with private documents without using the Internet?

    • @AIAnytime 9 months ago +1

      Look at the RAG playlist. There are 20+ videos.

    • @kingfunny4821 9 months ago

      Thanks

    • @kingfunny4821 9 months ago

      Thanks

  • @manishsharma2211 9 months ago

    There was a spelling error. You wrote tdqm instead of tqdm.

    • @AIAnytime 9 months ago

      Oh yes. Thanks for catching that. Hope you liked the video.

  • @yvonnewebster8439 9 months ago

    Is this free?

    • @AIAnytime 9 months ago

      Absolutely. All open source.

    • @yvonnewebster8439 9 months ago

      @AIAnytime Thanks for your reply. I also wanted to know: for storing around 500-600 million vector embeddings, do open-source vector databases such as Milvus, Qdrant, and Weaviate allow you to store them for free? Any suggestions on how to store embeddings in vector databases for effective RAG retrieval? This topic deserves a separate video.

  • @MachineLearningZuu 9 months ago

    Yes. Thank you❤ I requested this🥹❤️

    • @AIAnytime 9 months ago

      Hope you like it!