RAG for LLMs explained in 3 minutes

  • Published on May 13, 2024
  • How I Explain Retrieval Augmented Generation (RAG) to Business Managers
    (in 3 Minutes)
    Large language models have been a huge hit for personal and consumer use cases. But what happens when you bring them into your business or use them for enterprise purposes? Well, you encounter a few challenges. The most significant one is the lack of domain expertise.
    Remember, these large language models are trained on publicly available datasets. This means they might not possess the detailed knowledge specific to your domain or niche. Moreover, the training data won't include your Standard Operating Procedures (SOPs), records, intellectual property (IP), guidelines, or other relevant content. So, if you're considering using AI assistants "out of the box," they're going to lack much of that context, rendering them nearly useless for your specific business needs.
    However, there's a solution that's becoming quite popular and has proven to be robust: RAG, or Retrieval Augmented Generation. In this approach, we add an extra step before a prompt is sent to an AI assistant. This step involves searching through a corpus of your own data, be it documents, PDFs, or transactions, to find information relevant to the user's prompt.
    The information found is then added to the prompt that goes into the AI assistant, which subsequently returns the answer to the user. It turns out this is an incredibly effective way to add context for an AI assistant. Doing so also helps reduce hallucinations, which is another major concern. A minimal code sketch of this flow appears after this list.
    Hope you find this overview helpful. Have any questions or comments? Please drop them below.
    If you're an AI practitioner and believe I've overlooked something or wish to contribute to the discussion, feel free to share your insights. Many people will be watching this, and your input could greatly benefit others.
  • Science & Technology
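    Below is a minimal Python sketch of the retrieve-then-augment flow described above. The tiny in-memory corpus, the keyword-overlap retriever, and call_llm() are made-up placeholders for illustration only; a real deployment would use an embedding index (vector database) and an actual LLM API.

    CORPUS = [
        "SOP-12: Refund requests over $500 require manager approval.",
        "Guideline: Customer data must never leave the EU region.",
        "Record: Q3 transactions settled through the billing integration.",
    ]

    def retrieve(query, corpus, k=2):
        """Placeholder retriever: rank documents by naive keyword overlap with the query."""
        q_words = set(query.lower().split())
        ranked = sorted(corpus, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
        return ranked[:k]

    def call_llm(prompt):
        """Stand-in for a real LLM API call."""
        return f"[LLM answer based on a {len(prompt)}-character prompt]"

    def rag_answer(user_prompt):
        # 1. Search your own data for content relevant to the user's prompt.
        context = retrieve(user_prompt, CORPUS)
        # 2. Add that content to the prompt that goes to the AI assistant.
        augmented = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + user_prompt
        # 3. The assistant answers with your domain context in view.
        return call_llm(augmented)

    print(rag_answer("Do refund requests over $500 need approval?"))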

Comments • 12

  • @antoineroyer3841
    @antoineroyer3841 a day ago +1

    Clear thanks

  • @adipai
    @adipai a month ago +2

    thank you for the video George Santos :)

  • @farexBaby-ur8ns
    @farexBaby-ur8ns 20 days ago +2

    Very nice. However, an example would've helped augment the answer, like asking it the GDP of Chad in 2023 when using ChatGPT.

    • @MannyBernabe
      @MannyBernabe  15 days ago

      Agree. Thanks for feedback. 😊

  • @jasondsouza3555
    @jasondsouza3555 2 months ago +1

    Just wanted to clear up my confusion: would I yield better results by applying RAG to a fine-tuned model (i.e., fine-tuned in my field of work), or is RAG on a stock LLM good enough?

    • @MannyBernabe
      @MannyBernabe  2 months ago +3

      Hey Jason, the current best practice is to first try RAG with a stock LLM and see if that works. If not, then consider fine-tuning; it comes second because it requires more effort than RAG. Hope that helps.

  • @DanielBoueiz
    @DanielBoueiz 24 days ago +1

    Does the LLM first default to checking the additional datastore we gave it for any data relevant to the prompt the user enters? If it finds relevant data, does it respond to the user without checking the original data it was trained on, and if it doesn't find anything relevant in the datastore, does it then act as if RAG weren't implemented and respond based on its original training data? Or am I getting it wrong?

    • @MannyBernabe
      @MannyBernabe  23 days ago +1

      You got it. It first queries the corpus for relevant data, retrieves it, and inserts it into the prompt.
      If nothing relevant is found, you just get the standard LLM output.
      Hope that helps.
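
      A rough Python sketch of that flow, assuming a retriever that returns relevance scores: the retriever, the 0.3 threshold, and call_llm() are illustrative stand-ins, not any particular library's API.

      def retrieve_scored(query):
          """Stand-in retriever returning (relevance score, document) pairs."""
          docs = [(0.8, "SOP-12: Refund requests over $500 require manager approval."),
                  (0.1, "Guideline: Customer data must never leave the EU region.")]
          return docs if "refund" in query.lower() else []

      def call_llm(prompt):
          """Stand-in for a real LLM API call."""
          return f"[answer generated from a {len(prompt)}-character prompt]"

      def answer(user_prompt, threshold=0.3):
          hits = [doc for score, doc in retrieve_scored(user_prompt) if score >= threshold]
          if hits:    # relevant data found: insert it into the prompt
              prompt = "Context:\n" + "\n".join(hits) + "\n\nQuestion: " + user_prompt
          else:       # nothing relevant: behaves as if RAG weren't implemented
              prompt = user_prompt
          return call_llm(prompt)

      print(answer("Do refund requests over $500 need approval?"))   # augmented prompt
      print(answer("What was the GDP of Chad in 2023?"))             # plain prompt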

  • @victormustin2547
    @victormustin2547 2 months ago

    So does that mean that the data needs to fit the LLM context window? Or is the data going through some sort of compression?

    • @MannyBernabe
      @MannyBernabe  2 months ago

      Correct. The retrieved context still needs to fit into the context window with the original prompt. In terms of compression, we can summarize the retrieved context, saving space as well. Hope that helps.
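
      A hedged sketch of that point: the retrieved chunks plus the original prompt have to fit the model's context window, and summarizing the chunks is one way to save space. The 4,096-token limit is just an example, tokens are approximated by word counts, and summarize() stands in for what would really be another LLM call.

      CONTEXT_WINDOW = 4096   # example limit; the real number depends on the model

      def count_tokens(text):
          """Crude approximation; real systems use the model's own tokenizer."""
          return len(text.split())

      def summarize(text):
          """Placeholder; in practice this would be another LLM call."""
          return text[: len(text) // 2]   # pretend the summary halves the text

      def build_prompt(user_prompt, retrieved_chunks):
          context = "\n".join(retrieved_chunks)
          if count_tokens(context) + count_tokens(user_prompt) > CONTEXT_WINDOW:
              # Compress the retrieved context so prompt + context fit the window.
              context = "\n".join(summarize(c) for c in retrieved_chunks)
          return "Context:\n" + context + "\n\nQuestion: " + user_prompt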