Advanced RAG 04 - Contextual Compressors & Filters

  • Published 22 Jul 2024
  • Colab: drp.li/szHxK
    My Links:
    Twitter - / sam_witteveen
    Linkedin - / samwitteveen
    Github:
    github.com/samwit/langchain-t... (updated)
    github.com/samwit/llm-tutorials
    00:00 Intro
    00:50 Contextual Compressions and Filters Diagram
    01:55 LLM Extractor Diagram
    02:21 LLM Extractor Email Fine Tuned Model Diagram
    03:07 LLM Chain Filter Diagram
    03:21 Pipeline Diagram
    04:11 Code Time
  • Science & Technology
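
The video demonstrates LangChain's contextual compression retriever. For reference, here is a minimal sketch of that setup based on LangChain's documented API; the source file, splitter parameters, and query are illustrative assumptions rather than the exact Colab code:

```python
# Minimal contextual-compression setup (LangChain, classic API).
# The data file and parameter values below are illustrative placeholders.
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load and split documents, then build a plain vector-store retriever.
docs = TextLoader("state_of_the_union.txt").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)
base_retriever = FAISS.from_documents(chunks, OpenAIEmbeddings()).as_retriever()

# Wrap it: LLMChainExtractor asks an LLM to pull out only the parts of each
# retrieved chunk that are relevant to the query.
llm = ChatOpenAI(temperature=0)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=LLMChainExtractor.from_llm(llm),
    base_retriever=base_retriever,
)

compressed_docs = compression_retriever.get_relevant_documents(
    "What is contextual compression?"
)
```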

Comments • 33

  • @shivamroy1775
    @shivamroy1775 9 months ago +1

    Great content. Thanks for taking the time to make such videos. I've been learning a lot from them.

  • @MasterBrain182
    @MasterBrain182 9 months ago +3

    Astonishing content, Sam 💯💯 Thanks for sharing your knowledge with us (thanks for the subtitles too 😄) Thumbs up from Brazil 👍👍👍

  • @TheAmit4sun
    @TheAmit4sun 9 months ago +6

    I have found that filters which answer only yes or no aren't much help. For example, I have embeddings of tech docs and embeddings of an order-processing system. With the filter set, a random query like "can I order pizza with it?" makes the model think the context is related to order processing, so it returns YES, which is totally wrong.
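
For reference, a minimal sketch of the yes/no relevance filter this comment describes, using LangChain's documented LLMChainFilter; it reuses the hypothetical base_retriever from the sketch above:

```python
# LLMChainFilter keeps or drops whole retrieved chunks based on a YES/NO
# relevance judgment from the LLM -- it never rewrites the text. As the
# comment above notes, an off-topic query can still draw a spurious YES.
from langchain.chat_models import ChatOpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainFilter

llm = ChatOpenAI(temperature=0)
filter_retriever = ContextualCompressionRetriever(
    base_compressor=LLMChainFilter.from_llm(llm),
    base_retriever=base_retriever,  # assumed vector-store retriever from earlier
)
docs = filter_retriever.get_relevant_documents("can I order pizza with it?")
```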

  • @micbab-vg2mu
    @micbab-vg2mu 9 months ago +1

    Thank you for another great video :)

  • @billykotsos4642
    @billykotsos4642 9 months ago +3

    this is actually an interesting idea...

  • @shamikbanerjee9965
    @shamikbanerjee9965 9 months ago

    Good ideas Sam 👌

  • @henkhbit5748
    @henkhbit5748 8 months ago

    Thanks for the video about fine-tuning RAG. Personally I think the Self-RAG approach is more generic because it's embedded in the LLM...

  • @clray123
    @clray123 9 months ago +6

    So in short, in order to make the new revolutionary AI actually useful, you must meticulously hardcode the thinking it is supposed to be doing for you. Feels almost like crafting the expert systems of the '80s! Imagine the expected explosion in productivity from applying that same process! Or let the AI imagine for you (imagination is what it's really good for).

    • @alchemication
      @alchemication 9 months ago +1

      Yeah. But in some cases I've seen, we don't need that much sophistication and a bare-bones approach works well 😊 peace

    • @eugeneware3296
      @eugeneware3296 9 months ago +5

      RAG is built on retrieval, and retrieval is another word for search. Search is a very hard problem. The difficulty of searching, ranking, and filtering to get a good-quality set of candidate documents to reason over is underestimated. That's where the complexity lies. Vector search doesn't directly solve these issues. Search engines like Google have hundreds of ranking factors, including vector search, re-ranking cross-encoder models, and quality signals. TL;DR: vector search makes for a good demo and proof of concept. For true production systems, there is a lot of complexity and engineering required to make these systems work in practice.

    • @hidroman1993
      @hidroman1993 9 months ago +1

      LLMs alone are not the solution to any problem; as always, it's the engineering that brings the actual results

  • @zd676
    @zd676 9 months ago

    First of all, thanks for the great video! As some of the comments have rightfully pointed out, while I see some merit for offline use cases, this will be very challenging for real-time use cases. Also, I'm curious how much this depends on the chosen LLM understanding and following the default prompts. It seems the LLM choice can make or break it, which is quite brittle.

  • @luisjoseve
    @luisjoseve 6 months ago

    thanks a lot, keep it up!

  • @theunknown2090
    @theunknown2090 9 months ago

    Thanks for the video.

  • @foobars3816
    @foobars3816 8 months ago

    13:09 Sounds like you should be using an LLM to narrow down that prompt for each case

  • @billykotsos4642
    @billykotsos4642 9 months ago +2

    So instead of using an 'extractive QA model' you prompt an LLM into doing the same thing... amazing how flexible these LLMs are... in this case you are basing your hopes on the model's 'reasoning'...

    • @clray123
      @clray123 9 months ago +1

      As long as someone else pays for it...
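
For contrast, a minimal sketch of the "extractive QA model" alternative mentioned above, using HuggingFace's question-answering pipeline; the model name is the pipeline's usual SQuAD-tuned default, written out explicitly:

```python
# An extractive QA model selects a literal span from the context instead of
# generating text, so it cannot paraphrase or hallucinate new wording.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(
    question="What does contextual compression do?",
    context=(
        "Contextual compression shrinks retrieved documents down to only the "
        "passages relevant to the query before they reach the LLM."
    ),
)
print(result["answer"], result["score"])  # extracted span plus a confidence score
```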

  • @mungojelly
    @mungojelly 9 months ago +1

    Hm, when you were going over those instructions like "don't change the text, repeat it the same", and how hard it is to convince it to write the same text out, I thought: why make it do that at all? If we just numbered the sentences, it could respond with the numbers of the sentences to include, or something. Maybe that'd save output tokens as well as not give it any chance to imagine things
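
A rough sketch of that idea: number the sentences, ask the model to reply only with indices, and reassemble the text locally so the model never gets a chance to rewrite anything. The prompt wording and the regex sentence splitter are illustrative assumptions:

```python
import re

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)

def select_sentences(query: str, text: str) -> str:
    # Split naively on sentence-ending punctuation and number each sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    numbered = "\n".join(f"{i}: {s}" for i, s in enumerate(sentences))
    reply = llm.predict(
        f"Question: {query}\n\nSentences:\n{numbered}\n\n"
        "Reply ONLY with the comma-separated numbers of the sentences "
        "relevant to the question."
    )
    keep = {int(n) for n in re.findall(r"\d+", reply)}
    # The model only points at sentences; the original text is reassembled
    # verbatim, and the short numeric reply also saves output tokens.
    return " ".join(s for i, s in enumerate(sentences) if i in keep)
```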

  • @RunForPeace-hk1cu
    @RunForPeace-hk1cu 9 months ago

    Wouldn't it be simpler if you just used a small chunk_size for the initial splitter function when you embed the documents into the vector database?
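
For comparison, the simpler alternative this comment raises looks like the sketch below; the catch is that chunk size is fixed once at indexing time, while a compressor trims per query. Parameter values are illustrative:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Much smaller chunks than the usual ~1000 characters, chosen once at
# indexing time; every query then sees the same fixed-size chunks.
small_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=20,  # keep a little context across chunk boundaries
)
small_chunks = small_splitter.split_documents(docs)  # docs as loaded earlier
```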

  • @moonly3781
    @moonly3781 4 months ago

    Thank you for the amazing tutorial! I was wondering, instead of using ChatOpenAI, how can I utilize a Llama 2 model locally? Specifically, I couldn't find any implementation, for example, for contextual compression, where you pass compressor = LLMChainExtractor.from_llm(llm) with ChatOpenAI as the llm. How can I achieve this locally with Llama 2? My use case involves private documents, so I'm looking for solutions using open-source LLMs.
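
A minimal sketch answering this: LLMChainExtractor.from_llm accepts any LangChain LLM, so a locally hosted model can stand in for ChatOpenAI. The LlamaCpp wrapper and the GGUF model path below are assumptions; any local backend with a LangChain wrapper should slot in the same way:

```python
from langchain.llms import LlamaCpp
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# A Llama 2 model running fully locally via llama.cpp -- nothing leaves
# the machine, which suits private documents.
local_llm = LlamaCpp(
    model_path="./models/llama-2-13b-chat.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=4096,
    temperature=0,
)

local_compression_retriever = ContextualCompressionRetriever(
    base_compressor=LLMChainExtractor.from_llm(local_llm),
    base_retriever=base_retriever,  # any vector-store retriever
)
```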

  • @wiltedblackrose
    @wiltedblackrose 9 months ago +3

    This is really interesting. My only worry is that this makes it prohibitively slow. The longest part of RAG is often the call to the LLM. It'd be interesting if you could review some companies that have faster models than OpenAI but still decent performance.

    • @mungojelly
      @mungojelly 9 months ago +2

      If I was making a chatbot and needed it not to lag before responding, I'd just fake it, like how Windows has twelve different bars go across and various things slowly fade in so it doesn't seem like it's taking forever to boot XD. I'd send the request simultaneously to both the thoughtful process and a model that just has instructions to respond immediately, echoing the user: "ok, so what you're saying you want is...". Personally I'd even want it to be transparent about what's happening, like saying that it's looking stuff up right now. I'd think of feeding the agent that's looking busy for the user some data about how much we've retrieved and how we've processed it so far, so it can say computery things like "I have discovered 8475 documents relevant to your query, and I am currently filtering and compressing them to find the most relevant information"... But you could also just fake it by pretending you have the answer and you're just a little slow at getting to the point, like stalling for a few seconds with a cookie-cutter disclaimer about how you're just a hapless AI :D

    • @wiltedblackrose
      @wiltedblackrose 9 months ago

      @mungojelly Aha, cool. But this doesn't make a difference for when I use it, e.g., for studying at uni.

    • @mungojelly
      @mungojelly 9 months ago +1

      @wiltedblackrose If it's for your own use and there are no customers to offend, then you could make it quick and dirty in other ways. I'd think of giving random raw retrieved documents to a little cheap, hallucinatey model to see if it gets lucky and can answer right away, then getting answers from progressively slower chains of reasoning. If it was for my own use I'd definitely add visual feedback about what it found and what it's doing; even obscure feedback, with documents flashing by too quickly to read or whatever, would make sense to me because I'd know exactly what it's doing
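
A toy sketch of the latency-masking idea in this thread: race a fast acknowledgement against the slow RAG pipeline and show the acknowledgement first. Both functions are hypothetical stand-ins:

```python
import asyncio

async def quick_ack(query: str) -> str:
    # Stand-in for a cheap, fast model that only echoes the request back.
    return f"Okay, so you're asking about: {query}. Let me look that up..."

async def full_rag_answer(query: str) -> str:
    # Stand-in for the slow retrieve -> compress -> generate pipeline.
    await asyncio.sleep(5)
    return "...the considered answer, once retrieval finishes."

async def respond(query: str) -> None:
    answer_task = asyncio.create_task(full_rag_answer(query))
    print(await quick_ack(query))  # shown immediately
    print(await answer_task)       # shown once the pipeline completes

asyncio.run(respond("can I order pizza with it?"))
```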

  • @marshallmcluhan33
    @marshallmcluhan33 9 months ago +2

    Thoughts on the "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection" paper?

    • @samwitteveenai
      @samwitteveenai  9 months ago +3

      Interesting paper. I am currently traveling, but will try to make a video about the paper or show some of the ideas in a project when I get a chance.

  • @choiswimmer
    @choiswimmer 9 months ago +1

    Typo in the thumbnail. It's 4 not 5

  • @googleyoutubechannel8554
    @googleyoutubechannel8554 5 months ago

    It seems like there's a huge disconnect in understanding of how state-of-the-art RAG works, e.g. using document upload in the ChatGPT-4 UI, versus all the LangChain tutorials on RAG. I feel like the community doesn't understand that OpenAI is getting far better results, and seems to be processing embeddings in a way that's much more advanced than LangChain-based systems do, and that 'LangChain RAG' and 'OpenAI internal RAG' are completely different animals. E.g., it seems uploaded docs are added as embeddings into a ChatGPT-4 query completely orthogonally to the context window, yet all the LangChain examples I see end up returning text from a retriever and shoving this output into the LLM context. I don't think good RAG even works that way...

  • @HazemAzim
    @HazemAzim 9 months ago

    Great. How about cross-encoders and re-ranking?

    • @adriangabriel3219
      @adriangabriel3219 9 months ago

      I use it and my experience is that it improves retrieval a lot! The out-of-fashion SentenceTransformers perform amazingly there!

    • @HazemAzim
      @HazemAzim 9 months ago

      I am doing some benchmark testing on Arabic datasets, and I'm getting super results with ME5 embeddings plus the Cohere reranker

    • @samwitteveenai
      @samwitteveenai  9 months ago +1

      Yes, I still have a number more coming in this series.
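
For reference while those videos are coming, a minimal sketch of the cross-encoder re-ranking discussed in this thread, using the sentence-transformers CrossEncoder API; the MS MARCO model name and the base_retriever are illustrative assumptions:

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, document) pair jointly, which is slower
# than embedding search but usually much more accurate for ranking.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "what is contextual compression?"
candidates = [d.page_content for d in base_retriever.get_relevant_documents(query)]

# Score every (query, document) pair, then keep the top-scoring hits.
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)][:3]
```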