RAG for long context LLMs

  • Published Mar 21, 2024
  • This is a talk that @rlancemartin gave at a few recent meetups on RAG in the era of long context LLMs. With context windows growing to 1M+ tokens, there have been many questions about whether RAG is "dead." We pull together threads from a few different recent projects to take a stab at addressing this. We review some current limitations with long context LLM fact reasoning & retrieval (using multi-needle in a haystack analysis), but also discuss some likely shifts in the RAG landscape due to expanding context windows (approaches for doc-centric indexing and RAG "flow engineering").
    Slides:
    docs.google.com/presentation/...
    Highlighted references:
    1/ Multi-needle analysis w/ @GregKamradt
    blog.langchain.dev/multi-need...
    2/ RAPTOR (@parthsarthi03 et al)
    github.com/parthsarthi03/rapt...
    • Building long context ...
    3/ Dense-X / multi-representation indexing (@tomchen0 et al)
    arxiv.org/pdf/2312.06648.pdf
    blog.langchain.dev/semi-struc...
    4/ Long context embeddings (@JonSaadFalcon, @realDanFu, @simran_s_arora)
    hazyresearch.stanford.edu/blo...
    www.together.ai/blog/rag-tuto...
    5/ Self-RAG (@AkariAsai et al), C-RAG (Shi-Qi Yan et al)
    arxiv.org/abs/2310.11511
    arxiv.org/abs/2401.15884
    blog.langchain.dev/agentic-ra...
    Timepoints:
    0:20 - Context windows are getting longer
    2:10 - Multi-needle in a haystack
    9:30 - How might RAG change?
    12:00 - Query analysis
    13:07 - Document-centric indexing
    16:23 - Self-reflective RAG
    19:40 - Summary

Comments • 30

  • @frankforrester42 • 4 months ago +18

    Thank you for uploading this. I live in the middle of nowhere Kansas. I'm a single dad of three with full custody. Life is full time 24/7. I sincerely appreciate you taking the time to share this with everyone. I love learning about it to the point my head hurts.

  • @cnmoro55 • 4 months ago +22

    Long context LLMs are nice, but are you willing to pay for a question that uses the full 1M tokens, for example? Probably not. As long as the cost of inference is measured in token counts, RAG will stay relevant.
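    A back-of-the-envelope sketch of that cost argument in Python (the $3 per 1M input tokens price is an assumed placeholder, not any particular provider's rate):

    # Rough per-query cost: full-context stuffing vs. RAG (hypothetical price).
    PRICE_PER_MTOK = 3.00              # assumed $ per 1M input tokens
    full_context_tokens = 1_000_000    # stuff the whole corpus every query
    rag_context_tokens = 4_000         # retrieve only the top-k relevant chunks

    full_cost = full_context_tokens / 1e6 * PRICE_PER_MTOK   # $3.000 per query
    rag_cost = rag_context_tokens / 1e6 * PRICE_PER_MTOK     # $0.012 per query
    print(f"full context is {full_cost / rag_cost:.0f}x more expensive")  # 250x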

  • @Ricardo_Cordero • 4 months ago +1

    🌟 **Fantastic insights!** 🌟 @rlancemartin, thank you for sharing this thought-provoking talk on RAG in the era of long context LLMs. The way you weave together threads from various recent projects to address the question of whether RAG is "dead" is truly commendable.
    The challenges posed by context windows growing to 1M+ tokens are significant, and your exploration of limitations with long context LLM fact reasoning and retrieval (using multi-needle in a haystack analysis) sheds light on crucial aspects. But it's equally exciting to hear about the potential shifts in the RAG landscape due to expanding context windows, especially the approaches for doc-centric indexing and RAG "flow engineering."
    Your talk inspires us to keep pushing the boundaries and adapt to the evolving landscape. Kudos! 🙌🚀

  • @voulieav • 3 months ago +2

    i love lance, absolutely creasing at the pareto slide

  • @kevon217 • 3 months ago

    Super useful analyses. Great work.

  • @maskedvillainai • 3 months ago +1

    my brain has stored everything that’s ever happened and will ever happen, yet remembers only what’s relevant to the context of today, necessary to avoid the risks of tomorrow, and wouldn’t be watching this YouTube video if it always knew what answers to find simply because they exist at all.

  • @Todorkotev • 4 months ago +5

    Lol, "I don't wonna tell Harrison how much I spent" 🤣 Now, that's pretty good needle in a 21 minute haystack 😆. The hay is pretty awesome too though. Thank you for all the hay!

  • @yarkkharkov • 3 months ago +1

    Fantastic lecture, keep it up!

  • @Abdien • 4 months ago +1

    Very useful, thanks for sharing!

  • @fire17102 • 4 months ago +6

    Would love it if you could showcase a working RAG example with live-changing data, for example an item price change or a policy update. Does it require manually managing chunks and embedding references, or are there better existing solutions? I think this is what really differentiates a fun demo from actual production systems and applications.
    Thanks and all the best!

    • @codingcrashcourses8533 • 4 months ago +2

      Prices are usually tabular data, so it's best to treat them that way. LLMs are pretty good at writing queries based on a provided SQL schema.
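      A minimal sketch of that text-to-SQL pattern using LangChain's create_sql_query_chain; the prices.db file and the model choice are placeholders:

      from langchain.chains import create_sql_query_chain
      from langchain_community.utilities import SQLDatabase
      from langchain_openai import ChatOpenAI

      db = SQLDatabase.from_uri("sqlite:///prices.db")  # hypothetical prices table
      llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

      chain = create_sql_query_chain(llm, db)  # prompts the LLM with the live schema
      sql = chain.invoke({"question": "What is the current price of item X?"})
      print(db.run(sql))  # the query reads the current row, so nothing to re-embed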

    • @fire17102 • 4 months ago

      @codingcrashcourses8533 No, you're not getting it... price was just an example. This can be any data that changes, and moreover it might not be my data but my users', so I need a purely generic RAG, but with the ability to re-RAG when files change.
      Hope you see what I mean.
      Thanks for responding, all the best!
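      One way to get that "re-RAG when files change" behavior without hand-managing chunks is LangChain's indexing API, which records document hashes and only re-embeds what actually changed. A minimal sketch, assuming Chroma, OpenAI embeddings, and a SQLite record store (all illustrative choices):

      from langchain.indexes import SQLRecordManager, index
      from langchain_community.vectorstores import Chroma
      from langchain_core.documents import Document
      from langchain_openai import OpenAIEmbeddings

      vectorstore = Chroma(collection_name="user_docs",
                           embedding_function=OpenAIEmbeddings())
      record_manager = SQLRecordManager("chroma/user_docs",
                                        db_url="sqlite:///records.db")
      record_manager.create_schema()

      docs = [Document(page_content="...", metadata={"source": "policy.txt"})]
      # Re-run this on every sync: unchanged docs are skipped, changed docs are
      # re-embedded, and stale chunks from the same source are deleted.
      index(docs, record_manager, vectorstore,
            cleanup="incremental", source_id_key="source")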

  • @landon.wilkins • 3 months ago +1

    I wish we had access to some public LangChain Notion project. The diagrams, etc. are incredibly helpful. I'd love to print them out, laminate them, and post them in my shower :D

  • @sethjchandler • 3 months ago +2

    Excellent video, but I'm not persuaded that the indexing scheme you suggest will work where the documents themselves are large relative to the number of chunks involved. I'm thinking in particular of legal opinions or other legal documents, which can be 50 pages long. I'm concerned that the summarization might be insufficiently granular, and that retrieving the entire document or documents would end up clogging the context window and lead you back to the same problem you discussed at the beginning of the video.
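    One hedged way around that granularity worry is to make the indexed unit a section of the opinion rather than the whole document: embed an LLM-written summary per section for retrieval, but return only that section's text. A minimal sketch using LangChain's MultiVectorRetriever (the section split and all names are illustrative):

    import uuid
    from langchain.retrievers.multi_vector import MultiVectorRetriever
    from langchain.storage import InMemoryStore
    from langchain_community.vectorstores import Chroma
    from langchain_core.documents import Document
    from langchain_openai import OpenAIEmbeddings

    # Parent units are sections of the 50-page opinion, not the full document.
    sections = [Document(page_content="<section text>",
                         metadata={"opinion": "<case name>"})]
    summaries = ["<LLM-written summary of that section>"]

    vectorstore = Chroma(collection_name="summaries",
                         embedding_function=OpenAIEmbeddings())
    store = InMemoryStore()
    retriever = MultiVectorRetriever(vectorstore=vectorstore, docstore=store,
                                     id_key="doc_id")

    ids = [str(uuid.uuid4()) for _ in sections]
    vectorstore.add_documents([Document(page_content=s, metadata={"doc_id": i})
                               for s, i in zip(summaries, ids)])
    store.mset(list(zip(ids, sections)))  # search hits map back to sections only

    docs = retriever.invoke("standard of review")  # returns sections, not opinions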

  • @N4LNba777 • 3 months ago +1

    Thanks, very useful!

  • @IbrahimSobh • 4 months ago +1

    Thank you for the very nice video, as usual. Can you propose a simple solution that gets the best of both worlds, RAG and the large context window?
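    Not the talk's official recipe, but one simple pattern that combines the two worlds: retrieve a small number of document-sized hits, then let a long-context model read them in full. A minimal sketch (store, model, and prompt are placeholders):

    from langchain_community.vectorstores import Chroma
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings

    vectorstore = Chroma(collection_name="docs",
                         embedding_function=OpenAIEmbeddings())
    llm = ChatOpenAI(model="gpt-4o")  # any long-context model

    question = "..."
    docs = vectorstore.similarity_search(question, k=3)  # few, document-sized hits
    context = "\n\n".join(d.page_content for d in docs)
    answer = llm.invoke(f"Answer using only this context:\n\n{context}\n\n"
                        f"Question: {question}")
    print(answer.content)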

  • @maskedvillainai • 3 months ago

    You know what’s way better? Not using NLP at all for retrieving ranks. I use concurrent parquet caching and pass continuously queried word matches as indices to the AI. This basically cut the token limit requirement for my AI model down to near nonexistence. It also always has a starting point, retrieved as a dict.

  • @muhannadobeidat • 3 months ago

    Great content, expertly delivered.
    What would be nice is looking at this from a private-enterprise-data perspective, where you have a lot more restrictions on what can be in the context in terms of data vs. metadata.

  • @anonymous6666 • 3 months ago +1

    LET'S GO LANCE!!!

  • @bertobertoberto3 • 3 months ago

    Nice. Retrieving a full document given a short question may be troublesome, though, given that we're still trying to map everything into a high-dimensional space and compare there. However, doing things semantically and storing the vectors with a link back to the actual document would likely perform much better.

  • @maxi-g • 4 months ago +2

    RAG is excellent for knowledge representation and retrieval, superior to training or fine-tuning.

  • @alivecoding4995 • 3 months ago

    Could you please comment on the recency of your latest results? When did you compute the result figures for the article?

  • @siddharthchauhan3404 • 4 months ago +1

    What application do you guys use to create these flow diagrams?

    • @LangChain • 4 months ago +6

      Excalidraw 😎

  • @insitegd7483 • 4 months ago

    I think that long context LLMs in RAG could be useful when you have critical information and you need the exact data and a lot of information in a system.

  • @ahmed_hefnawy • 3 months ago

    Hello, does anyone know of a good technical implementation of these techniques? It would help a lot and is really needed.

  • @quintinevans2485 • 3 months ago

    How good are humans at these needle-in-a-haystack sorts of tests? Humans seem to do a lot better at this sort of retrieval when they have a model to align the structure. I'm referring to tests where large chunks of information were given to non-experts and experts, e.g. details about flying. The experts were able to remember and retrieve the relevant information with a higher hit rate than non-experts because they had a model for identifying the important information and storing it. How much would that sort of approach help LLM performance? For example, telling the model that you are giving it information and that it has to remember it for retrieval later.

  • @alivecoding4995 • 3 months ago

    I think you should at least distinguish visually rich, dense documents from ordinary text documents when speaking about the document level.

  • @pioggiadifuoco7522 • 4 months ago

    btw you're kinda far from what pizza is 😂 neither figs, prosciutto, nor goat cheese. I really don't know what kind of pizza you eat in the US, but it seems to be real garbage if you put those ingredients on it. Get it from an Italian from Naples, where pizza was born 😜. Hope you're going to change those to tomato, mozzarella, parmesan and basil next time 🤣🤣🤣. Jokes apart, good job dude!

    • @r.lancemartin7992 • 3 months ago

      Ha, well I used the same example that Anthropic used :D ... so we can blame them.