RAG for long context LLMs
- Published Mar 21, 2024
- This is a talk that @rlancemartin gave at a few recent meetups on RAG in the era of long context LLMs. With context windows growing to 1M+ tokens, there have been many questions about whether RAG is "dead." We pull together threads from a few different recent projects to take a stab at addressing this. We review some current limitations with long context LLM fact reasoning & retrieval (using multi-needle in a haystack analysis), but also discuss some likely shifts in the RAG landscape due to expanding context windows (approaches for doc-centric indexing and RAG "flow engineering").
Slides:
docs.google.com/presentation/...
Highlighted references:
1/ Multi-needle analysis w/ @GregKamradt
blog.langchain.dev/multi-need...
2/ RAPTOR (@parthsarthi03 et al)
github.com/parthsarthi03/rapt...
• Building long context ...
3/ Dense-X / multi-representation indexing (@tomchen0 et al)
arxiv.org/pdf/2312.06648.pdf
blog.langchain.dev/semi-struc...
4/ Long context embeddings (@JonSaadFalcon, @realDanFu, @simran_s_arora)
hazyresearch.stanford.edu/blo...
www.together.ai/blog/rag-tuto...
5/ Self-RAG (@AkariAsai et al), C-RAG (Shi-Qi Yan et al)
arxiv.org/abs/2310.11511
arxiv.org/abs/2401.15884
blog.langchain.dev/agentic-ra...
Timepoints:
0:20 - Context windows are getting longer
2:10 - Multi-needle in a haystack
9:30 - How might RAG change?
12:00 - Query analysis
13:07 - Document-centric indexing
16:23 - Self-reflective RAG
19:40 - Summary
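The multi-needle analysis covered at 2:10 can be sketched roughly as follows: scatter a few "needle" facts at random depths in filler text, ask the model under test to list them all, and score recall. The needle facts, filler sentence, and keyword-based scoring here are illustrative assumptions, not the actual benchmark (see reference 1 for that); the `llm` call is a placeholder.

```python
import random

# Hypothetical needle facts, keyed by the keyword we later scan for.
NEEDLES = {
    "figs": "The first secret ingredient is figs.",
    "prosciutto": "The second secret ingredient is prosciutto.",
    "goat cheese": "The third secret ingredient is goat cheese.",
}

def build_haystack(needle_facts, filler_sentence, n_filler=1000):
    """Insert each needle fact at a random depth inside repeated filler text."""
    filler = [filler_sentence] * n_filler
    positions = sorted(random.sample(range(n_filler), len(needle_facts)))
    for pos, fact in zip(positions, needle_facts):
        filler.insert(pos, fact)
    return " ".join(filler)

def score(keywords, model_answer):
    """Fraction of needle keywords that appear in the model's answer."""
    answer = model_answer.lower()
    return sum(kw in answer for kw in keywords) / len(keywords)

haystack = build_haystack(list(NEEDLES.values()), "The sky was a calm shade of blue.")
# In a real evaluation this prompt goes to the LLM under test:
# answer = llm(f"{haystack}\n\nWhat are the secret ingredients?")
answer = "The secret ingredients are figs and goat cheese."
print(round(score(NEEDLES, answer), 2))  # -> 0.67
```

The talk's point is that recall degrades as needles multiply and sit deeper in the context, which is exactly what a harness like this measures.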
Thank you for uploading this. I live in the middle of nowhere Kansas. I'm a single dad of three with full custody. Life is full time 24/7. I sincerely appreciate you taking the time to share this with everyone. I love learning about it to the point my head hurts.
Long context LLMs are nice, but are you willing to pay for a question that uses the full 1M tokens, for example? Probably not. As long as the cost of inference is measured in token counts, RAG will stay relevant.
🌟 **Fantastic insights!** 🌟 @rlancemartin, thank you for sharing this thought-provoking talk on RAG in the era of long context LLMs. The way you weave together threads from various recent projects to address the question of whether RAG is "dead" is truly commendable.
The challenges posed by context windows growing to 1M+ tokens are significant, and your exploration of limitations with long context LLM fact reasoning and retrieval (using multi-needle in a haystack analysis) sheds light on crucial aspects. But it's equally exciting to hear about the potential shifts in the RAG landscape due to expanding context windows, especially the approaches for doc-centric indexing and RAG "flow engineering."
Your talk inspires us to keep pushing the boundaries and adapt to the evolving landscape. Kudos! 🙌🚀
I love Lance. Absolutely creasing at the Pareto slide.
Super useful analyses. Great work.
my brain has stored everything that’s ever happened and will ever happen, yet remembers only what’s relevant to the context of today, necessary to avoid the risks of tomorrow, and wouldn’t be watching this YouTube video if it always knew what answers to find simply because they exist at all.
Lol, "I don't wonna tell Harrison how much I spent" 🤣 Now, that's pretty good needle in a 21 minute haystack 😆. The hay is pretty awesome too though. Thank you for all the hay!
Fantastic lecture, keep it up!
Very useful, thanks for sharing!
Would love it if you could showcase a working RAG example with live changing data, for example an item price change or a policy update. Does it require manually managing chunks and embedding references, or are there better existing solutions? I think this really differentiates between a fun to-do and actual production systems and applications.
Thanks and all the best!
Prices are most of the time tabular data, and therefore it's best to treat them that way. LLMs are pretty good at writing queries based on a provided SQL schema.
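The text-to-SQL route suggested here can be sketched minimally: hand the model the schema, let it emit a query, and run that against the database. The `llm` call is a placeholder, and the schema, rows, and substituted query below are made-up illustrations.

```python
import sqlite3

SCHEMA = "CREATE TABLE prices (item TEXT, price REAL, updated_at TEXT)"

def build_prompt(schema, question):
    """Compose a prompt asking the model for a single SQL query."""
    return (f"Given this SQLite schema:\n{schema}\n"
            f"Write one SQL query answering: {question}\nSQL:")

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)",
                 [("widget", 9.99, "2024-03-01"),
                  ("widget", 12.50, "2024-03-20")])

question = "What is the latest price of a widget?"
prompt = build_prompt(SCHEMA, question)
# sql = llm(prompt)  # placeholder for a real model call; a plausible response:
sql = "SELECT price FROM prices WHERE item = 'widget' ORDER BY updated_at DESC LIMIT 1"
print(conn.execute(sql).fetchone()[0])  # -> 12.5
```

Because the database always holds the current rows, a price update is just an `UPDATE`/`INSERT`, with no re-embedding step at all.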
@codingcrashcourses8533 No, you're not getting it... price was just an example; this can be any data that changes. And once more, it might not be my data but my users' data, so I need a purely generic RAG, but with the ability to re-RAG when files change.
Hope you see what I mean.
Thanks for responding, all the best!
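One common pattern for the "re-RAG when files change" requirement is to content-hash each source document and re-embed only when the hash changes, upserting under a stable document ID. A toy sketch under those assumptions, with `embed` standing in for a real embedding model and an in-memory dict for the vector store:

```python
import hashlib

def embed(text):
    # Placeholder: a real system would call an embedding model here.
    return [float(b) for b in hashlib.sha256(text.encode()).digest()[:4]]

class ReindexingStore:
    """Re-embeds a document only when its content hash changes."""
    def __init__(self):
        self.hashes = {}   # doc_id -> content hash
        self.vectors = {}  # doc_id -> embedding

    def upsert(self, doc_id, text):
        h = hashlib.sha256(text.encode()).hexdigest()
        if self.hashes.get(doc_id) == h:
            return False   # unchanged: skip re-embedding
        self.hashes[doc_id] = h
        self.vectors[doc_id] = embed(text)
        return True        # newly embedded or re-embedded

store = ReindexingStore()
store.upsert("policy.md", "Refunds within 30 days.")
print(store.upsert("policy.md", "Refunds within 30 days."))  # unchanged -> False
print(store.upsert("policy.md", "Refunds within 14 days."))  # updated   -> True
```

Running the same loop per chunk (hash each chunk, not each file) keeps updates incremental even for large documents, and this works the same whether the data is yours or your users'.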
I wish we had access to some public LangChain Notion project. The diagrams, etc. are incredibly helpful. I'd love to print them out, laminate them, and post them in my shower :D
Excellent video, but I'm not persuaded that the indexing approach you suggest will work where the documents themselves are large relative to the number of chunks involved. I'm thinking in particular of legal opinions or other legal documents, which can be 50 pages long. I'm concerned that the summarization might be insufficiently granular, and that retrieving the entire document or documents would end up clogging the context window, leading back to the same problem you discussed at the beginning of the video.
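For reference, the doc-centric (multi-representation) idea is to embed a small representation, such as a summary, for search while keeping a pointer back to the full text to return, and the granularity concern above can be addressed by pointing at sections rather than whole documents. A toy sketch, with a bag-of-words overlap standing in for a real embedding model and the document IDs invented for illustration:

```python
def embed(text):
    # Toy stand-in for an embedding model: a set of lowercase tokens.
    return set(text.lower().split())

def similarity(a, b):
    # Jaccard overlap between token sets.
    return len(a & b) / max(len(a | b), 1)

class MultiRepresentationIndex:
    def __init__(self):
        self.entries = []   # (summary_vector, doc_id) pairs searched at query time
        self.docstore = {}  # doc_id -> full text returned to the LLM

    def add(self, doc_id, summary, full_text):
        self.entries.append((embed(summary), doc_id))
        self.docstore[doc_id] = full_text

    def retrieve(self, query):
        qv = embed(query)
        best = max(self.entries, key=lambda e: similarity(qv, e[0]))
        return self.docstore[best[1]]  # return the stored text, not the summary

index = MultiRepresentationIndex()
index.add("op-17", "Appellate opinion on contract damages",
          "FULL 50-PAGE OPINION TEXT ...")
index.add("op-18", "Opinion on patent eligibility",
          "FULL PATENT OPINION TEXT ...")
print(index.retrieve("how are contract damages calculated?"))
```

Swapping the docstore values from whole opinions to section-level text (one summary per section, one entry per section) is one way to keep a 50-page retrieval from flooding the context window.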
Thanks, very useful!
Thank you for the very nice video as usual. Can you propose a simple solution that gets the best of both worlds, RAG and the large context window?
You know what's way better? Not using NLP at all for retrieval ranking. I use concurrent Parquet caching and pass continuously queried word matches as index indices to the AI. This basically cut the token requirement for my AI model down to near nonexistence. It also always has a starting point retrieved as a dict.
Great content. Expertly delivered.
What would be nice is looking at this from a private enterprise data perspective, where you have a lot more restrictions on what can be in the context in terms of data vs. metadata.
LET'S GO LANCE!!!
Nice. Retrieving a full document given a short question may be troublesome, though, given that we're still trying to move everything into a high-dimensional space and compare it there. However, doing things semantically and storing the vectors with a link back to the actual document would likely perform much better.
RAG is excellent for knowledge representation and retrieval, superior to training or fine tuning
Could you please comment on the recency of your latest results? When did you compute the result figures for the article?
What application do you guys use to create this flow diagram?
Excalidraw 😎
I think that long context LLMs in RAG could be useful when you have critical information, need the exact data, and have a lot of information in the system.
Hello, does anyone know a good technical implementation of these techniques? It would help a lot and is really needed.
How good are humans at this needle-in-a-haystack sort of test? Humans seem to do a lot better at this kind of retrieval when they have a model to organize the structure. I'm referring to tests where large chunks of information were given to non-experts and experts, e.g., details about flying. The experts were able to remember and retrieve the relevant information at a higher hit rate than non-experts because they had a model for identifying the important information and storing it. How much would that sort of approach improve LLM performance? Such as telling the model up front that you are giving it information it will have to retrieve later.
I think you should at least distinguish visually rich, dense documents from ordinary text documents when speaking at the document level.
btw, you're kinda far from what pizza is 😂. Neither figs, prosciutto, nor goat cheese belong on it. I really don't know what kind of pizza you eat in the US, but it seems to be real garbage if you put those ingredients on it. Get it from an Italian from Naples, where pizza was born 😜. Hope you're going to change those to tomato, mozzarella, parmesan, and basil next time 🤣🤣🤣. Jokes apart, good job dude!
Ha, well I used the same example that Anthropic used :D ... so we can blame them.