Thank you for uploading this. I live in the middle of nowhere in Kansas. I'm a single dad of three with full custody, so life is full-time, 24/7. I sincerely appreciate you taking the time to share this with everyone. I love learning about it to the point my head hurts.
Long-context LLMs are nice, but are you willing to pay for a question that uses the full 1M tokens, for example? Probably not. As long as the cost of inference is measured in token counts, RAG will stay relevant.
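To put rough numbers on that point, here is a back-of-the-envelope sketch; the per-token price below is a made-up placeholder, not any provider's actual rate:

```python
# Back-of-the-envelope cost of a single question, full-context vs. RAG.
# PRICE_PER_M_INPUT_TOKENS is an illustrative placeholder, not real pricing.
PRICE_PER_M_INPUT_TOKENS = 7.00    # hypothetical $ per 1M input tokens
FULL_CONTEXT_TOKENS = 1_000_000    # the whole corpus stuffed into the prompt
RAG_TOKENS = 4_000                 # a handful of retrieved chunks instead

full_cost = FULL_CONTEXT_TOKENS / 1_000_000 * PRICE_PER_M_INPUT_TOKENS
rag_cost = RAG_TOKENS / 1_000_000 * PRICE_PER_M_INPUT_TOKENS

print(f"Full 1M-token question: ~${full_cost:.2f} in input tokens")
print(f"RAG question:           ~${rag_cost:.3f} in input tokens")
```

Output tokens and latency widen the gap further; the point holds regardless of the exact rate.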
🌟 **Fantastic insights!** 🌟 @rlancemartin, thank you for sharing this thought-provoking talk on RAG in the era of long context LLMs. The way you weave together threads from various recent projects to address the question of whether RAG is "dead" is truly commendable.
The challenges posed by context windows growing to 1M+ tokens are significant, and your exploration of the limits of long-context LLM fact reasoning and retrieval (using the multi-needle-in-a-haystack analysis) sheds light on crucial aspects. But it's equally exciting to hear about the potential shifts in the RAG landscape due to expanding context windows, especially the approaches for doc-centric indexing and RAG "flow engineering."
Your talk inspires us to keep pushing the boundaries and adapt to the evolving landscape. Kudos! 🙌🚀
I love Lance. Absolutely creasing at the Pareto slide.
Lol, "I don't wonna tell Harrison how much I spent" 🤣 Now, that's pretty good needle in a 21 minute haystack 😆. The hay is pretty awesome too though. Thank you for all the hay!
My brain has stored everything that’s ever happened and will ever happen, yet remembers only what’s relevant to the context of today, necessary to avoid the risks of tomorrow, and wouldn’t be watching this YouTube video if it always knew what answers to find simply because they exist at all.
Super useful analyses. Great work.
Fantastic lecture, keep it up!
Very useful, thanks for sharing!
I wish we had access to some public LangChain Notion project. The diagrams, etc. are incredibly helpful. I'd love to print them out, laminate them, and post them in my shower :D
Would love it if you could showcase a working RAG example with live, changing data, for example an item price change or a policy update. Does it require manually managing chunks and embedding references, or are there better existing solutions? I think this is what really differentiates a fun to-do from actual production systems and applications.
Thanks and all the best!
Prices are most of the time tabular data, and therefore it's best to treat them that way. LLMs are pretty good at writing queries based on a provided SQL schema.
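A minimal sketch of that text-to-SQL idea: hand the model the table schema, let it write the query, and run the query against the live table instead of retrieving chunks of prose. Here `llm()` is a hypothetical stand-in for whatever chat-completion call you use, with its return value hard-coded for illustration:

```python
import sqlite3

SCHEMA = "CREATE TABLE prices (item TEXT, price REAL, updated_at TEXT);"

def llm(prompt: str) -> str:
    # Hypothetical stand-in: a real model would generate this SQL from the prompt.
    return "SELECT price FROM prices WHERE item = 'widget';"

def answer_price_question(question: str):
    prompt = f"Schema:\n{SCHEMA}\nWrite a SQLite query that answers: {question}"
    sql = llm(prompt)
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO prices VALUES ('widget', 9.99, '2024-05-01')")
    return conn.execute(sql).fetchall()

print(answer_price_question("What does a widget cost right now?"))
```

Because the query runs against the live table, a price change never requires re-embedding anything.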
@codingcrashcourses8533 No, you're not getting it... Price was just an example; this can be any data that changes. And once more, it might not be my data but my users' data, so I need a purely generic RAG, but with the ability to re-RAG when files change.
Hope you see what I mean.
Thanks for responding, all the best!
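On the re-RAG-when-files-change part, a minimal sketch, assuming the documents live as files on disk and `reindex_document` is a hypothetical callback that deletes and re-creates the chunks/embeddings for one file. The idea is to hash each file and only re-embed what actually changed; several RAG frameworks ship record-manager/indexing utilities that do essentially this bookkeeping for you.

```python
# Re-index only the files that are new or whose content has changed since the
# last run, so unchanged documents are never re-chunked or re-embedded.
import hashlib
import json
from pathlib import Path

STATE_FILE = Path("index_state.json")  # file path -> content hash at last indexing

def content_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def sync_index(source_dir: str, reindex_document) -> None:
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    for path in Path(source_dir).glob("**/*.txt"):
        h = content_hash(path)
        if state.get(str(path)) != h:
            reindex_document(path)   # drop old chunks/vectors for this file, re-embed
            state[str(path)] = h
    STATE_FILE.write_text(json.dumps(state, indent=2))
```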
Excellent video, but I'm not persuaded that the indexing system you suggest is going to work where the documents themselves are large relative to the number of chunks involved. I'm thinking in particular of legal opinions or other legal documents, which can be 50 pages long. I'm concerned that the summarization might be insufficiently granular, and that retrieving the entire document or documents would end up clogging the context window and lead you back to the same problem you discussed at the beginning of the video.
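One hedged way to square that concern with the doc-centric idea: retrieve at the document level via summaries, but then keep only the best chunks within the retrieved documents instead of stuffing all 50 pages into the context. A toy two-stage sketch, with a word-overlap `score()` standing in for a real embedding similarity:

```python
from typing import Dict, List

def score(query: str, text: str) -> float:
    # Toy relevance: word overlap; a real system would use embedding similarity.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)

def retrieve(query: str, docs: List[Dict], top_docs: int = 2, top_chunks: int = 3) -> List[str]:
    # Stage 1: pick documents by their (LLM-written) summaries.
    ranked = sorted(docs, key=lambda d: score(query, d["summary"]), reverse=True)[:top_docs]
    # Stage 2: within those documents, keep only the most relevant chunks.
    chunks = [c for d in ranked for c in d["chunks"]]
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_chunks]

docs = [
    {"summary": "Opinion on damages for breach of contract",
     "chunks": ["Damages are measured by expectation loss.", "Procedural history of the appeal."]},
    {"summary": "Opinion on patent claim construction",
     "chunks": ["Claim terms get their ordinary meaning.", "The specification governs ambiguity."]},
]
print(retrieve("How are damages measured for breach of contract?", docs))
```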
Thanks, very useful!
RAG is excellent for knowledge representation and retrieval, and superior to training or fine-tuning.
Nice. Retrieving a full document given a short question may be troublesome, though, given that we're still trying to move everything into high-dimensional space and compare it there. However, chunking things semantically and storing the vectors with a link back to the actual document would likely perform much better.
LET'S GO LANCE!!!
Great content. Expertly delivered.
What would be nice is looking at this from a private enterprise data perspective, where you have a lot more restrictions on what can be in the context in terms of data vs. metadata.
You know what's way better? Not using NLP at all for retrieving ranks. I use concurrent parquet caching and pass continuously queried word matches as index indices to the AI. This basically cut the token limit requirement for my AI model down to near nonexistence. It also always has a starting point, retrieved as a dict.
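If I read that right, a stripped-down version of the idea is an inverted word-to-row-index lookup over a cached table, so the model only ever sees a small dict of matching indices rather than long text spans. A minimal stdlib sketch (the comment above uses parquet for the cache; a plain in-memory list of rows stands in here):

```python
# Inverted index: word -> row indices over a cached table, returned as a dict.
from collections import defaultdict
from typing import Dict, List

rows = [
    "error rates spike after the cache is cold",
    "pricing updated for the enterprise tier",
    "cache warming reduces tail latency",
]

index: Dict[str, List[int]] = defaultdict(list)
for i, row in enumerate(rows):
    for word in set(row.lower().split()):
        index[word].append(i)

def match(query: str) -> Dict[str, List[int]]:
    """Return {word: [row indices]} for query words present in the table."""
    return {w: sorted(index[w]) for w in query.lower().split() if w in index}

print(match("cache latency"))  # {'cache': [0, 2], 'latency': [2]}
```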
Thank you for the very nice video, as usual. Can you propose a simple solution that gets the best of both worlds, RAG and the large context window?
Could you please comment on the recency of your latest results? When did you compute the result figures for the article?
I think long-context LLMs in RAG could be useful when you have critical information and need the exact data, plus a lot of information, in one system.
How good are humans at this needle-in-a-haystack sort of test? Humans seem to do a lot better with this sort of retrieval when they have a model to align the structure. I'm referring to tests where large chunks of information were given to non-experts and experts, e.g. details about flying. The experts were able to remember and retrieve the relevant information with a higher hit rate than non-experts because they had a model to identify the important information and store it. How much would that sort of approach help LLM performance? Such as letting it know that you are giving it information and it has to remember it for retrieval later.
What application do you guys use to create these flow diagrams?
Excalidraw 😎
Hello, does anyone know a good technical implementation of these techniques?
It would help a lot and is really needed.
I think you should at least distinguish visually rich, dense documents from ordinary text documents when speaking about the document level.
Btw, you're kinda far from what pizza is 😂. Neither figs, prosciutto, nor goat cheese... I really don't know what kind of pizza you eat in the US, but it seems to be real garbage if you put those ingredients on it. Get it from an Italian from Naples, where pizza was born 😜. Hope you're going to change those to tomato, mozzarella, parmesan, and basil next time 🤣🤣🤣. Jokes apart, good job, dude!
Ha, well I used the same example that Anthropic used :D ... so we can blame them.