Consider these actionable insights from the video:
1. Understand the power of context in search queries and how it enhances accuracy in Retrieval Augmented Generation (RAG).
2. Experiment with different chunking strategies for your data when building your RAG system.
3. Explore and utilize embedding models like Gemini and Voyage for transforming text into numerical representations.
4. Combine embedding models with BM25, a ranking function, to improve ranking and retrieval processes.
5. Implement contextual retrieval by adding context to data chunks using Large Language Models (LLMs); a minimal sketch of this setup follows the list below.
6. Analyze the cost and benefits of using contextual retrieval, considering factors like processing power and latency.
7. Optimize your RAG system by experimenting with reranking during inference to fine-tune retrieval results.
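Not from the video itself, but here is a minimal sketch of the pipeline those points describe, assuming the rank_bm25 and sentence-transformers packages (swap in Gemini or Voyage embeddings for the local model as needed) and placeholder context lines that an LLM would normally generate per chunk:

```python
# Hypothetical sketch: hybrid retrieval over contextualized chunks.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
import numpy as np

chunks = [
    "Revenue grew 3% over the previous quarter.",
    "The company launched two new products.",
]
# Placeholder context lines that an LLM would produce for each chunk.
contexts = [
    "From ACME Corp's Q2 2023 filing, financial results section.",
    "From ACME Corp's Q2 2023 filing, product updates section.",
]
contextual_chunks = [f"{c} {t}" for c, t in zip(contexts, chunks)]

# Sparse index (BM25) over the contextualized text.
bm25 = BM25Okapi([doc.lower().split() for doc in contextual_chunks])

# Dense index: any embedding model works here; a small local
# sentence-transformers model keeps the sketch self-contained.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vecs = embedder.encode(contextual_chunks, normalize_embeddings=True)

def retrieve(query: str, k: int = 2):
    """Fuse BM25 and embedding rankings with reciprocal rank fusion."""
    bm25_scores = bm25.get_scores(query.lower().split())
    dense_scores = chunk_vecs @ embedder.encode(query, normalize_embeddings=True)
    fused = np.zeros(len(contextual_chunks))
    for scores in (bm25_scores, dense_scores):
        for rank, idx in enumerate(np.argsort(scores)[::-1]):
            fused[idx] += 1.0 / (60 + rank + 1)  # k=60 is the usual RRF constant
    return [contextual_chunks[i] for i in np.argsort(fused)[::-1][:k]]

print(retrieve("ACME quarterly revenue growth"))
```

Reciprocal rank fusion is only one way to merge the two rankings; a reranker over the fused top-k during inference (point 7) is the usual next step.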
So the LLM is the Achilles' heel of the whole process. If it messes up the context, everything goes south immediately! But if it works well by default, it will enhance the final results.
As you said, it's really costly, like graph vector DBs, and high maintenance. A classic (sparse + dense) retriever plus a sparse reranker should simply do a good job, especially considering most of the new SOTA models have a larger context window.
Mate, I've been trying to understand RAG for ages, non coder here obviously, but your explanation was brilliant. Thank you
You can create the contextual tag locally using ollama.
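For anyone curious, a rough sketch of that idea (not from the video; the model name, prompt wording and default endpoint are assumptions) using Ollama's local REST API:

```python
# Hypothetical sketch: generate a contextual tag for each chunk with a
# local model served by Ollama (default endpoint http://localhost:11434).
import requests

def contextualize(document: str, chunk: str, model: str = "llama3") -> str:
    prompt = (
        f"<document>\n{document}\n</document>\n\n"
        f"Here is a chunk from the document above:\n<chunk>\n{chunk}\n</chunk>\n\n"
        "Write one or two sentences situating this chunk within the overall "
        "document, to improve search retrieval. Answer with the context only."
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()

# The tag is then prepended to the chunk before embedding/indexing:
# indexed_text = contextualize(doc, chunk) + " " + chunk
```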
Great Video
@@MhemanthRachaboyina thank you
I've been working on something quite similar over the last few months for a corpus of documents that are in a tree hierarchy to increase accuracy. Seems it was not a bad idea after all 😁
Thanks for the update. 👍 We see a lot of different techniques to improve RAG, and the additional quality improvements are not that big while the costs are much higher (more tokens) and the inference time goes up... Agreed that for most use cases it's not worth the effort and money.
Honestly, that few percent improvement is not worth it for most cases...
This is really interesting and I think, intuitively, it will help me with my project. Thank you very much.
Thank you!
Feeding in the whole document text to add a few lines of context for each chunk seems way too much for too little benefit. Instead we would need a better embedding model to enhance retrieval without any of the overhead.
And companies will be interested in chunking, embedding and indexing proprietary documents only once in their lifetime. They can't reindex the whole archive every time a new improvement is released.
Is it really worth all the noise and having a new name for it and all? This is an idea that many developers have already been using. I mean, anyone who thinks a little bit naturally realizes that adding a little description of what the chunk is about in relation to the rest of the document helps, and would have done it automatically :D Myself and many others have been doing it for very obvious reasons.. I just didn't know I had to give it a name and publish it as a technique.. this LLM BS taught me one thing, and that is: put a name on any trivial idea and you are now an inventor.
Honestly, that's one thing I actually mentioned in the video: whether such improvements are something you need.
Yes, actually there are many more techniques like this which offer a similar percentage of improvement, and none of them are worth it. Basic RAG is still enough for now.
Excellent video and insights!
Glad you enjoyed it!
How do you generate the context for chunks without giving the LLM sufficient information about the chunk? How are they getting the information about the revenue in that example?
That is from the entire document
@@1littlecoder Then it will be very costly, as the entire document is being fed into the LLM. And what about the LLM's token limit if I have a significantly large document?
@@souvickdas5564 This technique is golden for locally run LLMs. It's free.
I have the same doubt. Please let us know if there's clarity.
I was really caught off guard when you said '....large human being' 😂😂
I just rewatched it 🤣
Unfortunately, large humans are extinct! [or maybe left planet Earth.]
🤣🤣🤣🤣🤣🤣
I was experimenting with this and it's really amazing. But too simple an approach 😅😅
The beauty is how simple it is :D
@@1littlecoder keeping it simple always works
Thanks 😅
To generate context, do we need to pass all documents? How will we address the token limit?
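That question isn't answered in the thread, but one common workaround (an assumption on my part, not something from the video) is to pass only the chunk's own parent document, and if even that single document exceeds the context limit, fall back to a window of text around the chunk:

```python
# Hypothetical fallback: if the parent document is too long for the model,
# use only a window of text around the chunk as the "document" for context.
def context_source(document: str, chunk: str, max_chars: int = 20_000) -> str:
    if len(document) <= max_chars:
        return document
    start = max(document.find(chunk), 0)  # -1 (not found) falls back to 0
    half = max_chars // 2
    lo = max(start - half, 0)
    hi = min(start + len(chunk) + half, len(document))
    return document[lo:hi]
```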
It would be great if they could just build this into their platform, like OpenAI has with their agents.
Wait, wouldn't it be more efficient for the LLM, rather than creating a context, to use that compute to create a new chunk that puts together two previous chunks (e.g. chunk 1 + chunk x) based on context? And rather than going down the route of "let's try to aid the LLM to find the right chunk for the user request by maximizing attention to that one particular chunk", go down the route of "let's try to aid the LLM [..] by maximizing the probability of finding the right node in a net of higher-percentage possibilities"?
Isn't it an agentic chunking strategy??
Smart chunks 🎉
Someone's going to steal this name for a new RAG technique :)
Is it something similar to what Google calls context caching?
No, context caching is basically on top of it. Thanks for the reminder. I should probably make a separate video on it.
@@1littlecoder Oh nice, perfect
Thank you for such insights and the simple explanation
I think the reason why Anthropic introduced this technique is because they have the CACHING!!!
Easy upsell 👀
@@1littlecoder As far as I know, if you use the prompt caching feature to store all your documents, such as your company documents, it would greatly reduce the cost, particularly the input token cost, as the {{WHOLE DOCUMENT}} is retrieved from the cache. Am I right?
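For reference, my reading of Anthropic's prompt caching (treat the details as assumptions, and the model and prompt as placeholders): the whole document is marked as a cacheable prefix, so calls for subsequent chunks of the same document hit the cache and pay the reduced cached-input rate rather than the full input price.

```python
# Hypothetical sketch: contextualize chunks with Anthropic prompt caching.
# The document block carries cache_control, so repeated calls for other
# chunks of the same document reuse the cached prefix (cheaper input tokens).
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

def situate_chunk(document_text: str, chunk_text: str) -> str:
    response = client.messages.create(
        model="claude-3-haiku-20240307",  # placeholder model choice
        max_tokens=200,
        system=[
            {"type": "text",
             "text": "You situate document chunks for search retrieval."},
            {"type": "text",
             "text": f"<document>\n{document_text}\n</document>",
             "cache_control": {"type": "ephemeral"}},  # cache the whole document
        ],
        messages=[{
            "role": "user",
            "content": (
                f"Here is a chunk from the document above:\n"
                f"<chunk>\n{chunk_text}\n</chunk>\n"
                "Give a short context situating this chunk within the document. "
                "Answer with the context only."
            ),
        }],
    )
    return response.content[0].text
```

Note the cache has a short lifetime, so chunks of the same document should be processed back to back to actually benefit from it.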
They could have used something similar to LLMLingua on each chunk, then passed it to a smaller model for deriving context, as it is a very specific task and does not demand a huge model. This way cost can be controlled and quality can be enhanced. Also, they could add a model router rather than using a predefined model; this router can choose the model based on the information the corpus has. There are many patterns which can enhance this RAG pipeline. This just seems very lazy.
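A rough sketch of that suggestion, assuming LLMLingua's PromptCompressor interface (the exact arguments and default model are assumptions); the compressed document would then be handed to a smaller model, e.g. the Ollama helper sketched earlier:

```python
# Hypothetical sketch: compress the document with LLMLingua before asking a
# smaller model to derive per-chunk context, keeping token costs down.
from llmlingua import PromptCompressor

compressor = PromptCompressor()  # loads LLMLingua's default compression model

def compress_document(document_text: str, target_tokens: int = 500) -> str:
    result = compressor.compress_prompt(
        document_text,
        instruction="",
        question="",
        target_token=target_tokens,
    )
    return result["compressed_prompt"]

# compressed = compress_document(long_document)
# context = contextualize(compressed, chunk)  # e.g. the Ollama helper above
```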
Have been doing this since way back, and much more.
Your content is really good, but I've noticed that you tend to speak very quickly, almost as if you're holding your breath. Is there a reason for this? I feel that a slower, calmer pace would make the information easier to absorb and more enjoyable to follow. It sometimes feels like you're rushing, and I believe a more relaxed delivery would enhance your already great work. Please understand this is meant as constructive feedback, not a criticism. I'm just offering a suggestion to help make your content even better.
Thank you for the feedback. I understand. I naturally speak very fast, so typically I have to slow down. I'll try to do that more diligently.
❤🫡
This is the guy who called o1 preview overhyped. 🤭
Did I?
He never said that. He said o1 is just glorified chain of thought, and that's actually true.
I tried another stupidly simple approach.
Create a QA dataset with an LLM.
Find the nearest question and provide its answer.
Surprisingly, it also works really great 😅😅😅
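A minimal sketch of that approach (not from the video; the sentence-transformers model and the sample QA pairs are placeholders): an LLM turns each chunk into question/answer pairs offline, the questions get embedded, and at query time the answer attached to the nearest question is returned.

```python
# Hypothetical sketch: QA-style retrieval. Embed LLM-generated questions and
# answer a user query with the answer of the most similar stored question.
from sentence_transformers import SentenceTransformer
import numpy as np

# Offline step (not shown): an LLM generates QA pairs like these per chunk.
qa_pairs = [
    ("What was the revenue growth in Q2 2023?",
     "Revenue grew 3% over the previous quarter."),
    ("How many products were launched in Q2 2023?",
     "The company launched two new products."),
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
question_vecs = embedder.encode([q for q, _ in qa_pairs],
                                normalize_embeddings=True)

def answer(query: str) -> str:
    """Return the stored answer of the question closest to the query."""
    query_vec = embedder.encode(query, normalize_embeddings=True)
    best = int(np.argmax(question_vecs @ query_vec))
    return qa_pairs[best][1]

print(answer("How much did revenue grow last quarter?"))
```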
Here you go. You just invented a new RAG technique 😉
This is actually surprisingly good for RAG on expert/narrow domains! I did the same thing for a bot on web accessibility rules, and it worked perfect AF
@@arashputata Which method?
@@1littlecoder Also, you can later use the data to fine-tune 😅😅
Yeah, that is my not-so-secret weapon too 😂