James Briggs is one of my favorites and I believe I am a "Patreon" member - I've spent hundreds of hours listening to about 10 podcasts, studying Large Language Models, Machine Learning and so-called "AI". James Briggs breaks things down into easier-to-understand concepts. Thank you James Briggs
hey that's awesome, I really appreciate the support!
Thank you! I’ve been doing this for a while, but did not have a good name for it.
Best I have seen so far for understanding the core concepts of chunking, thanks
glad it was helpful :)
At this point, you are practically Captain Chunk.
King of Chunk
a title I have always wanted
Excellent content and explanation, especially the core chunking concepts and challenges. Keep up your work, it's so precious for learning 👍
Glad to hear it helps
Need a video on cross-chunk attention. Wasn't attention all about key, query, and value anyway?
@James Briggs Newbie here, was wondering if it is necessary to store the chunk text with the vector - it seems like a lot of data duplication and a good way to fill your disk. I like the idea of storing the title, and I was thinking about storing the document path and filename also. I haven't been able to find good info about what data besides the vectors is kept in the vector DB. I understand that the vectors need to correlate to data, I just don't understand what data is actually represented in the vectors. If you just have an ID and the vectors, can't that ID point back to the document with the content?
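One way people do this is to keep only an ID plus light metadata in the vector DB and fetch the chunk text from the source documents at query time. A minimal sketch of that idea (the store, paths, and offsets here are all hypothetical, not any particular vector DB's API):

```python
# Hypothetical sketch: the vector DB only stores an ID; this side table maps
# each vector ID to where the chunk lives in the original file, so the text
# can be rebuilt on demand instead of being duplicated in the index.
metadata_store = {
    "doc42-chunk3": {"path": "docs/report.txt", "start": 1200, "end": 1650, "title": "Q1 Report"},
}

def fetch_chunk_text(vector_id: str) -> str:
    meta = metadata_store[vector_id]
    with open(meta["path"], encoding="utf-8") as f:
        text = f.read()
    # slice the original document back out using the stored character offsets
    return text[meta["start"]:meta["end"]]

# after a vector search returns "doc42-chunk3", rebuild the text on demand:
# print(fetch_chunk_text("doc42-chunk3"))
```

The trade-off is an extra lookup (and a dependency on the source files staying put) in exchange for a much smaller index.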
Hi James, would you please tell me how you would tackle this one..
How would you design a real-time updating RAG system? For example, let's say our clients updated some details in a watched doc - I want the old chunks to be removed and re-chunked automatically. Have you seen such a pipeline existing already? No one seems to cover this, and I think it's what sets apart fun projects from actual production systems. Thanks and all the best! Love your channel ❤
I have achieved this for one of the sources in my RAG bot. It has an API provided to access the data, so I run the embedding script on the delta changes.
@@shameekm2146 amazing, would you please open-source it so we can all improve the pipeline as a community? 🌈
RAG does not mean you "have to use vector embeddings and Vector DB". If you can run APIs to fetch relevant info, it should be good enough. Use function/tool calling to call the API.
Otherwise, if you are planning to watch some doc live, you need the following pipeline (this will only work if you are making small changes to the doc frequently):
Doc Changed -> Webhook/Trigger to your system -> If diff is available, use it, if diff is not available, compute diff with old vs new doc.
-> Take nearby text as sample and compute embeddings -> Fetch top N nearby docs from the Vector DB (Hybrid search will work really well, tune the sparse vector weightage higher than normal RAG here) -> Ask LLM Agent to mark relevant chunks/Use reranking models (old chunk as query) -> Delete these old chunks from VectorDB -> Compute embeddings for the new changes -> Upsert the new vectors into the DB.
There are tons of edge cases that you will run into when running this pipeline, and they always vary for each use case, so you will have to consider those accordingly.
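A rough sketch of that flow, using a toy in-memory index so it runs end to end - swap `embed`, `TinyIndex`, and the naive paragraph split for your real encoder, vector DB, and chunker (all names here are made up for illustration):

```python
import difflib
import hashlib

def embed(text: str) -> list[float]:
    # placeholder embedding: deterministic but meaningless, replace with a real encoder
    h = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in h[:8]]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb or 1.0)

class TinyIndex:
    """Stand-in for a vector DB client (query / delete / upsert)."""
    def __init__(self):
        self.items = {}  # id -> (vector, text)
    def upsert(self, _id, text):
        self.items[_id] = (embed(text), text)
    def query(self, text, top_k=5):
        qv = embed(text)
        ranked = sorted(self.items.items(), key=lambda kv: -cosine(qv, kv[1][0]))
        return [i for i, _ in ranked[:top_k]]
    def delete(self, ids):
        for i in ids:
            self.items.pop(i, None)

def handle_doc_update(index: TinyIndex, doc_id: str, old_doc: str, new_doc: str):
    # 1. compute the diff between the old and new versions of the doc
    diff = difflib.unified_diff(old_doc.splitlines(), new_doc.splitlines(), lineterm="")
    changed = [l[1:] for l in diff
               if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]
    if not changed:
        return
    # 2. find the stale chunks nearest to the changed text and delete them
    stale = [i for i in index.query("\n".join(changed)) if i.startswith(doc_id)]
    index.delete(stale)
    # 3. re-chunk the new doc and upsert the fresh vectors
    for n, chunk in enumerate(new_doc.split("\n\n")):
        index.upsert(f"{doc_id}-{n}", chunk)
```

In a real system the webhook handler would call `handle_doc_update` with the doc versions, and the hybrid search / reranking step mentioned above would replace the simple cosine query.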
Great video. Thanks for posting. I have been thinking of document chunking using the LLM itself via prompting + k-shot. The approach you show will be cheaper of course, but I'm curious to see how these two approaches compare in terms of any relevant non-cost metrics.
At ~4:40, you mention that you should use the same encoder for the chunking and the encoding. Why? A chunk captures a "single meaning", so why would it matter that the same encoder is used? If you look at the chunking as a clustering algorithm that creates meaningful chunks, then what does it matter that the encoders match? What am I missing?
good point - yes they are capturing the "single meaning" and that single meaning will (hopefully) overlap a lot, but embedding models are not perfect and so they will not align between themselves. It's similar to if someone asked you and me to chunk an article: we'd likely overlap for the majority of the article, but I'm sure there would be differences
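To make that concrete, here is a minimal sketch of similarity-based chunking: split wherever consecutive sentences fall below a similarity threshold, using whichever encoder you will also index with (`encode` is a stand-in for your embedding model; the threshold is illustrative). Because the split points come straight from the encoder's similarities, a different encoder will generally produce different boundaries:

```python
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5) or 1.0)

def semantic_chunks(sentences, encode, threshold=0.75):
    if not sentences:
        return []
    vectors = [encode(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev_vec, vec, sent in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, vec) < threshold:   # meaning shifted -> close the chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```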
Thank you so much for this. Will test it out on the RAG flow in the company.
welcome, would love to hear how it goes
Another killer video. Great work!
Loved this explanation
Love all your content sir!
People since GPT-2: Simply ask an LLM recursively to please insert "{split}" where a topic change etc. happens, according to a summary of the prior text. Get embeddings. Use them to separate and group.
2024: We would like to introduce a novel concept called Semantic Chunking with a sliding Context……..
Beginners must be truly lost 😮💨
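For reference, the "ask the LLM to insert a marker" approach described above can be sketched like this, assuming the OpenAI Python client (the model name and prompt wording are just examples, not a recommendation):

```python
from openai import OpenAI

client = OpenAI()

def llm_split(text: str) -> list[str]:
    # ask the model to echo the text back with a split marker at topic changes
    prompt = (
        "Copy the following text verbatim, inserting the token {split} "
        "wherever the topic changes:\n\n" + text
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    marked = response.choices[0].message.content
    return [chunk.strip() for chunk in marked.split("{split}") if chunk.strip()]
```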
What's a good way to use the metadata for retrieval and ranking of the chunks?
How does Parent Document RAG fit in with your new techniques?
We use a simple combination of Microsoft's Document Intelligence with markdown output and a simple markdown splitter. The improvement is noticeable although the Document Intelligence models do come at an additional cost.
yeah it depends on what you need of course - I'm mostly interested in further abstraction and more analytical methods for chunking, not for where it is now, but for where this type of experimentation might lead in the future. I could see a few more iterations and improvements making more intelligent doc parsing and chunking increasingly performant - but we'll see
Do you have a link for this markdown processing? :)
We are using Document Intelligence as well, but not for layout analysis, yet.
@@alivecoding4995 you can also use LayoutPDFReader from llmsherpa
Can this be used to create chunks for creating a training dataset as well? It would be great to chunk a document into 'statements' and use those statements for a dataset. In essence have a LLM create questions for each of those statements and use those pairs for training. Could you make a video to show how that works?
"Grab complete thoughts" is an obvious good and expensive thing. Except for tables, for instance.
yeah tables need to be handled differently - doable if you are identifying text vs. table elements in your processing pipeline
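A simple sketch of that separation, assuming markdown-style input: keep table blocks whole as their own chunks and only run the prose through the semantic chunker (the function name and the pipe-row heuristic are just illustrative):

```python
def split_prose_and_tables(markdown: str):
    prose, tables, current_table = [], [], []
    for line in markdown.splitlines():
        if line.lstrip().startswith("|"):       # markdown table row
            current_table.append(line)
        else:
            if current_table:                    # a table block just ended
                tables.append("\n".join(current_table))
                current_table = []
            prose.append(line)
    if current_table:
        tables.append("\n".join(current_table))
    return "\n".join(prose), tables

# prose -> semantic chunker; each table -> its own chunk, kept intact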
Just what I was trying to learn... awesome mate, thanks
Nice np
Dude, I already embedded whole documents of text on my PC haha, this would've helped a month ago. But awesome, thanks for this! 🤘🏾
Maybe for the next project 😅
@@jamesbriggs quick question man. Is the objective of semantic chunking to achieve broader search results? Or to decrease query times? I'm thinking of it in terms of medium-sized text docs, for example movie summaries and such. Thanks!
Hi James, do you think the chunking and embedding process in RAG will become unnecessary in the near future, as input token length is no longer a limitation?
I don’t think the input token length will become unlimited any time soon - but for smaller use cases (fitting within Anthropic limits) where latency and token cost are not important then you can use a pure LLM solution rather than RAG
Do you think this causes the results to be biased towards smaller chunks? Because the user will probably query with no more than 10 words, so the most semantically similar results may also only be 10 words, and the chunks that are 400 tokens wouldn't get as high a score unless you provide more context in the query?
Amazing video, thank you so much!!
Using a high-end LLM like GPT-4, Opus, Gemini Ultra or Pro might be effective for performing semantic chunking.. Google's large context window seems suitable for chunking large files.. we need to introduce LLMs to automate the RAG stack
Yeah I’d like to introduce an LLM chunker and see how they compare
@@jamesbriggs better than any non-LLM chunker.. if we aim to empower users with AI, why not empower the developer? chunking is not easy
Hi James. Excuse me, maybe I missed it, but how do you handle the situation where using semantic chunking means we lose the page numbers for the chunks? Is it possible to get them back using this package?
Can you introduce some articles related to this topic? Thanks!
And how can I use big models from Hugging Face? I can't load them into memory because many of them are bigger than 15 GB, some of them 130 GB+. Any thoughts?
Great material. 🙏
I see the #abstract is also with the #title; ideally both should be in different chunks so that the LLM can better understand the semantics.
I have a doubt: I have a document with many cross-page references from one page to another. Should I group all the related data into the same chunk (e.g. the data is on the first page and it references page 3 - should I get the data from both pages and store it as a single chunk)? Is this the only way, or are there any special models for this? Otherwise, please give me some ideas.
Hmm, that's an issue you could solve in the retrieval stage, not chunking... When you retrieve a chunk, you can check with a fast LLM whether it has references to another one and retrieve those as well
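A minimal sketch of that retrieval-time expansion: after pulling a chunk, look for page references and fetch the referenced pages as extra context. Here a regex stands in for the fast LLM check, and `pages` is just an assumed dict of page number to page text:

```python
import re

def expand_with_references(chunk: str, pages: dict) -> list:
    context = [chunk]
    # find mentions like "see page 3" / "page 12" in the retrieved chunk
    for match in re.findall(r"(?:see\s+)?page\s+(\d+)", chunk, flags=re.IGNORECASE):
        page_no = int(match)
        if page_no in pages:
            context.append(pages[page_no])
    return context
```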
Does this library also support Ollama, Gemini or HF encoders, or is it only for ChatGPT?
it supports these encoders github.com/aurelio-labs/semantic-router/tree/main/semantic_router/encoders
Unfortunately, the semantic router has removed this feature, or refactored it in some way.
hey yes they were deprecated in favour of this th-cam.com/video/7JS0pqXvha8/w-d-xo.html
Why not chunk based on paragraphs, lists, and tables?
My son just asked if you were the Rock
I hope you said yes
"What is the title of the document?" -> 99% of RAG pipelines fail, because there is not answer in the document as it is embedded,
in that case we can try including the title in our chunk, and possibly consider different routing logic for this type of query - something that triggers when a user asks for metadata about a retrieved document: we trigger a function that identifies the document ID in previously retrieved contexts and uses that to pull in the document metadata, for the answer to be generated by the LLM
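A sketch of that routing idea: if the query looks like a metadata question, answer from the metadata attached to the previously retrieved chunks instead of going back to the vector search (the trigger phrases, chunk shape, and field names here are only illustrative):

```python
METADATA_TRIGGERS = ("title of the document", "who wrote", "when was this published")

def answer_or_route(query: str, retrieved_chunks: list):
    """Return a metadata answer, or None to fall through to the normal RAG path."""
    if any(trigger in query.lower() for trigger in METADATA_TRIGGERS):
        # assumes each chunk dict carries a metadata dict with doc_id and title
        titles = sorted({c["metadata"].get("title", "unknown") for c in retrieved_chunks})
        doc_ids = sorted({c["metadata"]["doc_id"] for c in retrieved_chunks})
        return f"Documents in context: {', '.join(titles)} (ids: {doc_ids})"
    return None
```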
Using the semantic chunker is giving this error even though I'm not using Cohere:
cannot import name 'EmbedResponse_EmbeddingsByType' from 'cohere.types.embed_response'
How do I solve it? I have already wasted a day on it.. this is so annoying.. plz help.. :)
cohere did a surprise SDK update and they are a default package in the library (we may change this) - try doing a `pip install -qU semantic-chunkers semantic-router==0.68`
more info here if needed github.com/aurelio-labs/semantic-router/issues/422
I am facing this problem in my Jupyter notebook, please help:
2024-05-10 10:59:50 WARNING semantic_router.utils.logger Retrying in 2 seconds...