The BEST Way to Chunk Text for RAG

  • Published on 12 Dec 2024

Comments • 15

  • @AdamLucek  17 days ago

    📚 To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/AdamLucek/. You’ll also get 20% off an annual premium subscription! 💡

  • @treflatface  a day ago +1

    This is probably the last guide on RAG chunking I'll ever need. So well done. Thank you for the walkthrough of the research!

  • @Qwme5  2 days ago +1

    Adam, I'm struggling to find the words to express how grateful I am for the content you share on your channel. Your ability to convey information clearly and without unnecessary speculation is truly brilliant. Thank you very much. If I were not broke, I would support you, but all I can support you with is thanking you.

    • @AdamLucek  2 days ago +1

      The kind words are support enough! Thanks for watching!

  • @raihanahmadkhan1946  3 days ago

    I am trying to build an Agentic RAG Framework with tool calling for Geographic Information System (GIS) workflows for my Master's thesis. I spent a lot of time trying to figure out the best chunking strategy, and this honestly humbled me. Semantic chunking was a very compute-intensive process, and theoretically it sort of made sense, so I went with that. Still, I am glad that I was only prototyping anyway, and since the dataset I have is huge, this is such a relief!
    Thanks for covering this, Adam! Your content has been a great help.

    • @AdamLucek  3 days ago

      Sounds like a cool thesis! Glad I could help!

  • @davieslacker  a day ago

    The LLM-based approach sounds interesting but also expensive. I haven’t implemented any RAG yet, but this was great food for thought and helps me know where to start! Thanks!

  • @MarcinDancewicz  3 days ago +1

    That's a great and quite surprising overview! Thank you :)

    • @AdamLucek  3 days ago

      Thanks for watching!

  • @jonasbieniek4320  3 days ago

    Just what I need, big thanks

  • @jimlynch9390  13 hours ago

    What I would really like to see is simple example code that actually implements the two suggested "best" chunkers.
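A minimal sketch in response to the request above. The thread does not name the two approaches, so this assumes one of them is recursive splitting on natural separators with no overlap, which the replies further down discuss. The function name `recursive_split`, the separator list, and the default `chunk_size` are illustrative assumptions, not any library's API.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int = 200,  # assumed default, measured in characters
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),  # coarse to fine
) -> List[str]:
    """Split text into chunks of at most chunk_size characters, no overlap.

    Tries the coarsest separator first (paragraphs), and only falls back to
    finer ones (lines, sentences, words) for parts that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging neighbouring parts
            continue
        if current.strip():
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # This part alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks
```

Calling `recursive_split(document, chunk_size=300)` returns a list of strings, each at most 300 characters. In this sketch the separator is dropped wherever a chunk boundary falls, so a sentence split on ". " loses its trailing period; a production splitter would likely keep it.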

  • @NLPprompter  3 days ago +1

    Can Anthropic's contextual RAG also be considered a chunking strategy?

  • @gunjanjoshi5120  2 days ago

    This is interesting to see, especially since multiple articles state that, when using recursive chunking, chunk_overlap is an important parameter for preserving context between chunks, but Chroma suggests otherwise. What are your thoughts on this from your RAG experience?

    • @AdamLucek  2 days ago +1

      Overlap can be a little redundant here and there. It definitely can help when relevant but not apparent context is cut or disconnected, which is kind of what the cluster semantic chunker here is trying to solve for. But usually, if your chunk sizes are big enough and your retrieval mechanism is robust, splitting on natural separators, as the recursive approaches do, tends to do most of the work of keeping relevant sections together when working with text data. The semantic approaches only improve on that by introducing cosine similarity comparisons into the mix as well.
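The cosine similarity comparison mentioned in this reply can be illustrated with a small sketch. It is not the cluster semantic chunker from the video: it only starts a new chunk wherever two adjacent sentences look dissimilar. The word-count vector in `bag_of_words` is a toy stand-in for a real embedding model, and the function names and the `threshold` value are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List


def bag_of_words(text: str) -> Dict[str, int]:
    """Toy 'embedding': lowercase word counts as a sparse vector."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(
    sentences: List[str],
    threshold: float = 0.2,  # assumed cut-off, would need tuning on real data
    embed: Callable[[str], Dict[str, int]] = bag_of_words,
) -> List[str]:
    """Group consecutive sentences, breaking where similarity drops below threshold."""
    if not sentences:
        return []
    groups: List[List[str]] = [[sentences[0]]]
    for previous, current in zip(sentences, sentences[1:]):
        if cosine(embed(previous), embed(current)) >= threshold:
            groups[-1].append(current)  # similar enough: same chunk
        else:
            groups.append([current])  # topic shift: start a new chunk
    return [" ".join(group) for group in groups]
```

With a real embedding model swapped in for `embed`, every sentence needs a model call, which is one reason semantic approaches are noticeably more compute-intensive than plain separator-based splitting.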

  • @60pluscrazy  2 days ago

    🎉🎉🎉