Benchmarking Methods for Semi-Structured RAG

  • Published on Dec 12, 2023

Comments • 7

  • @DreamsAPI 7 months ago +1

    Thank you for sharing. Looking forward to more insightful videos from you guys.

  • @mrchongnoi 7 months ago +1

    Thank you for your video. What I got out of your presentation is that documents typically follow a semantic structure, which includes elements like a Table of Contents (TOC), sections, and sub-sections. Sections and sub-sections are closely related in content and meaning, so they should be grouped together in the same chunk. Then there is the relevant text about tables within those sections. Tables should either be part of the section/sub-section chunk or, if separated, point back to the section chunk. Just thinking out loud. Thank you again. I look forward to more videos.
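The grouping idea in this comment can be sketched in a few lines. This is a minimal illustration, not the method from the video: the `Chunk` class, the `chunk_by_section` function, and the `(kind, text)` element format are all hypothetical names, and the sketch assumes the document has already been parsed into headings, sub-headings, text, and tables.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    chunk_id: str
    kind: str                        # "section" or "table"
    text: str
    parent_id: Optional[str] = None  # set only on separated table chunks


def chunk_by_section(elements: List[Tuple[str, str]],
                     inline_tables: bool = True) -> List[Chunk]:
    """Group parsed elements into one chunk per top-level section.

    `elements` is a hypothetical parser output: (kind, text) pairs in
    document order, with kind in {"heading", "subheading", "text", "table"}.
    Sub-sections stay inside their parent section's chunk.
    """
    sections: List[dict] = []
    for kind, text in elements:
        # A top-level heading (or the very first element) opens a new section.
        if kind == "heading" or not sections:
            sections.append({"lines": [], "tables": []})
        if kind == "table" and not inline_tables:
            sections[-1]["tables"].append(text)
        else:
            sections[-1]["lines"].append(text)

    chunks: List[Chunk] = []
    for i, sec in enumerate(sections):
        sec_id = f"sec-{i}"
        chunks.append(Chunk(sec_id, "section", "\n".join(sec["lines"])))
        # Separated tables keep a pointer back to the section they came from.
        for j, table in enumerate(sec["tables"]):
            chunks.append(Chunk(f"{sec_id}-table-{j}", "table", table,
                                parent_id=sec_id))
    return chunks
```

With `inline_tables=True` each table stays in its section's text; with `inline_tables=False` each table becomes its own chunk whose `parent_id` lets a retriever fetch the surrounding section when the table is matched.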

  • @roberth8737 7 months ago

    Awesome coverage. Tables in PDFs are a huge pain point that I suspect will have a strong solution in the next 3-4 months.

  • @b0otable 7 months ago +2

    Thanks for sharing Lance! Do you have the input docs used for this testing?
    I'm curious how dependent it is on the input dataset. A lot of the approach seems to assume that page boundaries are a good place to split, but that would mostly hold for carefully hand-crafted documents. If you just export a document to PDF and a table spans a page boundary, I wonder how well the page-boundary and Ensemble methods would work.
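To make this concern concrete, here is a minimal sketch of page-based splitting that merges two pages when a table appears to continue across the boundary. It is not the benchmark's code: `is_table_row` is a hypothetical heuristic (a line with at least two `|` separators), and pages are assumed to arrive as lists of text lines.

```python
from typing import List


def is_table_row(line: str) -> bool:
    # Hypothetical heuristic: two or more "|" separators mark a table row.
    return line.count("|") >= 2


def split_on_pages(pages: List[List[str]]) -> List[str]:
    """Return one chunk per page, except where a table spans a page break."""
    chunks: List[List[str]] = []
    for page in pages:
        lines = [ln for ln in page if ln.strip()]
        if not lines:
            continue  # skip blank pages
        # If the previous chunk ends in a table row and this page starts
        # with one, assume the table continues and keep the pages together.
        continues_table = (
            bool(chunks)
            and is_table_row(chunks[-1][-1])
            and is_table_row(lines[0])
        )
        if continues_table:
            chunks[-1].extend(lines)
        else:
            chunks.append(list(lines))
    return ["\n".join(chunk) for chunk in chunks]
```

Without the merge step, a plain per-page split would put the table header on one chunk and the remaining rows on the next, which is the failure mode raised here for exported documents.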

  • @raminzandvakili7418 7 months ago +1

    Thank you. This is interesting. Could you please share the code for your benchmark as well?

  • @bsarel 7 months ago

    Illuminating 🙏🏼
    Not sure what's in your eval set, but I wonder what the best strategy would be for docs like Confluence or Google Docs, where you're less likely to have the concept of pagination except for print layout. It's unpredictable where a table will be located when ingesting docs.

  • @Canna_Science_and_Technology 7 months ago +1

    Curious: does LangChain have or support HyDE for RAG?