RAG for Complex PDFs with LlamaParse and

  • Published on 11 Jan 2025

Comments •

  • @AI-Makerspace 10 months ago +5

    Google Colab Notebook: colab.research.google.com/drive/1IVQkSGwS5kdTiKBwz85PO6vg_WaNx15c?usp=sharing
    Event Slides: www.canva.com/design/DAF-L3KONQc/276F2Y-5Ym771I64RsjZiQ/view?DAF-L3KONQc&

  • @twoplustwo5 10 months ago +3

    🎯 Key Takeaways for quick navigation:
    00:09 *🎵 Introduction and Overview of LlamaParse*
    - Introduction of the hosts and the topic of the video, which is the new LlamaParse library.
    - Brief discussion on the capabilities of LlamaParse, particularly its ability to parse embedded tables and figures.
    02:09 *📚 Understanding LlamaParse and its Performance*
    - Explanation of the purpose and functionality of LlamaParse.
    - Discussion on how to build a query engine using LlamaParse for document retrieval applications.
    05:07 *📈 LlamaIndex and its Role as a Data Framework*
    - Detailed explanation of LlamaIndex and its role as a data framework.
    - Discussion on the concept of context augmentation and its importance in the data-centric paradigm.
    10:54 *📊 LlamaParse's Parsing Algorithm and its Capabilities*
    - Introduction to LlamaParse's proprietary parsing algorithm for documents with embedded objects.
    - Discussion on the comparison of LlamaParse's performance with other parsing tools.
    14:05 *🧪 Testing LlamaParse's Performance*
    - Explanation of the testing process and the documents used for testing.
    - Discussion on the results of the testing, highlighting the strengths and weaknesses of LlamaParse.
    20:54 *💻 Demonstration of LlamaParse in Code*
    - Walkthrough of the code used for testing LlamaParse.
    - Explanation of the models and tools used in the testing process.
    23:24 *📚 Setting up LlamaParse and LlamaIndex*
    - Explanation of how to set up LlamaParse and LlamaIndex.
    - Discussion on the process of generating an API key for LlamaCloud.
    - Mention of the limitations of LlamaParse, such as only accepting PDFs and returning only plain text or markdown.
    26:40 *🛠️ Initializing LlamaParse and Parsing Documents*
    - Walkthrough of initializing LlamaParse and parsing documents.
    - Explanation of the importance of preserving the structure of the data in the documents.
    - Discussion on the inconsistency in the parsing process and the potential issues that may arise.
    31:52 *🚀 Building a Query Engine with LlamaIndex v0.10*
    - Introduction to LlamaIndex v0.10 and the changes it brings.
    - Explanation of how to build a query engine using LlamaIndex.
    - Discussion on the importance of preserving the structure of the data in the documents.
    35:06 *🧪 Testing the Query Engine*
    - Walkthrough of testing the query engine.
    - Discussion on the results of the testing, highlighting the strengths and weaknesses of the query engine.
    - Explanation of the importance of the ranker in the retrieval process.
    39:21 *📊 Querying Structured Data*
    - Demonstration of querying structured data using the query engine.
    - Discussion on the accuracy of the results and the potential issues that may arise.
    - Explanation of the importance of preserving the structure of the data in the documents.
    42:52 *🎯 Testing LlamaParse on Figures and Graphs*
    - Demonstration of LlamaParse's performance on figures and graphs.
    - Discussion on the limitations of LlamaParse in understanding pictorial representations of data.
    - Mention of the potential improvements in LlamaParse's ability to handle images in the future.
    44:15 *📊 LlamaParse's Strengths and Limitations*
    - Summary of LlamaParse's strengths, particularly in tabular extraction from PDFs.
    - Discussion on the proprietary nature of LlamaParse and its ease of use.
    - Mention of the potential improvements and developments in LlamaParse.
    45:52 *💡 Q&A Session*
    - Start of the Q&A session, addressing various questions about LlamaParse.
    - Discussion on the potential of integrating LlamaParse with other tools and models.
    - Explanation of the decision to use a recursive query engine and the benefits of this approach.
    49:54 *🔄 Comparing LlamaParse with Other Tools*
    - Comparison of LlamaParse with other open-source parsers.
    - Discussion on the benefits of LlamaParse being integrated into the LlamaIndex ecosystem.
    - Mention of the potential improvements and developments in LlamaParse.
    51:17 *📑 Handling Tables in LlamaParse*
    - Explanation of how LlamaParse handles tables and maintains their structure.
    - Discussion on the limitations of LlamaParse in preserving the visual presentation of tables.
    - Mention of the potential improvements and developments in LlamaParse.
    53:07 *🔄 Integrating LlamaParse with Other Tools*
    - Discussion on the potential of integrating LlamaParse with other tools and models.
    - Explanation of the benefits of LlamaParse's output being in markdown format.
    - Mention of the potential improvements and developments in LlamaParse.
    54:58 *🚀 Future of LlamaParse and RAG*
    - Discussion on the future of LlamaParse and RAG in the context of large context window models.
    - Explanation of the benefits of RAG and its continued relevance.
    - Mention of the potential improvements and developments in LlamaParse and RAG.
    Made with HARPA AI
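
The takeaways above mention that LlamaParse returns markdown and that preserving table structure matters when querying structured data. As a rough illustration of why markdown output is convenient, here is a minimal standard-library sketch that turns a markdown table back into rows. It does not use the LlamaParse or LlamaIndex APIs; `parse_markdown_table` and the sample table are hypothetical, and the sketch assumes the text contains a single pipe-delimited table.

```python
def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited markdown table into a list of row dicts.

    Assumes the text contains at most one table (hypothetical helper).
    """
    # Keep only the lines that look like table rows.
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []

    def cells(line: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [c.strip() for c in line.strip("|").split("|")]

    header = cells(lines[0])
    rows = []
    for line in lines[2:]:  # lines[1] is the |---|---| separator row
        rows.append(dict(zip(header, cells(line))))
    return rows


# Made-up sample standing in for parser output; the numbers are not from any real filing.
SAMPLE = """
Some narrative text before the table.

| Year | Revenue | Net income |
|------|---------|------------|
| 2022 | 100     | 10         |
| 2023 | 120     | 15         |
"""

rows = parse_markdown_table(SAMPLE)
revenue_by_year = {row["Year"]: row["Revenue"] for row in rows}
print(revenue_by_year)
```

Because each value stays attached to its column header, a question such as "what was revenue in 2023?" can be answered from the row itself rather than from loose text fragments.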

  • @kinanlaham744 10 months ago +3

    35:54 my favorite part:
    "If we only miss sometimes, that's obviously much better than if we miss all the time or if we miss A LOT"

    • @AI-Makerspace 10 months ago

      We also loved that part :)

  • @charleskilpatrick9704 10 months ago +2

    The links for notebooks and slides are very helpful as it is sometimes necessary to access this content after the initial streaming due to schedule concerns. Thank you!

  • @kakashisensie100 8 months ago +1

    Thanks!

  • @AmarHarolikar. 10 months ago

    Solid tutorial. Just tried out converting PDF to MD files and a few more things. Mind-blowing potential. Thanks so much for sharing.

    • @AI-Makerspace 10 months ago

      Thank you @Tigzig! We love to hear that you're already putting the tool to use!!

  • @kamitp4972 10 months ago

    Thank you so much. I really needed this tutorial.

  • @loicbaconnier9150 10 months ago +2

    Hi, great video, thanks.
    Is this parser better than the one you used in the previous video (from PDF to HTML)?
    And how does it compare to Surya?

    • @AI-Makerspace 10 months ago +2

      Yes, this is better (across the board) than the previous approach we tried.

    • @loicbaconnier9150 10 months ago

      Did you also compare it to Surya, please?

    • @ddd1million 10 months ago

      @AI-Makerspace How does it compare to LangChain RAG implementations? Which one would you recommend for someone working with only HTML and PDF? Thank you very much :)

    • @AI-Makerspace 10 months ago +1

      Both can be great tools! Use what you're most comfortable with!

  • @valentind.5398 10 months ago

    Thank you for sharing a great tool once again. Just letting you know that the LlamaCloud link in the Colab notebook is misspelled: the letter I is missing at the end of the URL.

  • @nicolassuarez2933 9 months ago

    Parsing files: 50%|█████ | 1/2 [00:00

    • @AI-Makerspace 8 months ago

      You'll need to make sure you provide a valid API key.
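
A stalled "Parsing files" progress bar like the one above can be avoided by checking for the key before parsing starts. The sketch below assumes the key is supplied through the `LLAMA_CLOUD_API_KEY` environment variable, which is where the llama-parse client looks by default as far as I know; `require_api_key` is a hypothetical helper. It only confirms that a key is present, not that the service will accept it.

```python
import os


def require_api_key(env_var: str = "LLAMA_CLOUD_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper: checks presence only, not validity.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; generate a key and export it before parsing."
        )
    return key


try:
    api_key = require_api_key()
    print("API key found; safe to start parsing.")
except RuntimeError as err:
    print(f"Cannot parse yet: {err}")
```

Failing early with an explicit message is easier to debug than a progress bar that stops partway through a batch of files.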

  • @vijaykumaraswamy7609 9 months ago

    Here, we are querying each PDF separately, right?
    Can we query all of our PDFs at once? Let's say I have ten 10-K PDFs. I want to parse them, store them in a vector store (ChromaDB, for example), and then run retrieval across all the documents at once.

    • @AI-Makerspace 9 months ago

      Yes! If you store them in the same index, you will query across all of them at once.

    • @vijaykumaraswamy7609 8 months ago

      @AI-Makerspace Yeah, I am storing all of them in one index, and my similarity score is very low, around 0.08.
      Could you tell me the best indexing algorithm for my case?
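
For readers following this thread, the idea of one index spanning many documents can be sketched with a toy example. This is not LlamaIndex or ChromaDB code: `ToyIndex`, the word-count "embedding", and the file names and figures are all hypothetical stand-ins for a real embedding model and vector store. The point is only that every chunk carries its source, so a single query ranks chunks from all files together.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyIndex:
    """One shared index; each entry remembers which document it came from."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, Counter]] = []

    def add(self, source: str, chunk: str) -> None:
        self.entries.append((source, chunk, embed(chunk)))

    def query(self, question: str, top_k: int = 2) -> list[tuple[float, str, str]]:
        q = embed(question)
        scored = [(cosine(q, vec), source, chunk) for source, chunk, vec in self.entries]
        return sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]


# Made-up chunks from two hypothetical filings, stored in the same index.
index = ToyIndex()
index.add("company_a_10k.pdf", "Company A reported revenue of 100 million in 2023.")
index.add("company_a_10k.pdf", "Company A lists supply chain disruption as a risk factor.")
index.add("company_b_10k.pdf", "Company B reported revenue of 250 million in 2023.")

results = index.query("What revenue did Company B report?")
for score, source, chunk in results:
    print(f"{score:.2f}  {source}  {chunk}")
```

Raw similarity values depend heavily on the embedding model and the metric, so a low absolute number is not by itself a sign of a bad index. The relative ranking of chunks, and whether the right source comes out on top, is usually the more useful thing to inspect.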

  • @NickoCorriveau 10 months ago

    How do we get set up with an API key? From what I can tell, LlamaCloud looks like it is limited access.

    • @AI-Makerspace 9 months ago

      You'll need an API key for this at this time.

    • @mrrohitjadhav470 9 months ago

      @AI-Makerspace Is there any alternative among open-source models?

  • @AIEntusiast_ 10 months ago +4

    Use ready-built RAG systems that clean, chunk, and metadata-tag your PDFs. I did 600 in a day, and you get good data.

  • @rickragv 10 months ago

  • @c0mpuipf 9 months ago +2

    wwHY aRE you TALKING iN CAPS lOCK!?!

    • @AI-Makerspace 9 months ago

      We continue to work to get our audio game DIALED 🤙