RAG for Complex PDFs with LlamaParse and

AI Makerspace

Views: 14,018

Published on Jan 11, 2025

Comments

@AI-Makerspace 10 months ago ⁺⁵
Google Colab Notebook: colab.research.google.com/drive/1IVQkSGwS5kdTiKBwz85PO6vg_WaNx15c?usp=sharing
Event Slides: www.canva.com/design/DAF-L3KONQc/276F2Y-5Ym771I64RsjZiQ/view?DAF-L3KONQc&
@twoplustwo5 10 months ago ⁺³
🎯 Key Takeaways for quick navigation:
00:09 *🎵 Introduction and Overview of LlamaParse*
- Introduction of the hosts and the topic of the video, which is the new LlamaParse library.
- Brief discussion on the capabilities of LlamaParse, particularly its ability to parse embedded tables and figures.
02:09 *📚 Understanding LlamaParse and its Performance*
- Explanation of the purpose and functionality of LlamaParse.
- Discussion on how to build a query engine using LlamaParse for document retrieval applications.
05:07 *📈 LlamaIndex and its Role as a Data Framework*
- Detailed explanation of LlamaIndex and its role as a data framework.
- Discussion on the concept of context augmentation and its importance in the data-centric paradigm.
10:54 *📊 LlamaParse's Parsing Algorithm and its Capabilities*
- Introduction to LlamaParse's proprietary parsing algorithm for documents with embedded objects.
- Discussion on the comparison of LlamaParse's performance with other parsing tools.
14:05 *🧪 Testing LlamaParse's Performance*
- Explanation of the testing process and the documents used for testing.
- Discussion on the results of the testing, highlighting the strengths and weaknesses of LlamaParse.
20:54 *💻 Demonstration of LlamaParse in Code*
- Walkthrough of the code used for testing LlamaParse.
- Explanation of the models and tools used in the testing process.
23:24 *📚 Setting up LlamaParse and LlamaIndex*
- Explanation of how to set up LlamaParse and LlamaIndex.
- Discussion on the process of generating an API key for LlamaCloud.
- Mention of the limitations of LlamaParse, such as only accepting PDFs and returning only plain text or markdown.
26:40 *🛠️ Initializing LlamaParse and Parsing Documents*
- Walkthrough of initializing LlamaParse and parsing documents.
- Explanation of the importance of preserving the structure of the data in the documents.
- Discussion on the inconsistency in the parsing process and the potential issues that may arise.
31:52 *🚀 Building a Query Engine with LlamaIndex v0.10*
- Introduction to LlamaIndex v0.10 and the changes it brings.
- Explanation of how to build a query engine using LlamaIndex.
- Discussion on the importance of preserving the structure of the data in the documents.
35:06 *🧪 Testing the Query Engine*
- Walkthrough of testing the query engine.
- Discussion on the results of the testing, highlighting the strengths and weaknesses of the query engine.
- Explanation of the importance of the ranker in the retrieval process.
39:21 *📊 Querying Structured Data*
- Demonstration of querying structured data using the query engine.
- Discussion on the accuracy of the results and the potential issues that may arise.
- Explanation of the importance of preserving the structure of the data in the documents.
42:52 *🎯 Testing LlamaParse on Figures and Graphs*
- Demonstration of LlamaParse's performance on figures and graphs.
- Discussion on the limitations of LlamaParse in understanding pictorial representations of data.
- Mention of the potential improvements in LlamaParse's ability to handle images in the future.
44:15 *📊 LlamaParse's Strengths and Limitations*
- Summary of LlamaParse's strengths, particularly in tabular extraction from PDFs.
- Discussion on the proprietary nature of LlamaParse and its ease of use.
- Mention of the potential improvements and developments in LlamaParse.
45:52 *💡 Q&A Session*
- Start of the Q&A session, addressing various questions about LlamaParse.
- Discussion on the potential of integrating LlamaParse with other tools and models.
- Explanation of the decision to use a recursive query engine and the benefits of this approach.
49:54 *🔄 Comparing LlamaParse with Other Tools*
- Comparison of LlamaParse with other open-source parsers.
- Discussion on the benefits of LlamaParse being integrated into the LlamaIndex ecosystem.
- Mention of the potential improvements and developments in LlamaParse.
51:17 *📑 Handling Tables in LlamaParse*
- Explanation of how LlamaParse handles tables and maintains their structure.
- Discussion on the limitations of LlamaParse in preserving the visual presentation of tables.
- Mention of the potential improvements and developments in LlamaParse.
53:07 *🔄 Integrating LlamaParse with Other Tools*
- Discussion on the potential of integrating LlamaParse with other tools and models.
- Explanation of the benefits of LlamaParse's output being in markdown format.
- Mention of the potential improvements and developments in LlamaParse.
54:58 *🚀 Future of LlamaParse and RAG*
- Discussion on the future of LlamaParse and RAG in the context of large context window models.
- Explanation of the benefits of RAG and its continued relevance.
- Mention of the potential improvements and developments in LlamaParse and RAG.
Made with HARPA AI
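Editor's note: the takeaways above mention that markdown output keeps table structure intact. A minimal sketch of why that helps downstream, assuming the parser returns ordinary pipe-delimited markdown tables. The function name `parse_markdown_table` and the sample table are hypothetical and are not taken from the video or from the LlamaParse API.

```python
def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited markdown table into a list of row dicts."""
    rows = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip anything that is not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(cells)
    if len(rows) < 2:
        return []  # need at least a header and a separator row
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample data, made up for illustration only.
SAMPLE = """
| Quarter | Revenue |
|---------|---------|
| Q1      | 100     |
| Q2      | 120     |
"""

records = parse_markdown_table(SAMPLE)
print(records)
```

Because each value stays attached to its column header, a question such as "what was Q2 revenue?" can be answered from a single retrieved row rather than from a flattened run of numbers.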
@kinanlaham744 10 months ago ⁺³
35:54 my favorite part:
"If we only miss sometimes, that's obviously much better than if we miss all the time or if we miss A LOT"
@AI-Makerspace 10 months ago
We also loved that part :)
@charleskilpatrick9704 10 months ago ⁺²
The links for notebooks and slides are very helpful as it is sometimes necessary to access this content after the initial streaming due to schedule concerns. Thank you!
@kakashisensie100 8 months ago ⁺¹
Thanks!
@AI-Makerspace 8 months ago
Thank YOU!
@AmarHarolikar. 10 months ago
Solid tutorial. Just tried converting PDFs to MD files and a few other things. Mind-blowing potential. Thanks so much for sharing.
@AI-Makerspace 10 months ago
Thank you @Tigzig! We love to hear that you're already putting the tool to use!!
@kamitp4972 10 months ago
Thank you so much. I really needed this tutorial.
@loicbaconnier9150 10 months ago ⁺²
Hi, great video, thanks.
Is this parser better than the one you used in the previous video,
the one that went from PDF to HTML?
Or how does it compare to Surya?
@AI-Makerspace 10 months ago ⁺²
Yes, this is better (across the board) than the previous approach we tried.
@loicbaconnier9150 10 months ago
Did you also compare it to Surya?
@ddd1million 10 months ago
@AI-Makerspace How does it compare to LangChain RAG implementations? Which one would you recommend for someone working with only HTML and PDF? Thank you very much :)
@AI-Makerspace 10 months ago ⁺¹
Both can be great tools! Use what you're most comfortable with!
@valentind.5398 10 months ago
Thank you for sharing a great tool once again. Just letting you know that the LlamaCloud link in the Colab notebook is misspelled: the letter "i" is missing at the end of the URL.
@AI-Makerspace 9 months ago
Thank you!
@nicolassuarez2933 9 months ago
Parsing files: 50%|█████ | 1/2 [00:00
@AI-Makerspace 8 months ago
You'll need to make sure you provide a valid API key.
@vijaykumaraswamy7609 9 months ago
Here, we are querying each PDF separately, right?
Can we query all of our PDFs at once? Say I have ten 10-K PDFs: I want to parse them, store them in a vector store (ChromaDB, for example), and then run retrieval across all the documents at once.
@AI-Makerspace 9 months ago
Yes! If you store them in the same index, you will query across all of them at once.
@vijaykumaraswamy7609 8 months ago
@AI-Makerspace Yeah, I am storing all of them in one index, and my similarity score is very low, around 0.08.
Could you tell me the best index algorithm to apply in my case?
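Editor's note: the idea in this thread, storing chunks from several PDFs in one index so that a single query ranks across all of them, can be sketched with a toy in-memory index. This is an illustration only: the `SharedIndex` class, the bag-of-words "embedding", and the file names are assumptions made for the example, not the LlamaIndex or ChromaDB API. Real systems use learned embeddings, and the absolute similarity values depend on the embedding model and the metric, so the ranking is usually more informative than the raw score.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SharedIndex:
    """Hypothetical single index holding chunks from many source files."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, Counter]] = []

    def add(self, source: str, chunks: list[str]) -> None:
        for chunk in chunks:
            self.entries.append((source, chunk, embed(chunk)))

    def query(self, question: str, top_k: int = 2) -> list[tuple[float, str, str]]:
        q = embed(question)
        scored = [(cosine(q, vec), source, chunk) for source, chunk, vec in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


# Made-up chunks standing in for parsed 10-K text.
index = SharedIndex()
index.add("company_a_10k.pdf", ["Company A reported revenue of 100 million.",
                                "Company A has 500 employees."])
index.add("company_b_10k.pdf", ["Company B reported revenue of 250 million.",
                                "Company B opened two new offices."])

results = index.query("reported revenue", top_k=2)
for score, source, chunk in results:
    print(f"{score:.2f}  {source}  {chunk}")
```

Keeping the source file name next to each chunk makes it possible to see which PDF an answer came from, even though all chunks are ranked together.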
@NickoCorriveau 10 months ago
How do we get set up with an API key? From what I can tell, it looks like LlamaCloud is limited access.
@AI-Makerspace 9 months ago
You'll need an API key for this at this time.
@mrrohitjadhav470 9 months ago
@AI-Makerspace Are there any open-source alternatives?
@AIEntusiast_ 10 months ago ⁺⁴
Use pre-built RAG systems that clean, chunk, and meta-tag your PDFs. I did 600 in a day, and you get good data.
@rickragv 10 months ago
❤
@c0mpuipf 9 months ago ⁺²
wwHY aRE you TALKING iN CAPS lOCK!?!
@AI-Makerspace 9 months ago
We continue to work to get our audio game DIALED 🤙