Advanced Techniques for Working with Different Document Types in RAG

llmware

638 views

Published on Sep 6, 2024
Getting parsing and chunking right is a key part of RAG. Go from novice to expert by learning advanced techniques for parsing and chunking a wide range of document types, including Microsoft Office documents with tables and images that require OCR.
Once everything is parsed, learn how to easily create datasets of different types from the parsed documents.
Learn advanced techniques for:
- Working with different document types using a variety of options and configurations; and
- Creating datasets for model training from your documents, including table information and images.
Please subscribe for future content!
Check out our GitHub and leave a star!
github.com/llm...
Join us on Discord:
/ discord
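
For readers who want a feel for the chunk-then-export step described above, here is a minimal sketch in plain Python. It is not the llmware API: `chunk_text`, `to_jsonl`, the `max_chars` parameter, and the record fields are hypothetical names chosen for illustration, and the word-boundary rule is a simplified stand-in for the smart chunking shown in the video.

```python
import json


def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Split text into chunks of at most max_chars, breaking only between words.

    Hypothetical helper: a single word longer than max_chars is kept whole
    as its own chunk rather than being cut in the middle.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) > max_chars and current:
            # Adding this word would overflow the chunk, so close it here.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def to_jsonl(chunks: list[str], source: str) -> str:
    """Serialize chunks as JSON Lines, one record per chunk (illustrative schema)."""
    return "\n".join(
        json.dumps({"source": source, "chunk_id": i, "text": chunk})
        for i, chunk in enumerate(chunks)
    )


if __name__ == "__main__":
    sample = "Parsing and chunking are key parts of a RAG pipeline."
    print(to_jsonl(chunk_text(sample, max_chars=25), source="example.docx"))
```

Breaking only at word boundaries keeps each chunk readable on its own, and one JSON record per line makes the output easy to stream into a training or evaluation pipeline.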

Comments • 10

@JebliMohamed 3 months ago ⁺¹
Loved the video!
The step-by-step guide on parsing docs and data was super helpful.
I was really impressed by how you used OCR to pull text from images in Microsoft Office files - that was cool.
The smart chunking strategy explanation was also 👌.
@llmware 3 months ago
Thank you so much for your kind feedback! ☺
@user-kk1li5mk7q 3 months ago ⁺¹
This is really a nice way of extracting data and converting the unstructured data into structured form. I believe the data after extraction can be used as a data source for the RAG pipeline and probably LLMs can give more accurate answers.
@llmware 3 months ago ⁺¹
Thank you so much for your observation - we also believe that documents parsed in this manner will enhance accuracy of LLMs in a RAG workflow!
@JebliMohamed 3 months ago ⁺¹
🎯 Key points for quick navigation:
00:18 *📄 Introduction to document parsing, chunking, and data extraction.*
00:33 *🛠️ Advanced techniques for extracting images, tables, and automating workflows.*
01:17 *📚 Preparing datasets for self-supervised learning and fine-tuning.*
01:31 *💡 Focus on data wrangling and Microsoft Office documents.*
02:14 *🗂️ Accessing public Microsoft Word, PowerPoint, and Excel documents.*
03:22 *📂 Downloading and preparing Microsoft Office documents.*
04:03 *🛠️ Setting up the environment to parse and chunk documents.*
05:12 *🔍 Smart chunking strategies and their configurations.*
06:22 *📑 Parsing tables and images from documents.*
07:32 *🗃️ Exporting tables into CSV files.*
08:28 *🖼️ Running OCR on extracted images.*
09:54 *📄 Creating a consolidated JSONL file.*
10:35 *📊 Building a dataset for unsupervised testing.*
11:14 *⚡ Parsing 152 files in 6 seconds using a local Mac M1.*
12:37 *🔍 Running OCR and storing text in the library.*
13:17 *⏱️ Comparing the speed of digital parsing versus OCR.*
14:23 *📁 Exploring file artifacts created during parsing.*
16:29 *📄 Reviewing the created dataset.*
19:44 *🎥 Closing remarks and upcoming example videos.*
Made with HARPA AI
@llmware 3 months ago
This is so helpful - thank you!!
@user-um2uq9nh4z 3 months ago ⁺²
wow!!!!! I'm wowed!
@llmware 3 months ago
Thank you so much! 🥰
@user-cb7yl4nr6h 2 months ago ⁺¹
Download the repo, open the example, and just run it, and it will work. I tried it in Colab and the example did not work for me.
@llmware 2 months ago
We are working on turning many of our YT videos into Colab notebooks as well and will post these notebooks as we make them.
