OCR 2.0

  • Published on Dec 21, 2024
  • In this podcast, we dive into the new concept of OCR 2.0 - the future of OCR with LLMs.
    We explore how this new approach addresses the limitations of traditional OCR by introducing a unified, versatile system capable of understanding various visual languages. We discuss the innovative GOT (General OCR Theory) model, which utilizes a smaller, more efficient language model. The podcast highlights GOT's impressive performance across multiple benchmarks, its ability to handle real-world challenges, and its capacity to preserve complex document structures. We also examine the potential implications of OCR 2.0 for future human-computer interactions and visual information processing across diverse fields.
    Key Points

    1. Traditional OCR vs. OCR 2.0
    • Current OCR limitations (multi-step process, prone to errors)
    • OCR 2.0: A unified, end-to-end approach

    2. Principles of OCR 2.0
    • End-to-end processing
    • Low cost and accessibility
    • Versatility in recognizing various visual languages

    3. GOT (General OCR Theory) Model
    • Uses a smaller, more efficient language model (Qwen)
    • Trained on diverse visual languages (text, math formulas, sheet music, etc.)

    4. Training Innovations
    • Data engines for different visual languages
    • E.g. LaTeX for mathematical formulas

    5. Performance and Capabilities
    • State-of-the-art results on standard OCR benchmarks
    • Outperforms larger models in some tests
    • Handles real-world challenges (blurry images, odd angles, different lighting)

    6. Advanced Features
    • Formatted document OCR (preserving structure and layout; see the usage sketch after this list)
    • Fine-grained OCR (precise text selection)
    • Generalization to untrained languages
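    For readers who want to try GOT themselves, here is a minimal usage sketch in Python. It assumes the publicly released GOT-OCR2.0 checkpoint on Hugging Face (ucaslcl/GOT-OCR2_0) and the custom chat interface (model.chat with an ocr_type argument) published alongside the paper; the exact model id and method signature are assumptions to verify against the current repository.

        # Minimal sketch, assuming the GOT-OCR2.0 reference release on Hugging Face.
        # The model id and the custom `chat` interface come from that release and may
        # change; verify against the official repository before relying on them.
        from transformers import AutoModel, AutoTokenizer

        MODEL_ID = "ucaslcl/GOT-OCR2_0"  # assumed checkpoint name from the paper's release

        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
        model = AutoModel.from_pretrained(
            MODEL_ID,
            trust_remote_code=True,        # the checkpoint ships its own modeling code
            low_cpu_mem_usage=True,
            device_map="cuda",
            use_safetensors=True,
            pad_token_id=tokenizer.eos_token_id,
        )
        model = model.eval()

        image_file = "scanned_page.jpg"    # hypothetical input image

        # Plain-text OCR: pixels in, text out, with no separate detect/crop/recognize stages.
        plain_text = model.chat(tokenizer, image_file, ocr_type="ocr")

        # Formatted OCR: keeps document structure (e.g. tables, math rendered as LaTeX-style markup).
        formatted = model.chat(tokenizer, image_file, ocr_type="format")

        # The reference code also exposes fine-grained OCR (region- or color-restricted
        # recognition) via extra arguments such as ocr_box; check the repo for the exact format.
        print(plain_text)
        print(formatted)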


    This episode was generated using Google NotebookLM (notebooklm.goo...), drawing insights from the paper "General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model" (arxiv.org/abs/...).
    Stay ahead in your AI journey with Bot Nirvana AI Mastermind (botnirvana.org...).
    Podcast Transcript:
    All right, so we're diving into the future of OCR today. Really interesting stuff.
    Yeah, and you know how sometimes you just scan a document, you just want the text, you don't really think twice about it. Right, right. But this paper, General OCR Theory, towards OCR 2.0 via a unified end-to-end model. Catchy title. I know, right? But it's not just the title, they're proposing this whole new way of thinking about OCR. OCR 2.0 as they call it. Exactly, it's not just about text anymore. Yeah, it's really about understanding any kind of visual information, like humans do. So much bigger. It's a really ambitious goal. Okay, so before we get ahead of ourselves, let's back up for a second. Okay. How does traditional OCR even work? Like when you and I scan a document, what's actually going on? Well, it's kind of like, imagine an assembly line, right? First, the system has to figure out where on the page the actual text is. Find it. Right, isolate it. Then it crops those bits out. Okay. And then it tries to recognize the individual letters and words. So it's like a multi-step? Yeah, it's a whole process. And we've all been there, right? When one of those steps goes wrong. Oh, tell me about it. And you get that OCR output that's just… Gibberish, total gibberish. The worst. And the paper really digs into this. They're saying that whole assembly line approach, it's not just prone to errors, it's just clunky. Yeah, very inefficient. Like different fonts can throw it off. Right. Different languages, forget it. Oh yeah, if it's not basic printed text, OCR 1.0 really struggles. It's like it doesn't understand the context. Yeah, exactly. It's treating information like it's just a bunch of isolated letters, instead of seeing the bigger picture, you know, the relationships between them. It doesn't get the human ele...
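    To make the "assembly line" contrast from the conversation concrete, here is a small illustrative sketch (not from the paper): OCR 1.0 chains separate detect, crop, and recognize stages, so an error in any one stage corrupts the output, while OCR 2.0 maps the whole page to text with a single end-to-end model. The function and parameter names below are placeholders, not a real library API.

        # Illustrative contrast only; the stage functions are placeholders you would
        # supply yourself (e.g. a text detector, an image cropper, a character recognizer).
        from typing import Callable, List, Tuple

        Box = Tuple[int, int, int, int]  # (x, y, width, height)

        def ocr_1_0(image,
                    detect: Callable[[object], List[Box]],
                    crop: Callable[[object, Box], object],
                    recognize: Callable[[object], str]) -> str:
            """The multi-step 'assembly line': find the text, isolate it, read each piece."""
            boxes = detect(image)                          # step 1: locate text regions
            crops = [crop(image, box) for box in boxes]    # step 2: cut those regions out
            return "\n".join(recognize(c) for c in crops)  # step 3: recognize each crop in isolation

        def ocr_2_0(image, end_to_end_model: Callable[[object], str]) -> str:
            """One unified model: pixels in, (optionally structured) text out."""
            return end_to_end_model(image)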
