[77] Use Selenium and Pandas on Google Colab to access rendered HTML tables!

Pythonic Accountant

378 views

Published on 25 Nov 2024

Comments

@roons2424 6 months ago ⁺¹
Question from an aspiring data scientist:
How did you go about finding and fixing the problem that occurred in the last video?
When it didn't work for me I felt hopeless, like my project wouldn't continue for a long time, just like the last time I ran into a problem.
How did you manage to find an alternative/working way?
I am very impressed by your skills, not only in finding and fixing the problem but also in putting in the effort to make it available for everyone and even making an instructional video explaining it. Today I'll celebrate you, my friend.
Sorry for my bad English,
Greetings from the Netherlands
@PythonicAccountant 6 months ago ⁺¹
Thank you for your message and for watching my videos! The short answer is that previously I would've just googled it, but now I typically go to ChatGPT! The first thing I do is copy the error message and paste it in to see if it has any suggestions. I realized quickly that it was likely just the parameters getting passed in, and tried removing the first parameter, as one of the suggestions from ChatGPT had indicated.
As far as your comments about feeling hopeless when something goes wrong, that's totally normal! I am pretty stubborn and also have a bit of a hacker mindset, so I usually assume there is another way to do something and will work very hard and try many different things before I eventually give up. Having that level of commitment and trying over and over again usually ends up working out well for me, and I usually learn many things along the way. Try it out! Next time you run into a problem, try solving it in a bunch of different ways before you give up :-)
@ryzvonusef 6 months ago ⁺¹
thank you!
@PythonicAccountant 6 months ago ⁺¹
You're welcome!
@joepropertykey3612 5 months ago ⁺¹
The reality is, most pages that have 'in demand' data in tables use heavy JavaScript on the page. 'Selenium-Wire' (different from just 'Selenium') will catch the important parts though.
pandas will read the HTML tables too. Rips right through it.
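
A minimal sketch of the pandas step mentioned here, assuming the rendered page source is already available as a string (for example from Selenium's `driver.page_source`). The HTML snippet, the column names, and the helper `tables_from_html` are made up for illustration; `pandas.read_html` also needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical rendered HTML; in practice this would come from driver.page_source
rendered_html = """
<table>
  <thead><tr><th>Ticker</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>AAA</td><td>10.5</td></tr>
    <tr><td>BBB</td><td>20.0</td></tr>
  </tbody>
</table>
"""


def tables_from_html(html: str) -> list:
    """Return every <table> found in the HTML as a list of DataFrames."""
    # Wrapping the string in StringIO avoids pandas' warning about literal HTML input
    return pd.read_html(StringIO(html))


tables = tables_from_html(rendered_html)
df = tables[0]
print(df)
```

`read_html` only sees `<table>` elements that exist in the string it is given, which is why the page has to be rendered (or the response captured) first.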
@PythonicAccountant 5 months ago
Thanks, I haven't heard of Selenium-Wire! I'll check it out.
@joepropertykey3612 5 months ago
@PythonicAccountant When you use 'Selenium-Wire', it's catching all of the 'network responses' in the background. If you google 'selenium-wire network scrape data' you'll see how to find the data in a specific response URL (usually stored neat and tidy as JSON, but other times it can be HTML in the response). But for those 'dynamic' tables and pages? Yessir. Selenium-Wire.
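
A rough sketch of the "find the data in a specific response URL" idea, assuming the captured traffic has already been reduced to `(url, content_type, body_bytes)` tuples. The helper name `json_rows_from_responses`, the URLs, and the payload are all hypothetical; no browser is launched here.

```python
import json

import pandas as pd


def json_rows_from_responses(captured, url_fragment):
    """Collect JSON records from captured responses whose URL contains url_fragment.

    `captured` is an iterable of (url, content_type, body_bytes) tuples.
    """
    rows = []
    for url, content_type, body in captured:
        # Skip anything that is not the JSON endpoint we are looking for
        if url_fragment not in url or "json" not in content_type.lower():
            continue
        payload = json.loads(body.decode("utf-8"))
        # Assumption: the payload is either a list of records or a single record
        rows.extend(payload if isinstance(payload, list) else [payload])
    return pd.DataFrame(rows)


# Made-up captured traffic standing in for real network responses
captured = [
    ("https://example.com/app.js", "application/javascript", b"console.log(1)"),
    (
        "https://example.com/api/prices",
        "application/json; charset=utf-8",
        b'[{"ticker": "AAA", "price": 10.5}, {"ticker": "BBB", "price": 20.0}]',
    ),
]

prices = json_rows_from_responses(captured, "/api/prices")
print(prices)
```

With Selenium-Wire, the tuples would typically be built from `driver.requests`, reading `request.url`, `request.response.headers` and `request.response.body` for requests that actually received a response; compressed bodies may need decoding first.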
I used to follow you a lot 'before AI', with your pdfplumber videos.
I've been noticing that if you can get pymupdf to extract text and also 'preserve the line spacing' from a PDF, it's pretty easy to use pandas to go through the text results and use the line spacing to 'data map': mapping rows, and line positions on those rows, to columns and rows on another temp df.
When you look at 'most' PDFs, it's almost as if there are 3 'columns' of data going down the middle of the page, where there is no overflow from one column to the next... this is where you can get it into a form to go to town with pandas and data mapping.
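
One way the 'data mapping' described here could look, as a sketch only. It assumes word positions in the tuple shape PyMuPDF's `page.get_text("words")` returns; the sample words, the column boundaries in `COLUMN_EDGES`, and the helper `map_words_to_table` are invented for illustration and are not the commenter's actual code.

```python
import pandas as pd

# Made-up sample in the shape of PyMuPDF's page.get_text("words"):
# (x0, y0, x1, y1, word, block_no, line_no, word_no)
words = [
    (50, 100, 90, 110, "Widget", 0, 0, 0),
    (250, 100, 260, 110, "2", 0, 0, 1),
    (400, 100.4, 430, 110, "19.98", 0, 0, 2),
    (50, 120, 90, 130, "Gadget", 0, 1, 0),
    (250, 120.3, 260, 130, "1", 0, 1, 1),
    (400, 120, 430, 130, "5.00", 0, 1, 2),
]

# Hypothetical x-boundaries separating three columns on the page
COLUMN_EDGES = [0, 200, 350, 600]
COLUMN_NAMES = ["item", "qty", "amount"]


def map_words_to_table(words, edges, names, y_tolerance=2.0):
    """Map positioned words onto a rows-by-columns DataFrame."""
    df = pd.DataFrame(
        words, columns=["x0", "y0", "x1", "y1", "text", "block", "line", "word"]
    )
    # Start a new row whenever the vertical gap to the previous word exceeds the tolerance
    df = df.sort_values(["y0", "x0"])
    df["row"] = (df["y0"].diff() > y_tolerance).cumsum()
    # Assign each word to a column by its left edge; out-of-range words become "nan"
    df["col"] = pd.cut(df["x0"], bins=edges, labels=names, right=False).astype(str)
    # Join words sharing a cell in left-to-right order, then pivot columns out
    table = (
        df.sort_values("x0")
        .groupby(["row", "col"])["text"]
        .agg(" ".join)
        .unstack("col")
        .reindex(columns=names)
    )
    table.columns.name = None
    return table


table = map_words_to_table(words, COLUMN_EDGES, COLUMN_NAMES)
print(table)
```

In a real run the column edges would be chosen by inspecting the x positions on the page, which is where the "3 columns with no overflow" observation helps.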
@joepropertykey3612 5 months ago
@PythonicAccountant Have you ah.... 'saved a PDF to an HTML file' and tried to parse that with pandas yet? It's kind of an interesting workaround when you have unstructured data, and possibly a PDF that was created with any indexing. You can drop the HTML file into ChatGPT too, tell it what selectors you want, and it will scrape the tables out, i.e. give you the pandas code to parse things out.
@PythonicAccountant 5 months ago
@joepropertykey3612 That's interesting, does it actually work?
@joepropertykey3612 5 months ago
@PythonicAccountant pandas reading HTML from a PDF? Yessir. It's just another tool in the PDF arsenal... if I can't do it simply with pdfplumber, I look at pymupdf and how I can convert the data to something simpler and more structured to parse.
