Extract Tables from PDF and convert to Excel sheet with Paddle OCR text detection and recognition.
- Published on Jul 1, 2024
- The Paddle OCR project contains many OCR deep learning models, covering text detection, text recognition, text angle detection and table layout. In this course, we shall make use of pretrained text detection and text recognition models to extract the text found in tables in a PDF. From this extracted text, we shall reconstruct the table based on the text positions.
Enjoy!!!
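The reconstruction step can be sketched in plain Python. This is a minimal sketch, not the notebook's code. It assumes each recognized text comes with an axis-aligned box `(x_min, y_min, x_max, y_max)` in pixel coordinates. Each box is stretched into a full-width row strip and a full-height column strip. A greedy non-max suppression then collapses overlapping strips into one row or column, and each text lands in the row×column cell it overlaps most. The comments below mention `tf.image.non_max_suppression`, and the `nms` function here is a plain-Python stand-in for it. The helper names, the strip-thickness scores and the `0.1` IoU threshold are all illustrative choices.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep those not overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def reconstruct_table(texts: Sequence[str], boxes: Sequence[Box],
                      width: float, height: float,
                      iou_threshold: float = 0.1) -> List[List[str]]:
    # Stretch each text box across the page: a full-width strip marks its row,
    # a full-height strip marks its column.
    row_strips = [(0.0, b[1], float(width), b[3]) for b in boxes]
    col_strips = [(b[0], 0.0, b[2], float(height)) for b in boxes]
    # Assumption: score strips by thickness so the tallest/widest text wins.
    rows = [row_strips[i] for i in nms(row_strips, [s[3] - s[1] for s in row_strips], iou_threshold)]
    cols = [col_strips[i] for i in nms(col_strips, [s[2] - s[0] for s in col_strips], iou_threshold)]
    rows.sort(key=lambda s: s[1])  # top to bottom
    cols.sort(key=lambda s: s[0])  # left to right

    grid = [["" for _ in cols] for _ in rows]
    for text, box in zip(texts, boxes):
        # A cell is the intersection of one row strip and one column strip.
        r, c = max(
            ((ri, ci) for ri in range(len(rows)) for ci in range(len(cols))),
            key=lambda rc: iou(box, (cols[rc[1]][0], rows[rc[0]][1],
                                     cols[rc[1]][2], rows[rc[0]][3])),
        )
        grid[r][c] = f"{grid[r][c]} {text}".strip()  # join texts sharing a cell
    return grid


texts = ["Name", "Age", "Alice", "30"]
boxes = [(10, 10, 60, 30), (110, 10, 150, 30), (10, 50, 70, 70), (110, 50, 140, 70)]
print(reconstruct_table(texts, boxes, width=200, height=100))
# [['Name', 'Age'], ['Alice', '30']]
```

The resulting list of rows could then be written to an Excel sheet, for example with `pandas.DataFrame(grid).to_excel("table.xlsx")`. That call needs an Excel writer such as openpyxl installed, and the filename is hypothetical.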
Hi,
You can use this link to access our premium courses. You'll be able to build and deploy more than 20 different AI projects.
neuralearn.podia.com/?coupon=... (30 days money back guarantee)
[Please check your mail 5 minutes after requesting access]
Colab Notebook: [Please check your mail inbox and spam 5 minutes after requesting access]
colab.research.google.com/dri...
Check out:
Course on Deep Learning for Computer Vision: neuralearn.ai/course/computer...
Course on Deep Learning for Natural Language Processing: neuralearn.ai/course/natural_...
Course on Deep Learning for Image Generation: neuralearn.ai/course/image_ge...
Course on Deep Learning for Object Detection: neuralearn.ai/course/object_d...
Connect with us here:
Twitter: / neulearndotai
Facebook: / neuralearnai-107372484...
LinkedIn: / neuralearn
Mail: team@neuralearn.ai - Science & Technology
Impressive content for Deep Learning OCR! Many thanks!
You're welcome :)
Thank you very much for this! Very insightful!
Glad it was helpful :)
Thanks for your great tutorial!!!!
Brilliant work!! I would like to thank you for giving me access to the notebook.
keep going broo 💙💙
My Pleasure :)
Feel free to check out our other videos
Impressive. I'm struggling right now with my little side project using OCR; you helped a lot, man. Appreciate it.
Hi, is this notebook actually working for you? For me it's not. Can you please help?
@AmanChauhan-hr1wh Well, I just used his method rather than copying it wholesale. The results of my own implementation weren't 100% correct, so I ended up using the Azure API instead; it was 100% correct and the processing was very fast as well.
Thank you, man, you're the best at explaining what is actually happening. Thank you so much!
You're welcome:)
Broo, this is awesome, thank you very much!!!
You're welcome :)
Well, that is a very simple and readable table; it's easy enough to do it with basic if logic... but try a borderless table, or content very close to the borders, on a scanned image of a table.
Thank you for the tutorial !!!
The pleasure is ours :)
Hi, you have done a phenomenal job explaining PaddleOCR in detail. Can you please let me know whether we can train PaddleOCR on custom datasets for extracting data from tables of different lengths in PDFs or images?
Excellent video 🔥
Glad you loved it :)
Hi Neuralearn, thanks for creating a great tutorial. It's very useful. Can you please provide notebook access?
Thank you for the tutorial, I have requested the notebook access
Please check your mail :)
Hi, I've followed your procedure as is but I'm getting "ValueError: Can't convert Python sequence with mixed types to Tensor." on the Non-Max Suppression portion. Can you tell me what might be causing that please?
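TensorFlow raises that `ValueError` when the Python list it receives is ragged or mixes element types. One plausible cause, not confirmed here, is passing PaddleOCR's raw result straight into `tf.image.non_max_suppression`, since each result line pairs a list of corner points with a `(text, confidence)` tuple. That function expects `boxes` with shape `[num_boxes, 4]` in `[y1, x1, y2, x2]` order and a 1-D `scores` of length `num_boxes`. A shape mismatch between the two produces a different "incompatible shape" error instead. The helper below is hypothetical and assumes each line looks like `[[[x, y], ...], (text, score)]`, as in PaddleOCR 2.x results. It splits the lines into plain float lists.

```python
from typing import List, Sequence, Tuple


def to_nms_inputs(ocr_lines: Sequence) -> Tuple[List[List[float]], List[float], List[str]]:
    """Split OCR lines into float boxes ([y1, x1, y2, x2]), float scores and texts."""
    boxes: List[List[float]] = []
    scores: List[float] = []
    texts: List[str] = []
    for points, (text, score) in ocr_lines:
        # Cast every coordinate so no ints/strings/floats get mixed together.
        xs = [float(p[0]) for p in points]
        ys = [float(p[1]) for p in points]
        # Axis-aligned bounding box of the (possibly rotated) quadrilateral.
        boxes.append([min(ys), min(xs), max(ys), max(xs)])
        scores.append(float(score))  # exactly one 1-D score per box
        texts.append(str(text))
    return boxes, scores, texts


line = [[[10, 5], [60, 5], [60, 25], [10, 25]], ("Name", 0.98)]
boxes, scores, texts = to_nms_inputs([line])
print(boxes, scores, texts)  # [[5.0, 10.0, 25.0, 60.0]] [0.98] ['Name']
```

The two lists can then be passed on, e.g. `tf.image.non_max_suppression(boxes, scores, max_output_size=len(boxes), iou_threshold=0.1)`. The threshold value is illustrative.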
amazing vid!!!!
Glad you enjoyed it :)
More on the way!!!
Awesome 👍👍👍
Thanks 🤗
Hello neuralearn, thanks for your great tutorial.
Could you please provide notebook access?
Hi Neuralearn, thanks for creating a very useful tutorial. Can you please provide notebook access for my study?
Hi. The content is very impressive. I would love to see the notebook and build on it to create the table in Google Docs instead. Please share the notebook.
Hey, thanks for the wonderful tutorial, man! Can you please provide access to the notebook?
Bro you're doing good work
Thanks for the kind words :)
@neuralearn I have a question. I have a PDF file that is 560 pages long; other libraries do convert its data into an Excel file, but the result is garbage. Will I be able to convert it if I use this model?
I think you should just go ahead and try. It's free :)
Hi, thank you for this. Can you please help me with notebook access? Also, can you help me understand whether I will be able to cover most table formats with this?
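For a 560-page document, rendering every page to an image in one call can exhaust memory, especially on Colab. The `PDFPageCountError` quoted further down this page comes from pdf2image, which suggests the notebook rasterizes pages with that library. If so, the pages can be processed in fixed-size batches. The helper below is a sketch that only computes 1-based, inclusive page ranges. `page_batches` and `batch_size=20` are illustrative choices.

```python
from typing import Iterator, Tuple


def page_batches(page_count: int, batch_size: int = 20) -> Iterator[Tuple[int, int]]:
    """Yield (first_page, last_page) pairs, 1-based and inclusive, covering every page."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, page_count + 1, batch_size):
        yield first, min(first + batch_size - 1, page_count)


print(list(page_batches(45, 20)))  # [(1, 20), (21, 40), (41, 45)]
```

Each pair can be passed to `pdf2image.convert_from_path(path, first_page=first, last_page=last)`, with the page count taken from `pdf2image.pdfinfo_from_path(path)["Pages"]`. Only one batch of page images is then held in memory at a time.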
Great.
Is it possible to use this model for matrix recognition, i.e. detecting how many rows and columns a matrix has and what its elements are?
How can I fix this error: "ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory"?
It is caused by the line of code "import layoutparser as lp".
This is the first time I've heard of PaddleOCR. It seems like a very good tool. I really appreciate the work you have done and would also like to try this. Can you please allow access to the Google Colab code for this?
Hello my dear Robert
Please check your mail
Super!!!
😊
Hello, thanks for sharing this video. Will this method work for nested tables?
Congrats, one of the best videos I've seen on this topic! Could you please grant me access to the Google Colab?
Please after requesting access, check your mail inbox or spam
Hello, do you have any idea about packaging PaddleOCR? I'm trying to make an exe of my code, but I keep facing errors. Any help would be appreciated.
Hello @neuralearn - love the demo! Can you provide me access to the Colab?
Done!
Thank you so much, I really appreciate the informative video. Could you please allow access to the Google Colab? It would be super helpful.
Hello my dear Quốc, please check your mail :)
Requesting access to the Colab notebook, thank you so much.
What if it doesn't detect the table as a table, but as a figure or text?
hi! tysm for the video. would you pls allow access to the notebook? ty!!
pls give access to notebook ...great and informative tutorial !!
Please check your mail :)
Amazing job! Could you please share the Google Colab with me? 🙏
Hello, I'm having trouble when there are multiple lines within the same row: it treats them as new rows. How do I fix this? Thank you!
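One workaround, not taken from the video, is to merge row bands that overlap vertically or sit within a small gap of each other before assigning cells. A wrapped cell then contributes a single row. This sketch assumes each row band is a `(y_min, y_max)` pair in pixels. `merge_row_bands` is a hypothetical helper. Its `max_gap` tolerance must stay smaller than the spacing between genuine rows, otherwise distinct rows get merged too.

```python
from typing import List, Sequence, Tuple

Band = Tuple[float, float]  # (y_min, y_max)


def merge_row_bands(bands: Sequence[Band], max_gap: float = 0.0) -> List[Band]:
    """Merge vertically overlapping (or nearly touching) bands, top to bottom."""
    merged: List[Band] = []
    for y0, y1 in sorted(bands):
        if merged and y0 - merged[-1][1] <= max_gap:
            # Starts before the previous band ends (plus tolerance): same row.
            merged[-1] = (merged[-1][0], max(merged[-1][1], y1))
        else:
            merged.append((y0, y1))
    return merged


# Two wrapped lines (10-20, 22-32) bridged by a taller neighbouring cell (15-27).
print(merge_row_bands([(10, 20), (22, 32), (15, 27), (50, 60)]))  # [(10, 32), (50, 60)]
```

If no single-line cell bridges the wrapped lines, a small positive `max_gap` (e.g. a few pixels) can join them instead. This only works when line spacing inside a cell is tighter than the spacing between rows.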
Hey, can you please provide the link for the pdf used in the video?
Thanks
Hi, nice explanation. Can you provide access? It's very helpful for us.
EXCELLENT!
CAN YOU PLEASE POST A VIDEO ON the Paddle OCR custom training steps (both detection + recognition)? I have my own data and want to do transfer learning.
We are glad this was helpful :)
We shall work on that and publish as soon as possible!
@neuralearn Glad you responded. Waiting for the custom training video.
Very informative tutorial. I really appreciate the work you have done with this code. I also want to try this. Can you please allow access to the Google Colab code for this?
hello my dear Adil, Please check your mail :)
Hi, I'm getting this error - (External) CUDA error(100), no CUDA-capable device is detected.
[Hint: 'cudaErrorNoDevice'. This indicates that no CUDA-capable devices were detected by the installed CUDA driver. ] (at /paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:66).
Can you help me out with this, please?
How do I install layoutparser? On GitHub I can no longer find any file such as layout parser.
Hello, thank you for the tutorial!! Can I get the code, please?
I want to convert a CSV file into a JSON file in this format: {field1: {col1: text, col2: text, col3: text}, field2: {col1: text, col2: text, col3: text}}. Can you please help me create this JSON file? Thank you.
I have a question: if a table spans 2 pages (half of it on the 1st page and the rest on the 2nd), how can I solve this problem?
Awesome video and an interesting approach to the problem. Would you mind giving me access to that notebook?
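Here is a minimal sketch using Python's standard `csv` and `json` modules. It assumes the first CSV column holds the keys (`field1`, `field2`, ...) and the remaining header cells are the column names. A row with a repeated key overwrites the earlier one. `csv_to_nested_json` is an illustrative name.

```python
import csv
import io
import json
from typing import Dict, Optional


def csv_to_nested_json(csv_text: str, key_column: Optional[str] = None) -> str:
    """Turn CSV rows into {key: {column: value, ...}, ...} JSON."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if not reader.fieldnames:  # empty input: no header row
        return "{}"
    key_column = key_column or reader.fieldnames[0]  # default: first column
    nested: Dict[str, Dict[str, str]] = {}
    for row in reader:
        key = row.pop(key_column)
        nested[key] = row  # remaining columns, in header order
    return json.dumps(nested, indent=2)


sample = "field,col1,col2,col3\nfield1,a,b,c\nfield2,d,e,f\n"
print(csv_to_nested_json(sample))
```

For a file on disk, read it first, e.g. `with open("table.csv", newline="") as f: csv_to_nested_json(f.read())`. Here `table.csv` is a placeholder name.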
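Suppose each page's part of the table has already been reconstructed as a list of rows with the same column layout. One simple approach is then to concatenate the parts in page order and drop the header row when a continuation page repeats it. `merge_split_table` is a hypothetical helper. It does not handle a single row that is itself cut by the page break.

```python
from typing import List, Optional, Sequence

Row = List[str]


def merge_split_table(page_tables: Sequence[Sequence[Row]]) -> List[Row]:
    """Concatenate per-page table fragments, skipping repeated header rows."""
    merged: List[Row] = []
    header: Optional[Row] = None
    for table in page_tables:
        if not table:
            continue  # this page holds no part of the table
        rows = list(table)
        if header is None:
            header = rows[0]  # the first fragment defines the header
        elif rows[0] == header:
            rows = rows[1:]  # continuation page repeats the header
        merged.extend(rows)
    return merged


page1 = [["Name", "Age"], ["Alice", "30"]]
page2 = [["Name", "Age"], ["Bob", "25"]]
print(merge_split_table([page1, page2]))  # [['Name', 'Age'], ['Alice', '30'], ['Bob', '25']]
```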
Hello my dear Harshith, please check your mail :)
Thank you so much for this video. Could you please allow access to the Google Colab?
Hello my dear Youssef glad this video is helpful :)
Please check your mail inbox or spam
Hi, Can you please provide the notebook access?
Hi, will this method work if we have multiple (huge) tables?
Yes, it should work. I think it's best to try it for yourself :)
Why do we need to clone the Paddle repository at 15:57?
This was an amazing tutorial ! I really want to try and further tweak this. Can you please grant me access to the Google Colab Code?
Hello please check your mail inbox or spam
Please approve the access request for the Google Colab notebook. I am very interested in the code.
Can I get the code? I followed the video and wrote the code, and everything is working, but due to some issue, out_array at the end always has the same value.
Update: Solved
Thanks, this is best tutorial on this topic (saying this after going through countless tutorials, research papers and blogs in past 3 months).
:)
Thank you for the explanation @Neuralearn, can you please provide me access to the Colab?
Please check your mail inbox or spam :)
Hi, great video. Can you please provide access to this notebook? Thanks a lot in advance.
Hi,
check your mail box or spam
How can I use PaddleOCR for receipts?
Can I have access to your Colab notebook, please? I requested access yesterday.
Hi,
check your mail box or spam
Hey! I want to try out your tutorial. Could you please give me access to your notebook?
Hello check your mail :)
I am getting the following error and not sure how can I resolve this:
Error: Can not import paddle core while this file exists: /usr/local/lib/python3.10/dist-packages/paddle/fluid/libpaddle.so
Tried reinstalling paddlepaddle but that didn't work.
Sir, is the issue solved?
Hey, great work! Can you give access to your Colab drive?
Thanks
Please check your mail :)
Amazing video! Very helpful! Could you please provide the source code?
@neuralearn Hello, could you tell me where the test.pdf file is? I have access to the notebook, but it throws an error.
I got:
PDFPageCountError: Unable to get page count.
I/O Error: Couldn't open file '/content/bahdanau attention.pdf': No such file or directory
It's fascinating. Would you mind giving me access to the Colab code?
Hello my dear Steve. Please check your mail :)
It was excellently explained. I wanted to try it out but got many errors. So, could you please grant me access to the Google Colab code?
Done!
Hey, really great video ❤, can you provide access to the notebook?
Hello my dear Kumar,
Please check your mail inbox or spam
Hi, please could you provide me with the access to this colab notebook
Hello my dear Ayush,
Please check your mail inbox or spam
Hi, could you grant me access to the notebook please?
can you please give me the access to notebook?
Thanks for this video. Let's say we have a page with free text and tables. Once we have our tables, how can we extract the remaining text? When I'm using a parser, it also extracts the table text from the page. I want to use your approach for the tables and extract only the remaining text.
For extracting only the text, use pdfminer.
Just match the consecutive text from the table and parse the PDF, skipping over that text.
Excellent tutorial, can you please grant access to the Google Colab notebook? :)
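The same filtering can also be done geometrically: once the table regions are known, keep only the text blocks that fall outside them. This sketch assumes text blocks and table boxes are `(x_min, y_min, x_max, y_max)` in one shared coordinate system. pdfminer reports PDF coordinates with the origin at the bottom-left of the page, while image-based layout detectors use pixel coordinates from the top-left. So one set would need flipping, and rescaling if the page was rendered at a different DPI. `text_outside_tables` is a hypothetical helper, and testing each block's center point is an illustrative choice.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def text_outside_tables(blocks: Sequence[Tuple[Box, str]],
                        table_boxes: Sequence[Box]) -> List[str]:
    """Return the text of blocks whose center lies outside every table box."""
    def inside(x: float, y: float, box: Box) -> bool:
        return box[0] <= x <= box[2] and box[1] <= y <= box[3]

    kept: List[str] = []
    for box, text in blocks:
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        if not any(inside(cx, cy, t) for t in table_boxes):
            kept.append(text)
    return kept


blocks = [((0, 0, 100, 10), "Intro"), ((10, 50, 90, 60), "cell"), ((0, 200, 100, 210), "Outro")]
print(text_outside_tables(blocks, [(0, 40, 100, 120)]))  # ['Intro', 'Outro']
```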
Sure:) Check your mail!
Hi, I want to get only the paragraph text, without any figures or tables, from any type of PDF. How can I do this?
You can pick text by changing [if l.type == 'Table':] to [if l.type == 'Text':]
Hi Neuralearn team, can you please provide me access to the Google Colab code?
Thank you. Please can you grant me access to notebook?
Please check your mail :)
Very informative video. Can you please share the code with me ? It would be very helpful.
Thank you so much for your tutorial! Can you please grant me access to the Google Colab Code?
Yes sure!
Amazing tutorial, is this code available for use? I would appreciate it!
Please check your mail :)
Hey there! It is a wonderful video on how to work with OCR and tables. I have requested notebook access; could you please provide it? Thank you once again for this tutorial.
hello my dear Snehal, please check your mail :)
@neuralearn Dear team, I have not yet received the confirmation. It's the same email as the one I'm replying with.
This is very informative tutorial! Could you please give me access to the Google Colab Code?
Hi my dear Amila
Please check your mail inbox or spam :)
Hello can I please get viewing access to the colab notebook?
Hello Kane, please request access and check your mail in 5 minutes.
Please provide access to this notebook
Access granted!
How to get access to your notebook?
Done!
Hi, this is very interesting to me. I really want to try this out. Could you please grant me access to the Google Colab code?
Done!
Which Python version?
Hello... if there is an unstructured table whose cells are not arranged as an n*m grid, will this method work?
It depends on the table in question. Nonetheless, you can always modify this method to suit your specific table
@neuralearn Is it possible to make it work for all types of tables in one shot?
No, it's not possible!
@neuralearn Thank you so much
You're welcome :)
Hi, I installed paddlepaddle instead of paddlepaddle-gpu because I don't have a GPU in my local system. I'm getting "AttributeError: module 'numpy' has no attribute 'int'".
Is it possible to run this project on a local system without a GPU?
I'm facing this error too... ☹
Hello my dear Dinaharan, here is a notebook which works for cpu runtime: colab.research.google.com/drive/1vZHrahaaubhWMz83jlPuvA1na_v98fUP
Hi, I am very happy to get your reply and amazed by your help. I am glad to have a YouTuber like you. I really like the effort you put in for your subscribers. Thank you very much. 🤗😇👏👏👏
My pleasure :)
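That `AttributeError` is a NumPy version issue rather than a GPU one. NumPy 1.20 deprecated the `np.int` alias and NumPy 1.24 removed it, so any code in the stack that still writes `np.int` fails on newer NumPy. The cleaner fixes are installing `numpy<1.24` or upgrading the package that uses the alias. As a stopgap, the alias can be restored before that package is imported. This is a sketch, and which import actually needs the shim depends on your environment.

```python
import numpy as np

# NumPy >= 1.24 no longer defines `np.int` (deprecated since 1.20).
# Restore it as the builtin `int`, which is what the alias always pointed to.
if not hasattr(np, "int"):
    np.int = int  # type: ignore[attr-defined]

# Import the library that still uses `np.int` only after the shim, e.g. (assumption):
# import layoutparser as lp
```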
Hi, I'm not able to run this code in a Jupyter notebook. Could you help me run it on a local system, i.e. what is the procedure for that?
Hello my dear Rana, what issues did you face, while running the code locally?
@neuralearn It asks for a GPU as a requirement, but I want to run this code in a Jupyter notebook on the CPU.
Maybe adding download links would have been more helpful.
Please check your mail :)
Thanks for the tutorial!
I've been doing all the same, but I'm getting an error with "horiz_out = tf.image.non_max_suppression... etc"
I get this:
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__NonMaxSuppressionV3_device_/job:localhost/replica:0/task:0/device:CPU:0}} scores has incompatible shape (Dimensions must be equal, but are 1 and 87) [Op:NonMaxSuppressionV3]
Has anybody had this error and knows how to solve it?
Thanks!
Hello Agus, we've updated the notebook. Check it out and let us know :)
This tutorial is very helpful and informative. Can you share this code with me?
Hi,
check your mail box or spam
Hello, great tutorial! Please give me access to the code.
Thank you! can i get the code? could you please provide access?
Yes, sure
@Neuralearn Brother, can you please grant me access to the Google Colab?
hello my dear Salman, Please check your mail :)
What amazing work. This will be a great tutorial for me in this area of work. I'm trying to access the Google Colab notebook. Could you please grant me permission to access it?
Glad it was helpful!
Please check your mail
@neuralearn I am not able to access the notebook. I requested access.
Please check your mail again
Thanks a lot
You're welcome
I'm getting an error when loading the model:
ValueError: (InvalidArgument) Device id must be less than GPU count, but received id is: 0. GPU count is: 0.
[Hint: Expected id < GetGPUDeviceCount(), but received id:0 >= GetGPUDeviceCount():0.] (at /paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:242)
Hi @neuralearn love the tutorial! Can you provide me access to the code?
Please after requesting access, check your mail inbox or spam
Thank you. I checked but didn't get anything.
Hi, can you please give me access to the Colab? It would be very helpful.
Hi,
Please check your inbox or spam
How do I get access to your notebook?
Please check your mail :)
Can I have a colab?
I get "Device id must be less than GPU count, but received id is: 0. GPU count is: 0." when I run model.detect(image). What does it mean?
I am running this on my local machine
Hello my dear Ashish, try out this notebook: colab.research.google.com/drive/1vZHrahaaubhWMz83jlPuvA1na_v98fUP
@neuralearn Thanks for your response, I have sent you an access request.