Thanks a lot! You helped me SOOO MUCH! I was looking for a package other than tabulizer (now off CRAN :/ ) and you showed me more than what I was searching for: your function is AMAZING.
Thank you, from a Brazilian data worker!
You are welcome, glad I could help
great teaching
Congrats on 180 subscribers you’re doing so well ❤️
Thank you 😊
Nice! Thank you.
THANKS!
The best tutorial, amazing :)
Nuff respect Dr. Cross
Thanks Lenworth, big up yourself!
Thank you! Very useful. I normally use PyPDF2 (Python) for complex table extractions, but pdftools in R is easier to troubleshoot.
A question: I have cases where tables contain blanks rather than zeros, and the row values after a blank are offset by one column. Is there an easy solution for this?
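One possible approach to the blank-cell offset, sketched under assumptions: `pdf_text()` generally keeps the horizontal layout of a page by padding with spaces, so cutting each line at fixed character positions, rather than splitting on whitespace, keeps a blank cell in its own column. The sample lines, column positions, and names below are made up for illustration.

```r
# Hypothetical lines as pdf_text() might return them. The second row has a
# blank in the q2 column, so splitting on whitespace would shift q3 left.
table_lines <- c(
  "North     10    20    30",
  "South     15          35",
  "East       5    25    45"
)

# Assumed column boundaries (character positions), read off the sample above.
col_start <- c(1, 10, 16, 22)
col_end   <- c(9, 15, 21, 27)

# Cut one line into its four cells and strip the padding.
parse_fixed <- function(line) {
  trimws(substring(line, col_start, col_end))
}

# One row per line, one column per cell.
cells <- t(sapply(table_lines, parse_fixed, USE.NAMES = FALSE))

fixed_df <- data.frame(
  region = cells[, 1],
  q1 = as.numeric(cells[, 2]),
  q2 = as.numeric(cells[, 3]),  # the blank cell becomes NA here
  q3 = as.numeric(cells[, 4])
)

# Replace the blank (NA) with zero, as asked in the question.
fixed_df$q2[is.na(fixed_df$q2)] <- 0
```

If the column positions drift between pages, `pdftools::pdf_data()`, which returns each word with its x and y coordinates, may be a more robust starting point.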
Hello, what is pdf_text? I get this error: Error in as_mapper(.f, ...) : object 'pdf_text' not found
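This error usually means the pdftools package, which exports `pdf_text()`, has not been attached in the current session. A minimal check, assuming the package can be installed from CRAN on your machine:

```r
# pdf_text() is exported by the pdftools package.
has_pdftools <- requireNamespace("pdftools", quietly = TRUE)

if (has_pdftools) {
  # Attach the package so that pdf_text can be found by name,
  # for example when it is passed to map().
  library(pdftools)
} else {
  message('pdftools is not installed; run install.packages("pdftools") first')
}
```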
Thanks, but it gives me the following error.
Error in `map()`:
ℹ In index: 1.
Caused by error in `(table_start):(table_end)`:
! argument of length 0
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
In min(which(table_end > table_start)) :
no non-missing arguments to min; returning Inf
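The "argument of length 0" error together with the `min()` warning suggests that one of the marker searches matched nothing, so `which()` returned an empty vector. A minimal sketch of checking for that before building the row range follows; the function name, marker strings, and sample lines are hypothetical, and this is not the code from the video.

```r
# Hypothetical page content, one element per line of text.
page_lines <- c("Report title", "Table 1. Sales", "A 1 2", "B 3 4", "Source: internal")

find_table_rows <- function(lines, start_pattern, end_pattern) {
  table_start <- grep(start_pattern, lines, fixed = TRUE)
  table_end   <- grep(end_pattern, lines, fixed = TRUE)

  # Fail with a readable message instead of "argument of length 0".
  if (length(table_start) == 0) {
    stop("start marker not found: ", start_pattern)
  }
  table_start <- table_start[1]

  # Keep only end markers that come after the start marker.
  table_end <- table_end[table_end > table_start]
  if (length(table_end) == 0) {
    stop("no end marker found after the start marker: ", end_pattern)
  }

  # Rows strictly between the two markers (assumes at least one such row).
  lines[(table_start + 1):(table_end[1] - 1)]
}

rows <- find_table_rows(page_lines, "Table 1.", "Source:")
```

Running the search on its own like this makes it easier to see whether the start or the end marker is the one that fails to match.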
Thank you very much for your work! I tried it, but at the end I got these errors. >
--
results
Hi TrueGrit, I am not sure what went wrong, as I would have to see your code to identify the issue. However, I am copying my code below so that you can copy, paste, and use it. If you still get the same error, it is possible that the start and end markers of the table are not unique enough in the document, so the data is not being picked up.
require(pdftools)
require(tidyverse)
require(ggplot2)
# download pdf and load file
url
@DataCentricInc Thank you for your quick reply and good suggestion! I will try your code and come back here. I made it work! Thank you so much! Can I ask you a further question about the table? If I want to produce a table like the one in the last result, how can I make it visible in R? Could you give me some code for that? I will drop by more often from now on. I subscribed to your channel.
@truegrit5411 Thanks for subscribing. The table is stored in TestDF as a data frame. If you highlight only TestDF and run it, you will see the table in the console.
You can also look in the Global Environment pane in the top right-hand corner, where you will see TestDF. If you click on it, the table will open in a separate tab in RStudio.
@DataCentricInc Thank you very much again. Yes, I checked that the data table appeared in the environment and opened it. What I want is to draw a real, formatted table in my R output or R Markdown. Could you give me some ideas? I guess kable() may do it.
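For a formatted table in R Markdown, `knitr::kable()` is one common option. A minimal sketch, assuming the knitr package is installed; the small `TestDF` below is a made-up stand-in for the data frame produced in the video.

```r
# Stand-in for the TestDF data frame mentioned above; the values are made up.
TestDF <- data.frame(
  region = c("North", "South"),
  sales  = c(10, 15)
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # Inside an R Markdown chunk this renders as a formatted table;
  # in the console it prints a plain-text (Markdown) table.
  print(knitr::kable(TestDF, caption = "Extracted table"))
} else {
  # Fallback when knitr is not installed: plain data frame printing.
  print(TestDF)
}
```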
Thank you!! But I'm trying to loop this on multiple pdf files, what if the table end varies from one pdf to another? Please help :)
Hi Daniel, the code in this video is very specific: the start and end markers of the table you are loading have to be unique in the document for the data to load. If you want to load multiple tables, you would need to replicate the code for each one.
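One way to avoid copying the code for every file, sketched under assumptions: wrap the extraction in a function that looks up the end marker separately in each document, then apply it over all documents. The function name, the patterns, and the sample text are hypothetical; with real files, each string would come from something like `paste(pdftools::pdf_text(path), collapse = "\n")`.

```r
# Return the lines between a start marker and the first end marker after it.
extract_table_lines <- function(doc_text, start_pattern, end_pattern) {
  lines <- strsplit(doc_text, "\n", fixed = TRUE)[[1]]

  start <- grep(start_pattern, lines)[1]
  if (is.na(start)) return(character(0))  # no table in this document

  # The end marker is searched per document, so its position may vary.
  end_candidates <- grep(end_pattern, lines)
  end <- end_candidates[end_candidates > start][1]
  if (is.na(end)) end <- length(lines) + 1  # no end marker: read to the end

  if (end - start < 2) return(character(0))  # markers are adjacent
  lines[(start + 1):(end - 1)]
}

# Stand-ins for the text of two PDF files whose tables end at different rows.
docs <- list(
  a.pdf = "Intro\nTable 1\nA 1\nB 2\nTotal 3\nNotes",
  b.pdf = "Table 1\nA 5\nTotal 5"
)

tables <- lapply(docs, extract_table_lines,
                 start_pattern = "^Table 1", end_pattern = "^Total")
```

This still relies on the same start and end markers appearing in every file; only the number of rows between them is allowed to change.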
Please make a video on scraping a website specially explaining HTML and CSS.
Hi Campus Corridors
You can check out this video on my channel where I scrape data from a website. th-cam.com/video/onacC9OTYv8/w-d-xo.html
Nice stuff, but the function is highly specialized and will only work in particular situations. Why not simply extract the page with the table and then work on it? Also, I have a situation where pdf_text() cannot see my table. However, pdf_ocr_text() with dpi set to 1000 will capture it.
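The page-first idea with an OCR fallback could look roughly like the sketch below. `read_page_text()` is a hypothetical helper, the file name and page number are placeholders, and `pdf_ocr_text()` additionally requires the tesseract package to be installed.

```r
# Hypothetical helper: read a single page as lines of text, falling back to
# OCR when pdf_text() returns nothing useful for that page.
read_page_text <- function(path, page, dpi = 600) {
  # pdf_text() returns one string per page, so index the page of interest.
  txt <- pdftools::pdf_text(path)[page]

  if (!nzchar(trimws(txt))) {
    # Scanned or image-only page: render it and run OCR instead.
    txt <- pdftools::pdf_ocr_text(path, pages = page, dpi = dpi)
  }

  strsplit(txt, "\n", fixed = TRUE)[[1]]
}

# Usage (placeholder path and page number):
# page_lines <- read_page_text("report.pdf", page = 3, dpi = 1000)
```

A higher dpi tends to help OCR on small print but makes each page slower to process.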