How to perform text analytics in R on Multiple PDF Documents

Data Centric Inc.

14,896 views

Published 7 Nov 2024

Comments • 73

@ehecatl3830 2 years ago ⁺¹
Your English is very clear. Thanks!
@prometeo34 2 years ago ⁺²
Madame, you are one of the best teachers I have seen... well done! Thanks so much for these videos.
@DataCentricInc 2 years ago
You are welcome, Carlos.
@christopherkhaddockphd9511 1 year ago
This video is excellent. You have a tremendous talent for teaching!
@DataCentricInc 1 year ago
Thank you 😊
@alancelaya3123 11 months ago
Thanks for the tutorials! I have a question: I need to apply OCR to PDFs before starting to analyze them. Do you have a tutorial on this?
@SanjayFuloria 1 year ago
Thank you very much. I have a problem. When I run the Corpus function to create the pdfdatabase, I get the following error: PDF error: Unknown Metadata type: 'XMP'. Could you please help me with that?
@christopherbrown576 1 year ago
How do you search for individual specific terms, rather than frequently used terms? Thanks!
@rebeccadsolson1207 2 years ago ⁺³
You are a great teacher! Such clear explanations. Thank you so much!
@DataCentricInc 2 years ago
You're very welcome, Rebecca! Glad it was helpful.
@universoflearningacademy9503 6 months ago
I tried many times, creating different projects, but I always get "object database not found". What should I do when I run this pdfdatabase line?
@kemalgunay 3 years ago ⁺²
Very helpful content, thanks for sharing.
@DataCentricInc 3 years ago ⁺¹
Thank you, Kemal 😊
@carlitofernandes5491 3 years ago ⁺¹
Fantastic, thanks! I was searching for a way to work with multiple PDFs. Thank you (obrigado), from Brazil.
@DataCentricInc 3 years ago
Thank you, Carlito.
@vincentdepaulsavarimuthu779 2 years ago ⁺¹
Really, you are great, madam.
@DataCentricInc 2 years ago
Thank you
@igwegbehenrychinaza7908 2 years ago
Thank you, ma'am.
Kindly share the link to download the PDF files so that I can repeat what you did at home.
Thanks in anticipation.
@jahanzebtube 2 years ago ⁺¹
Great explanation; you make the concepts easy to follow, and you do it with ease. Thank you.
I was running the same code and came across a problem, and I was wondering if you could shed some light on it. Basically, when I run the Corpus function, it gives this error:
Error in file(con, "r") : invalid 'description' argument
Can you please help?
@zachabenz 2 years ago ⁺¹
Thank you for your interesting video. I just want to ask: where can I get the "tm" package?
@DataCentricInc 2 years ago
Hi Zacha B, you can type the following line to install the tm library: install.packages("tm")
@zachabenz 2 years ago ⁺¹
@DataCentricInc Thank you very much. 👍🙏
@andreubrito11 2 years ago ⁺¹
Very good tutorial!!
@DataCentricInc 2 years ago
Thanks 😊
@shehurufai9273 2 years ago ⁺¹
I look forward to working with you on my PhD thesis. I hope you will respond soon.
@DataCentricInc 2 years ago
Hi Shehu, how may I be of assistance?
@vengateshprasathramamurthy2801 2 years ago ⁺¹
Great Video! Thank you!
@DataCentricInc 2 years ago ⁺¹
You are welcome, Vengatesh.
@agatabreczko6388 2 years ago
Hello! When I am writing the code, on line 9: "pdfdatabase
@DataCentricInc 2 years ago
Make sure you run the line that loads pdftools with require().
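Several errors in this thread (an object or function that R cannot find) can come from a package that was never installed or loaded. A minimal base-R sketch, assuming the two packages named in these comments (tm and pdftools) are the ones the script needs; `missing_packages` is a hypothetical helper, not part of either package.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("tm", "pdftools")
to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them from CRAN
} else {
  message("All packages are installed.")
}
```

Once nothing is reported missing, run library(tm) and library(pdftools) (or require()) before the lines that build the corpus.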
@brianisinga918 1 year ago
This is fantastic. Thank you. Could you kindly consider making a video on how to remove the first, say, 5 lines from several PDF files and merge them? Or rather, how to combine data from different PDF files after the 5th line/row?
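pdftools::pdf_text() returns one character string per page, with lines separated by "\n". A minimal base-R sketch of dropping the first five lines of each document and stacking the remaining lines from all documents, assuming the header lines sit at the very start of each document; `drop_header_lines` is a hypothetical helper, and the short strings below stand in for real pdf_text() output.

```r
# Hypothetical helper: drop the first n lines of one document's text and
# return the remaining lines as a character vector.
drop_header_lines <- function(pages, n = 5) {
  # Join the pages of one document, then split into individual lines
  lines <- unlist(strsplit(paste(pages, collapse = "\n"), "\n", fixed = TRUE))
  if (length(lines) <= n) return(character(0))
  lines[-seq_len(n)]
}

# Stand-ins for lapply(files, pdf_text): one list element per PDF
docs <- list(
  c("Title A\nAuthor\nDate\nRef\nPage 1\nrow1 A\nrow2 A"),
  c("Title B\nAuthor\nDate\nRef\nPage 1\nrow1 B")
)

# Apply to every document and combine the remaining lines
combined <- unlist(lapply(docs, drop_header_lines, n = 5))
print(combined)
```

To drop the first five lines of every page instead of every document, apply the helper to each page separately, e.g. lapply(pages, drop_header_lines).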
@pcsksa5 2 years ago ⁺¹
That's brilliant. Thank you for sharing.
@lowperformer_berlin 2 years ago
Hey, really cool video! Thank you very much! I have one question about the results of line 21 (frequency analysis): if we do not count the words with that function, what do the numbers in the [...] brackets tell me?
@dawitzewde6654 1 year ago
You're fantastic, as always. Thanks so much for your help.
@justdrawing9207 2 years ago ⁺¹
Hello, thank you for your videos; they help us so much! How many papers can we analyze? Can we analyze more than 3 papers?
@DataCentricInc 2 years ago ⁺¹
You are welcome, JustDrawing. You can analyze more than three. I have done up to 30, and you could probably do more.
@justdrawing9207 2 years ago ⁺¹
@DataCentricInc Thank you so much, professor 🙏🏻🙏🏻🙏🏻
@agustincsn 2 years ago
I tried and followed the scripts given but when I load command opinion
@affanasif7506 2 years ago
How can I find the frequency of particular words? For example, I want to know the frequency of certain words like "technology", "blockchain", "peer to peer transaction", "new systems", etc.
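A minimal base-R sketch for counting chosen words or phrases, including multi-word terms such as "peer to peer", in each document. It is an alternative to the term-document matrix used in the video, which by default splits text into single words, so phrases would not appear as terms. It assumes one character string per document (for example paste(pdf_text(f), collapse = " ")) and plain-word terms without regex special characters; `count_terms` is a hypothetical helper.

```r
# Hypothetical helper: count whole-word, case-insensitive occurrences of each
# term (single words or phrases) in each document.
count_terms <- function(texts, terms) {
  texts <- tolower(texts)
  counts <- matrix(0L, nrow = length(texts), ncol = length(terms),
                   dimnames = list(paste0("doc", seq_along(texts)), terms))
  for (j in seq_along(terms)) {
    # \\b marks word boundaries, so "chain" does not match inside "blockchain"
    pattern <- paste0("\\b", tolower(terms[j]), "\\b")
    for (i in seq_along(texts)) {
      hits <- gregexpr(pattern, texts[i], perl = TRUE)[[1]]
      counts[i, j] <- sum(hits > 0)  # gregexpr returns -1 when nothing matches
    }
  }
  counts
}

docs <- c("Blockchain technology enables peer to peer transactions. New systems use blockchain.",
          "Technology changes fast; peer-to-peer networks grow.")
terms <- c("technology", "blockchain", "peer to peer", "new systems")
counts <- count_terms(docs, terms)
print(counts)
```

Note that "peer-to-peer" (with hyphens) is not counted as "peer to peer"; add that spelling as a separate term if your PDFs use it.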
@MsBambi01 2 years ago ⁺¹
Thank you for a great video! It has helped me so much :)
@DataCentricInc 2 years ago
You are welcome
@kats_pajamas6908 2 years ago ⁺¹
Thank you so much! Amazing video!
@DataCentricInc 2 years ago
You are welcome, @kats_pajamas.
@17Adamovic 2 years ago ⁺¹
Thank you for the great work/video! One question: what line would I run to search for a specific set of words?
@DataCentricInc 2 years ago ⁺¹
Thanks, 17Adamovic. If you watch parts 2 & 3 of text analytics on PDF, you will see additional ways to analyze the content at the page level and document level, and to filter by words. Kindly see the following titles: "How to perform Text Analytics on PDF Documents in R?" and "Multiple PDF Analysis in R".
@17Adamovic 2 years ago ⁺¹
@DataCentricInc As I'm brand new to learning R and need it to do some research work for a professor, I've been watching and learning from your videos! I did watch the other parts, but I don't believe searching for or counting a specific word was shown, unless I missed it. You show us how to filter or search for the most frequent words, but I was wondering if we could simply count the occurrences of a specific word, like "cyber".
@DataCentricInc 2 years ago ⁺³
@17Adamovic Kindly see the code below, which you can use to filter the frequency of words in the Term Document Matrix. Hope this helps :).
inspect(opinions.tdm[c("cyber"), ]) # search for specific words
@17Adamovic 2 years ago ⁺¹
@DataCentricInc Ahh!! You are the best... thank you!
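Indexing a matrix by a term that is not among its row names raises a "subscript out of bounds" error, and the same lookup on a TDM is likely to fail in a similar way; with stemming = TRUE the stored terms are also stems, which may not match the word you type. A minimal base-R sketch that looks terms up safely in a plain matrix, assuming you first convert a small TDM with as.matrix(opinions.tdm); `lookup_terms` is a hypothetical helper and the counts below are made up.

```r
# Hypothetical helper: return the rows of a term-by-document count matrix
# for the requested terms, reporting any terms that are not present.
lookup_terms <- function(m, terms) {
  found  <- intersect(terms, rownames(m))
  absent <- setdiff(terms, rownames(m))
  if (length(absent) > 0) {
    message("Not in the matrix (check stemming and cleaning): ",
            paste(absent, collapse = ", "))
  }
  m[found, , drop = FALSE]  # drop = FALSE keeps a matrix even for one term
}

# Made-up counts standing in for as.matrix(opinions.tdm)
m <- matrix(c(3L, 0L, 1L, 2L, 5L, 0L), nrow = 3,
            dimnames = list(Terms = c("cyber", "risk", "secur"),
                            Docs  = c("doc1.pdf", "doc2.pdf")))

res <- lookup_terms(m, c("cyber", "blockchain"))
print(res)
print(rowSums(res))  # total count of each found term across documents
```

With stemming turned on, look up the stemmed form of a word (tm::stemDocument() accepts a character vector) rather than the word as written.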
@17Adamovic 2 years ago
@DataCentricInc Do you have a video on the cleaning code needed to avoid missing search words that contain an apostrophe (like "cyber's")?
When I apply the cleaning code from your current videos, such as removePunctuation, stopwords, tolower, stemming, removeNumbers, and bounds, and then search for a specific word, it still misses the words with apostrophes in them, even if I change the search term to "cybers", since the earlier code might remove the apostrophe.
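One likely cause, offered as an assumption rather than something confirmed in the video: text extracted from PDFs often uses the typographic apostrophe (’, U+2019) instead of the ASCII ', and tm::removePunctuation() removes only ASCII punctuation by default (its ucp = TRUE argument extends this to Unicode punctuation). A minimal base-R sketch that normalizes apostrophes and drops a possessive 's before the other cleaning steps; `normalize_apostrophes` is a hypothetical helper.

```r
# Hypothetical helper: normalize apostrophes and strip possessive 's.
normalize_apostrophes <- function(x) {
  # Map curly single quotes (U+2018, U+2019) to the ASCII apostrophe
  x <- gsub("\u2018", "'", x, fixed = TRUE)
  x <- gsub("\u2019", "'", x, fixed = TRUE)
  # Drop a possessive 's at the end of a word: "cyber's" becomes "cyber"
  gsub("'s\\b", "", x, perl = TRUE)
}

out <- normalize_apostrophes(c("Cyber\u2019s risks", "the cyber's edge", "cybers"))
print(out)
```

Applied to the raw text before building the corpus, this lets a search for "cyber" find the former possessives; plural forms such as "cybers" are left for stemming to handle.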
@jekieraya4000 1 year ago
Hey, ma'am. Can I connect your code to a PHP file?
@jekieraya4000 1 year ago
Ma'am?
@harmandeepsingh8903 2 years ago
Great work, ma'am. I have one thing I'd like your help with: for example, if we take only a specific term from the PDF and then want to analyze that specific term, is it possible?
@DataCentricInc 2 years ago ⁺¹
Yes it is possible to focus on a term.
@harmandeepsingh8903 2 years ago
Thank you for the response. Please do a video on that for your subscribers.
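One possible reading of this request is to pull out only the passages that mention a given term and analyze those. A minimal base-R sketch, assuming a naive sentence split on ".", "!" and "?" (which will also split abbreviations and decimals such as "3.5"); `sentences_with_term` is a hypothetical helper.

```r
# Hypothetical helper: return the sentences that mention a term
# (whole word, case-insensitive).
sentences_with_term <- function(text, term) {
  # Naive split: a run of non-terminators followed by an optional . ! or ?
  sentences <- trimws(unlist(regmatches(text, gregexpr("[^.!?]+[.!?]?", text))))
  keep <- grepl(paste0("\\b", term, "\\b"), sentences,
                ignore.case = TRUE, perl = TRUE)
  sentences[keep]
}

text <- "Cyber risk is rising. Budgets are flat! Is cyber insurance useful?"
found <- sentences_with_term(text, "cyber")
print(found)
```

The returned sentences can then go into the same cleaning and term-document-matrix steps as whole documents.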
@saqibwarriach 1 year ago
The missing piece in Data Centric Inc's tutorial series is the annotation of function words and content words as a pre-processing step. It would be very helpful to get your insights and a hands-on demonstration of annotating and removing function words prior to analysis.
@itumelengmosala5335 2 years ago
I'm still struggling with the code below. It gives me an error:
list.files(path = folder, pattern = "pdf$")
folder
@DataCentricInc 2 years ago
Hi Itumeleng, I have asked you to send me an email. Check the previous replies I have sent.
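If the line that assigns `folder` runs after the list.files() call (or not at all), R reports that `folder` cannot be found. A minimal sketch of the intended order, assuming a throwaway temporary folder with dummy files so it runs anywhere; point `folder` at your own PDF directory instead.

```r
# Assumption: a temporary folder with dummy files stands in for your PDF directory
folder <- file.path(tempdir(), "pdf_demo")
dir.create(folder, showWarnings = FALSE)
invisible(file.create(file.path(folder, c("report1.pdf", "report2.PDF", "notes.txt"))))

# folder must exist before this line runs.
# full.names = TRUE returns paths usable from any working directory;
# ignore.case = TRUE also picks up ".PDF"; "\\.pdf$" anchors on the extension
files <- list.files(path = folder, pattern = "\\.pdf$",
                    full.names = TRUE, ignore.case = TRUE)
print(basename(files))
```

The original pattern "pdf$" also works in most cases, but it would match a file named, say, "mypdf" with no extension, and it skips ".PDF" files.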
@er2759 2 years ago
Hello, thank you for the great videos!! I have some issues with line 4; it's not working. I sent you an email; hopefully you can help me.
The error is: Error in lapply(files, pdf_text) : object 'files' not found
@DataCentricInc 2 years ago
Hi ER, it is difficult to diagnose the problem from this error alone. Make sure you run the line that creates the files variable, just to be safe, because skipping it could cause this error as well.
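The message "object 'files' not found" means the line that creates `files` did not run before lapply(files, pdf_text). A minimal sketch of a guard that stops early with a clearer message when no files were found; `read_pdf_texts` is a hypothetical helper whose `reader` argument defaults to pdftools::pdf_text. R evaluates that default only when it is used, so the demo below passes base R readLines on plain-text stand-ins instead of real PDFs.

```r
# Hypothetical helper: read every file, failing early with a clear message
read_pdf_texts <- function(files, reader = pdftools::pdf_text) {
  if (length(files) == 0) {
    stop("No files to read: check `folder` and the list.files() pattern.")
  }
  absent <- files[!file.exists(files)]
  if (length(absent) > 0) {
    stop("These paths do not exist: ", paste(absent, collapse = ", "))
  }
  texts <- lapply(files, reader)
  names(texts) <- basename(files)
  texts
}

# Demo with plain-text stand-ins; with real PDFs call read_pdf_texts(files)
demo_dir <- file.path(tempdir(), "reader_demo")
dir.create(demo_dir, showWarnings = FALSE)
paths <- file.path(demo_dir, c("a.txt", "b.txt"))
writeLines("first document", paths[1])
writeLines("second document", paths[2])
texts <- read_pdf_texts(paths, reader = readLines)
str(texts)
```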
@kripa_dristi 2 years ago ⁺¹
Can you please make a video on text mining to search for PDFs online using one keyword?
@DataCentricInc 2 years ago
Thanks for your feedback, Kripa; however, I need a little more clarity on this request. Do you want to search for a PDF file on the web using R and then perform text mining on the results?
@kripa_dristi 1 year ago
@DataCentricInc Can you directly implement text mining to search for and download any PDF available on the web or from any publisher?
@josephjohns1336 2 years ago ⁺¹
Could you please make a video about how to scrape, clean, and visualize data from tables in a PDF using R? Preferably not a video that uses the tabulizer library or family of libraries. Only pdftools, please.
@DataCentricInc 2 years ago
Hi Joseph, I will take a look at this and let you know.
@DataCentricInc 2 years ago
Hi John, you can look out for this video next Monday.
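A minimal sketch of reading a simple text table using only pdftools output, assuming the columns in the extracted text are separated by runs of two or more spaces (true for some layouts, not all); `parse_text_table` is a hypothetical helper, and `page` stands in for one element of pdftools::pdf_text() output.

```r
# Hypothetical helper: turn fixed-width table text into a data frame by
# splitting each line on runs of two or more spaces.
parse_text_table <- function(page_text) {
  lines <- trimws(strsplit(page_text, "\n", fixed = TRUE)[[1]])
  lines <- lines[nzchar(lines)]                  # drop blank lines
  cells <- strsplit(lines, "\\s{2,}")            # split each line into columns
  header <- cells[[1]]
  rows <- cells[-1]
  rows <- rows[lengths(rows) == length(header)]  # keep well-formed rows only
  if (length(rows) == 0) return(data.frame())
  df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(df) <- header
  df
}

page <- "Country    Year   Value\nKenya      2020   12.5\nGhana      2021   8.0\n"
tab <- parse_text_table(page)
tab$Value <- as.numeric(tab$Value)  # columns arrive as text; convert as needed
print(tab)
```

For a quick visual check, barplot(tab$Value, names.arg = tab$Country) plots the cleaned values.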
@itumelengmosala5335 2 years ago
Apologies. I copied the code line incorrectly: it's refusing to accept the apply function
files
@DataCentricInc 2 years ago
Unfortunately if you do not send the email as per request I will not be able to assist you.
@Khomo.96 1 year ago
@itumeleng mosala Did you succeed in applying the function?