Thanks so much for this, Andrew!!! Cheers!!
Damn, I love how you choose your Tidy Tuesday content. Can't compliment it enough.
Thanks for the intro to rvest! The code as shown doesn't quite work correctly, though, since the get_text and get_link functions assign the same hardcoded link right at the beginning. I was able to get it to work just by deleting those lines - I got 6603 unique "staff members" this way compared to the 33 from this code. Thanks again for the video!
Good catch, I'll make sure to change it!
-Andrew
Hi Andrew, thank you so much for sharing the amazing content! I have a question with regard to identifying the total pages. In your tutorial, you went through a manual process, I wonder if there is any means to have R identify the total pages available? Because as the number of articles grows, you will have more pages than current available. Thanks again!
I think it depends on the webpage you are scraping. For example, appending page=all to the URL can sometimes retrieve all of the links on a single page. Another way would be entering a large page number and iterating through the pages with the safely function: the pages that have no content will return an error, but the mapped function will still iterate through them.
library(tidyverse)

tibble(page_num = 1:100) %>%
  mutate(page = paste0("fivethirtyeight.com/tag/slack-chat/", "page/", page_num, "/")) %>%
  mutate(links = map(page, safely(get_links))) %>% # safely() captures errors instead of stopping the iteration
  mutate(links = map(links, "result")) # keep each result element, drop the error element
If you are planning on scraping data that will keep being added to the website under new links, I recommend saving the links that have already been scraped and using an anti-join against the full link set when re-running the script, something like the sketch below. I know this isn't the most efficient way of web scraping, but I hope this helps!
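A rough sketch of that re-run workflow (scraped_links.csv, the link column name, and all_links are just example names; all_links stands for the unnested link results from the pipeline above):

library(tidyverse)

# Hypothetical file holding the links scraped on previous runs, one link per row
old_links <- read_csv("scraped_links.csv")

# Keep only the links that haven't been scraped before
new_links <- all_links %>%
  anti_join(old_links, by = "link")

# Save the combined link set so the next run only scrapes what's new
bind_rows(old_links, new_links) %>%
  write_csv("scraped_links.csv")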
-Andrew
@AndrewCouch Thank you so much for your quick reply, Andrew! I will check it out...
It works with this example, but with other examples the output shows {xml_nodeset (0)}.
How do I export this to CSV?
write.csv(data_slack_pages, "data_test.csv")
doesn't work.
Is anything in data_slack_pages nested? You may need to unnest a column first.
Example (replace nested_column with the name of the nested list column):
data_slack_pages %>%
  unnest(nested_column) %>%
  write.csv("data_test.csv")
@AndrewCouch Sorry for the slow reply. Worked a treat, thank you, great tutorial. It might help to slow down for newbies just a bit!
Don't you have to check whether they allow scraping first? There may be no need if there is an API.
Yes, in general you should look for a robots.txt file on the website or check whether there is an API. I advocate scraping what you need for personal projects, but for professional/work projects I do not scrape and instead purchase data from vendors.
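If you want to check from R, the robotstxt package (not covered in the video, just one option) can test whether a path is allowed for crawling:

library(robotstxt)

# TRUE means the default user agent ("*") is allowed to crawl this path
paths_allowed(
  paths = "/tag/slack-chat/",
  domain = "fivethirtyeight.com"
)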