Finally, a talk on web scraping! Good to see you again, wesbos and Scott!
Awesome! Along the same lines, I'd love an episode on reverse-engineering scrambled or minified web apps 😏
Good idea. I think there's also one on how to find data objects in the JS heap.
Love you both from Sri Lanka...🇱🇰 ❤
Awesome! I was using Puppeteer to scrape a site and converted it to pinging their API directly. So much faster, and no random errors when an element fails to load. Where would you host scraping scripts that run every day, hour, or minute? I used a package to run mine as a service on Windows.
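One common setup (just a sketch, not necessarily what the hosts would recommend): a small always-on Linux VPS or a Raspberry Pi with cron driving the scripts. The paths and script names below are placeholders.

```shell
# crontab entries for a Node scraper at the three cadences mentioned.
# Edit with `crontab -e`; output is appended to log files for debugging.

# every day at 06:00
0 6 * * * /usr/bin/node /home/me/scrapers/daily.js >> /home/me/logs/daily.log 2>&1

# every hour, on the hour
0 * * * * /usr/bin/node /home/me/scrapers/hourly.js >> /home/me/logs/hourly.log 2>&1

# every minute
* * * * * /usr/bin/node /home/me/scrapers/minute.js >> /home/me/logs/minute.log 2>&1
```

For the daily or hourly case, a scheduled GitHub Actions workflow or a serverless cron (e.g. a scheduled Lambda) also works and avoids keeping a box running; per-minute jobs are usually easier on a machine you control.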
Love this podcast and this episode, since I'm also a scraping OG / automation panda :) Side question: will the video format of the podcast ever pan into visual snapshots? When something like the console is mentioned, pan into a snapshot of it, or if a website is mentioned, cut to a screenshot of it, like Wes did once during this video. I know this adds more work during editing, but it would be extra coolness if it were included as a standard. Thanks, keep up the awesomeness 🎉👍
Yeah, I jumped off the audio version and onto TH-cam hoping to see something in action. But I think that would slow down the time to upload; CJ probs has something in the mix, no doubt.
I never thought I'd hear XPath mentioned on a podcast. It's really too bad XML became a four-letter word. There were actually some cool things you could do with it that you can't do with JSON. It has a DOM, for one thing.
Working on a scraper rn.
Public repo? Link us up
Oh, it's you, Scott, hahah. I had a rush of enthusiasm to work on it with a fellow listener, but now I feel silly.
Have you or anyone else extracted data from an interactive chart?
Lol I've been watching every episode since CJ joined and yet I'm not subscribed 😅
Time to change that
yeahhh buddy
Is there a course you recommend for this?
How would you alert if something was available? I want instant, attention-ambushing feedback if my scraper finds something.
If I run a Cypress script headless to check a site for tickets, say, and it finds one, I want a desktop alert somehow. Browser alerts work if I run it manually, but if I schedule it on Mac, it runs in the background and I don't get any alerts.
I'm kind of late to this question, but Telegram bots are really useful for this!
If someone scrapes for indexing and links back to your site to consume it, I'm totally cool with it; but if someone scrapes to bypass the site, I'm not.