Stop Using Selenium or Playwright for Web Scraping

  • Published on 8 Feb 2025
  • Check Out ProxyScrape here: proxyscrape.co...
    ➡ JOIN MY MAILING LIST
    johnwr.com
    ➡ COMMUNITY
    / discord
    / johnwatsonrooney
    ➡ PROXIES
    proxyscrape.co...
    ➡ HOSTING (Digital Ocean)
    m.do.co/c/c7c9...
    If you are new, welcome. I'm John, a self-taught Python developer working in the web and data space. I specialize in data extraction and automation. If you like programming and web content as much as I do, you can subscribe for weekly content.
    ⚠ DISCLAIMER
    Some/all of the links above are affiliate links. By clicking on these links I receive a small commission should you choose to purchase any services or items.
    This video was sponsored by ProxyScrape.

Comments • 63

  • @hydrosis-v3k 3 months ago +10

    holy shit, literally the first proper guide to a decent alternative to web automation.

  • @jonathanfriz4410 2 months ago +2

    My man John, now with sponsors. Happy for you. Long time no read! Wishing you the best. Awesome content as usual.

  • @andydataguy 3 months ago +5

    Your timing could not have been more perfect on this video, brother. Thank you!! 🙌🏾💜
    Idk if you have a community or program, but as soon as I get hired I'm buying it to help support you.
    As a self-taught dev (formerly a digital marketer), your videos were vital in finally wrapping my head around data extraction.

  • @SigKappel 3 months ago +4

    Thanks John for this update. I have some mission-critical scrapes for inventory and this will come in handy when I need to be more stealthy.

  • @AHMED_Mostafa_xd several months ago +1

    Thank you, I have been looking for an excellent addition for more than two weeks, but I did not find it until I watched your video ❤❤❤❤

  • @zvnman 3 months ago +1

    Thanks for the kind advice! In gratitude to all my clients, I give your referral link)) Keep going bro!!!

  • @frankcasanova2132 3 months ago +3

    JUST WHAT I NEEDED WHEN I NEEDED IT

  • @Analyse_US 3 months ago +1

    This is gold! Thanks.

  • @return_1101 3 months ago

    The best one! Thank you very much, Mr. Rooney!

  • @EnglishRain 3 months ago +1

    Thank you, I didn't know Playwright isn't really recommended for scraping.

  • @realitywords-17398 3 months ago

    Great Man.......... You are struggling a lot...... Keep your morale high!

  • @davidl3383 2 months ago

    Excellent, and easier than Playwright etc.! Thank you so much.

  • @graczew 3 months ago +1

    Thanks, mate. This is really helpful.

  • @gamehubler 3 months ago

    I learn a lot from your video explanations and like your style of sharing them with us. It is a breath of fresh air because it is up to date.
    Can you maybe bring some more complex web scraping examples? Most of the pages in your showcase were viewable directly or with a login.
    Would it be possible to show some examples with GQL sites in the future? These types are more difficult to solve.

  • @EmanueleCannizzaro 3 months ago +1

    As usual, great content explained in plain English.

  • @kinuthiamatata6040 3 months ago +9

    I'm working with a website that uses WebSocket connections instead of traditional XHR/fetch requests. What's the best way to intercept the traffic for scraping?

    • @JohnWatsonRooney 3 months ago +11

      Honestly, I haven't got a lot of experience with WebSockets. I know you can connect to them via requests/httpx, but I don't have any practical experience, sorry.

    • @mehdikaraouet7980 3 months ago

      @JohnWatsonRooney all the respect for honesty and humbleness ❤

    • @eyayawb 3 months ago +2

      Reverse engineer the data transfer process. I successfully reverse-engineered an app that pulls data from a Firebase database using a WebSocket connection. I used the websockets package to replicate the communication pattern.
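
A minimal sketch of the replay approach described in the reply above, assuming the third-party `websockets` package. The endpoint URL, the shape of the subscribe message, and the helper names `build_subscribe_message`, `parse_frame`, and `collect_frames` are all hypothetical placeholders; the real frames have to be copied from what the browser sends (DevTools, Network tab, WS filter).

```python
import asyncio
import json


def build_subscribe_message(channel, request_id=1):
    """Serialize a hypothetical subscribe frame, modelled on one copied from DevTools."""
    return json.dumps({"id": request_id, "action": "subscribe", "channel": channel})


def parse_frame(raw):
    """Decode one frame as JSON; frames that are not JSON are wrapped, not dropped."""
    try:
        data = json.loads(raw)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return {"raw": raw}
    return data if isinstance(data, dict) else {"data": data}


async def collect_frames(url, channel, limit=5):
    """Connect, replay the subscribe frame, and collect up to `limit` replies."""
    # Imported lazily so the pure helpers above work without the package.
    import websockets  # third-party: pip install websockets

    frames = []
    async with websockets.connect(url) as ws:
        await ws.send(build_subscribe_message(channel))
        while len(frames) < limit:
            frames.append(parse_frame(await ws.recv()))
    return frames
```

It could be run with something like `asyncio.run(collect_frames("wss://example.com/socket", "prices"))`, where both arguments are placeholders. Many sites also expect specific headers, cookies, or an auth frame before they send data, so the captured handshake usually needs replaying as well.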

  • @hamzahalli3500 3 months ago +1

    Thank you

  • @mvace 3 months ago +1

    Thanks John, I had been struggling to log in to one website for a few days. It was detecting my automation script. I used selenium-driverless and it works great now. So perfect timing with this video! Do you have any tips on how to handle uploads with selenium-driverless? After I log in to the website I need to upload a file from my local machine, but uploads seem to work differently in selenium-driverless compared to Selenium. There is not much in the documentation regarding uploads in selenium-driverless.

  • @ibrahimmudassar6563 2 months ago +2

    If I ran this on a VPS, how would the steps differ? I'm currently having trouble running Playwright on a CLI-only VPS.

  • @aus1046 several months ago

    Which do you recommend? Selenium driverless or nodriver?

  • @utsavgoswami5263 3 months ago

    Thank you so much for this. If it is okay to ask, are there any better (i.e. cheaper) alternatives to ProxyScrape?

  • @abdelkoddouslaarif1295 3 months ago

    Okay! This is all well and good, impressive even, but I want to ask a slightly different question: how do you stay updated with the new alternatives that come out? You always seem to know when something better is out, or when something old isn't as reliable anymore or is no longer maintained. What tech news forums or community chats do you follow?

  • @Theo_Phage 8 days ago

    I like my scrapers like I like my cars, driverless and windowless

  • @dragon3602010 3 months ago +3

    So when should one choose "nodriver" instead of "SeleniumBase"?
    Thanks

    • @JohnWatsonRooney 3 months ago +1

      Try both and see which works best for what you are trying to do

  • @Steve-lu6ft several months ago

    Will this method work when the information you want is in Network > Doc > Name (we select what we want here)... But it's only organized nicely as text in the Preview tab?

  • @AHMED_Mostafa_xd several months ago

    Does it get detected as a robot? There are sites that block it.

  • @yafethtb 3 months ago

    At last, a web scraping library that does not need another Chromium and uses only my Chrome! I've been waiting for something like nodriver.

  • @hw5622 3 months ago

    Great video! Thx!

  • @duyduy5595 several months ago +1

    Why not just use AppleScript?

  • @nickwoodward819 3 months ago +2

    I'm a bit confused by "using the chrome already installed on your machine". Do you mean the server, or is this a Python thing? (I'm using JS and would be scraping either in a serverless function or in Node.)

    • @kinuthiamatata6040 3 months ago

      I agree this might not be easily automatable. You may consider Playwright's connectOverCDP() with a containerized Chrome. Just make sure to expose Chrome's debug port via socat (socat TCP-LISTEN:9222,fork TCP:localhost:9222) and connect to that. This works great with Node/serverless and gives you full browser automation capabilities.

    • @JohnWatsonRooney 3 months ago +2

      Sorry, to clarify: when you install Selenium or Playwright, they install their own version of Chrome. These two use the existing install on your PC (if you have it), so you install Chrome as you normally would, with no extra install steps.
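
To illustrate the point about reusing the Chrome that is already installed, here is a minimal sketch assuming the third-party `nodriver` package. The list of executable names and the helpers `find_chrome` and `page_html` are assumptions for illustration; nodriver can normally locate Chrome by itself, so passing the path explicitly is optional.

```python
import shutil

# Common executable names on Linux/macOS PATHs; an assumption, adjust for your OS.
CHROME_NAMES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def find_chrome(names=CHROME_NAMES, which=shutil.which):
    """Return the path of the first Chrome/Chromium executable found, else None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None


async def page_html(url):
    """Drive the locally installed Chrome with nodriver and return the page HTML."""
    # Imported lazily so find_chrome() works without the package installed.
    import nodriver as uc  # third-party: pip install nodriver

    # If find_chrome() returns None, nodriver falls back to its own lookup.
    browser = await uc.start(browser_executable_path=find_chrome())
    page = await browser.get(url)
    html = await page.get_content()
    browser.stop()
    return html
```

With nodriver installed, this could be run as `nodriver.loop().run_until_complete(page_html("https://example.com"))`, where the URL is a placeholder.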

  • @Optimusjf 2 months ago

    How can I send a PFX client certificate in Selenium-driverless?

  • @rewazilol 3 months ago +1

    Do you have experience of issues when using Playwright? On one hand I know it's a test lib, but on the other hand it's extremely well funded and well maintained. The nodriver lib looks like it was built by one guy. I'm still not sure which way to go.

    • @JohnWatsonRooney 3 months ago +3

      Playwright is an amazing library; it's easy to use and works extremely well. But for web scraping you have to monkey around with it, as even the most basic WAF can detect it and throw a captcha. Nodriver is built by one guy, but it's open source and removes a lot of the obvious flags that the browser is being controlled automatically, which is why I'm talking about it here.

  • @saadkhan883 3 months ago

    Sir, I am using the Playwright library to scrape Google Maps. Please tell me, is it a good choice for scraping data from Google Maps? I am still worried about blocking or reCAPTCHA.

  • @THOTHO-ie5lz several months ago

    Will ProxyScrape not be able to help bypass detection on a Playwright script (headless mode)?

  • @serkhetreo2489 3 months ago

    Hi, what if I want to build an unofficial API for a site? Is there a better way?

  • @phantazzor 3 months ago

    How do you scrape an app if there is no web version?

  • @wisjnujudho3152 3 months ago

    Wow. I think I need to modify all of my scrapers.

  • @SlimeTurner-n9y several months ago +1

    Bro, I'm having trouble accessing s with no driver. Someone help me lol, I'll pay.

  • @green-forest-23 3 months ago +1

    Is there something like this for TypeScript?

    • @JohnWatsonRooney 3 months ago +1

      Yes, but I forget what it's called. Hopefully someone else will know.

    • @boniazdaniel 3 months ago

      Puppeteer extra plugin? The stealth one

  • @cstanleyhns 2 months ago

    Hi, is it possible to run this kind of thing in Docker and then ultimately in a Lambda?

    • @ibrahimmudassar6563 2 months ago

      I have the same question

    • @cstanleyhns 2 months ago

      @ibrahimmudassar6563 I ended up using Playwright with Python, running in a Docker container and deployed to ECS within AWS. Works well.

  • @amrogendiah198 3 months ago

    Could you make a video about scraping using AI with Python?

  • @MohanadSaid-u8x 3 months ago +1

    Liiiiiit 🔥🔥🔥🔥

  • @DavidChavez-z2v 3 months ago

    How do you know these things? Excellent.

  • @luisechevarria186 3 months ago +1

    Where does one begin with web scraping? I am trying to scrape JS-heavy websites, and all I see are recommendations to use Selenium/Playwright. This video is not too clear about what to use instead.

    • @JohnWatsonRooney 3 months ago +2

      Try using selenium-driverless. It is similar to Selenium but less detectable, so it causes you fewer issues when scraping.
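
As a starting point for the suggestion above, here is a minimal sketch assuming the third-party `selenium-driverless` package and a local Chrome install. The helper name `fetch_title` is hypothetical. Unlike classic Selenium, the driver here is asynchronous, so calls are awaited.

```python
import asyncio


async def fetch_title(url):
    """Open `url` in the locally installed Chrome and return the page title."""
    # Imported lazily so this module still loads where the package is missing.
    from selenium_driverless import webdriver  # third-party: pip install selenium-driverless

    options = webdriver.ChromeOptions()
    async with webdriver.Chrome(options=options) as driver:
        # wait_load=True waits for the page load before returning.
        await driver.get(url, wait_load=True)
        return await driver.title
```

It could be run with `asyncio.run(fetch_title("https://example.com"))`, where the URL is a placeholder. On JS-heavy pages the data often arrives after the initial load, so a real scraper would also wait for a specific element before reading anything.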

  • @Canda-fh4xc 3 months ago

    Thank you