Scraping with Playwright 101 - Easy Mode

  • Published on 10 Sep 2024

Comments • 32

  • @bigoper
    @bigoper 2 months ago +1

    This is awesome!!
    As an API security specialist, I always start by looking at the HTTP calls and searching for an API call that might return the same info, which saves me the time of scraping the page. Most of the time that approach works, especially when dealing with solid companies, websites, and platforms.
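A minimal sketch of that "check the HTTP calls first" approach, assuming Playwright's Python sync API. The helper names `looks_like_api` and `log_api_calls` are hypothetical, and the filter (XHR/fetch requests that return JSON) is only a heuristic:

```python
def looks_like_api(resource_type: str, content_type: str) -> bool:
    """Heuristic: XHR/fetch traffic that returns JSON is worth inspecting."""
    return resource_type in {"xhr", "fetch"} and "json" in content_type.lower()


def log_api_calls(url: str) -> list[str]:
    """Load a page and collect the URLs of JSON responses it fetches."""
    # Imported here so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    found: list[str] = []

    def on_response(response) -> None:
        # Playwright exposes header names in lower case.
        content_type = response.headers.get("content-type", "")
        if looks_like_api(response.request.resource_type, content_type):
            found.append(response.url)

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("response", on_response)
        page.goto(url, wait_until="networkidle")
        browser.close()
    return found
```

Calling `log_api_calls` with the target page's URL returns candidate endpoints to inspect by hand. Whether any of them carries the same data as the rendered page depends on the site.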

  • @alexanderkomanov4151
    @alexanderkomanov4151 5 months ago +2

    Great one!
    I think that using the pytest-playwright package can save several lines of code in the initialization part, because you can just use the page: Page fixture.

  • @NomadicDmitry
    @NomadicDmitry 23 days ago +1

    Really great tutorial! Thanks, John!

  • @robertramirez2167
    @robertramirez2167 5 months ago +3

    I like that image blocking tip!
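For readers who want to try the image blocking idea, here is one possible sketch using Playwright's request routing. It is not necessarily how the video does it; `BLOCKED_TYPES`, `should_block`, `handle_route`, and `block_images` are hypothetical names:

```python
# Resource types to drop; extend with "media" or "font" if desired.
BLOCKED_TYPES = {"image"}


def should_block(resource_type: str) -> bool:
    """Decide whether a request of this type should be aborted."""
    return resource_type in BLOCKED_TYPES


def handle_route(route) -> None:
    """Route handler: abort blocked resource types, let the rest through."""
    if should_block(route.request.resource_type):
        route.abort()
    else:
        route.continue_()


def block_images(page) -> None:
    """Attach the handler to every request made by a Playwright page."""
    page.route("**/*", handle_route)
```

Call `block_images(page)` before `page.goto(...)` so the handler is in place when the first requests go out.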

  • @bgriffin5447
    @bgriffin5447 a month ago +1

    That split move was nice

  • @Extrey
    @Extrey 5 months ago +1

    Nooooo waaaay, I just found schema on other websites. Nice trick anyway, but I find it more efficient to read the info from the category pages. Thanks for your videos, they always inspire me!!!
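Assuming "schema" here means schema.org data embedded as JSON-LD `<script>` tags (an assumption, since the comment does not say), a standard-library sketch for pulling it out of a page's HTML could look like this. `JsonLdParser` and `extract_json_ld` are hypothetical names:

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._buffer: list[str] = []
        self.items: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks instead of failing the scrape.


def extract_json_ld(html: str) -> list:
    """Return every JSON-LD object found in the given HTML string."""
    parser = JsonLdParser()
    parser.feed(html)
    return parser.items
```

With Playwright, the HTML can come from `page.content()`, so `extract_json_ld(page.content())` returns whatever structured data the page embeds, if any.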

  • @graczew
    @graczew 5 months ago +1

    Good content as always. Enjoy your Easter break 😉👍

  • @user-wu4ip7mp3z
    @user-wu4ip7mp3z 4 months ago +1

    I'm following this exact code in VSCode and only the initial page is opened; it doesn't open the subsequent pages for each of the products. No idea how to fix this...

    • @user-wu4ip7mp3z
      @user-wu4ip7mp3z 4 months ago

      nvm, fixed it. Turns out the data-selenium=...GridView... selector has been changed to [data-selenium='miniProductPageProductNameLink']

  • @donaldandmijung
    @donaldandmijung 17 days ago

    Really well explained! Is there a way to run the loop in the original browser? Say, if we're only interested in the first page of the pagination and the products on page 1 only.

  • @IshaqKhan010
    @IshaqKhan010 5 months ago +1

    Sir, can you please make a video on how to deploy a Playwright script on a Google Cloud Function / VPC?

  • @fredde7356
    @fredde7356 5 months ago

    Hey John, can you please continue the scraping livestream with your test site? 😃
    Would love to see how to handle the drop-down menus, JavaScript, and stricter Cloudflare rules.
    Would be happy to hear some news! Enjoy Easter :)

    • @munchcup
      @munchcup 5 months ago

      On Cloudflare: one idea is to use undetected-chromedriver to avoid Cloudflare, and you can add a delay while logging in to solve the captchas the first time and save the cookies. After that you no longer need to solve captchas; it will be automatic.
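The "solve it once, then reuse the cookies" part of that idea can be sketched with Playwright's own storage state instead of undetected-chromedriver. This is a swapped-in technique, and nothing here guarantees it gets past Cloudflare. `STATE_FILE`, `context_options`, and `run` are hypothetical names:

```python
from pathlib import Path

STATE_FILE = Path("state.json")  # hypothetical location for saved cookies


def context_options(state_file: Path = STATE_FILE) -> dict:
    """Reuse saved cookies when the file exists, otherwise start fresh."""
    return {"storage_state": str(state_file)} if state_file.exists() else {}


def run(url: str, state_file: Path = STATE_FILE) -> None:
    # Imported here so context_options works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context(**context_options(state_file))
        page = context.new_page()
        page.goto(url)
        if not state_file.exists():
            # First run: wait while the login or captcha is completed by hand.
            input("Finish logging in, then press Enter...")
        # Save cookies and local storage for the next run.
        context.storage_state(path=str(state_file))
        browser.close()
```

On later runs the saved state is loaded before the first navigation. Sites may still expire the cookies or challenge the session again.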

  • @carloiurcovici
    @carloiurcovici 5 months ago +1

    Thank you John, I've been really enjoying your videos recently and applying everything at work where it comes in really handy. Would you consider creating a python/scraping course on Udemy or a similar platform?

    • @JohnWatsonRooney
      @JohnWatsonRooney 5 months ago

      Thanks for watching. I have thought about creating a course, but no serious plans yet, I'm afraid.

    • @carloiurcovici
      @carloiurcovici 5 months ago

      @JohnWatsonRooney thanks for the reply, if you change your mind you got my money 😂

  • @elu1
    @elu1 5 months ago

    Thank you John for the teaching. I seem to have an issue with Xvfb when running 'headless'. Any suggestions or resources that I can learn from?

  • @badrenanna3961
    @badrenanna3961 5 months ago +3

    Can you please start talking about some difficult cases:
    - scraping a website that has Cloudflare protection against bots (even proxy rotation didn't work)
    - scraping websites that have captcha protection
    ..
    Thank you

    • @munchcup
      @munchcup 5 months ago +2

      One idea is to use undetected-chromedriver to avoid Cloudflare, and you can add a delay while logging in to solve the captchas the first time and save the cookies. After that you no longer need to solve captchas; it will be automatic.

  • @mohsinhassan88
    @mohsinhassan88 5 months ago +3

    Omg why the white editor??

    • @РНТ
      @РНТ 5 months ago

      Exactly. When I saw it I immediately remembered this video: th-cam.com/video/XlgqZeeoOtI/w-d-xo.html 😂

    • @tendosingh5682
      @tendosingh5682 5 months ago +1

      For some it's easier on the eyes. My eyes can't stand the dark themes.

    • @mohsinhassan88
      @mohsinhassan88 5 months ago

      @РНТ exactly how I felt. Especially since John usually has amazing videos and everything is so perfectly balanced in terms of theme and ease on the eyes.
      It was a super shock.

  • @alexdin1565
    @alexdin1565 5 months ago

    Thanks John, but nowadays most websites don't allow you to open links like you do; they will block you after 3 or 4 pages are opened at the same time.
    Another question: if you could make a video on how to use Playwright inside Docker with a proxy to make many requests at the same time, that would be very nice.
    Sorry for my English, I'm not a native speaker.

  • @s6yx
    @s6yx 5 months ago

    Can't you just set a viewport for the screen size and a header, and run it headless with no issue?
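What that suggestion might look like in Playwright, as a sketch only: a headless launch with an explicit viewport and user agent. Reading "header" as the User-Agent header is an assumption, the user-agent string is an example value, and none of this guarantees that a site treats the headless browser like a headed one. `headless_context_options` and `open_headless` are hypothetical names:

```python
def headless_context_options(width: int = 1920, height: int = 1080) -> dict:
    """Context settings that mimic a desktop browser window."""
    return {
        "viewport": {"width": width, "height": height},
        # Example value only; use a string that matches a current browser.
        "user_agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
    }


def open_headless(url: str) -> str:
    """Open a URL headlessly with those settings and return the page title."""
    # Imported here so the options helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(**headless_context_options())
        page = context.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
    return title
```

Comparing the title or a screenshot against a headed run is a quick way to check whether the site serves the same content in both modes.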

  • @danueecitizen
    @danueecitizen 5 months ago

    Can this work with Amazon? 🤔

  • @archiee1337
    @archiee1337 2 months ago

    Why not headless?

  • @pkavenger9990
    @pkavenger9990 a month ago +1

    Your content is good, but I think you should engage with your audience more instead of speaking as if you are talking to yourself. You will see that you get many more views. Take the GothamChess channel, for example: he is not a chess grandmaster, but his channel has more views and subscribers than Hikaru and Magnus because of his communication skills.

    • @JohnWatsonRooney
      @JohnWatsonRooney a month ago

      Fair point, thanks for the advice.