Supercharge Your Scraper With ASYNC (here's how)

  • Premiered Nov 24, 2024

Comments • 17

  • @saulo_foot · 1 year ago +3

    Excellent as always! I believe my web scraping performance has gotten much better after learning JavaScript. Understanding the async/await concept by learning JS promises was crucial for me.

  • @AwB · 1 year ago

    John, I am using ScrapingBee synchronously to scrape 1,000 URLs (and growing) and it takes forever.
    ScrapingBee and other proxies allow for concurrent requests, while I also know you can do things async. A video would be great on the difference: why you would do one or the other, or how you would do both. Here are some questions:
    1. Are concurrent processes just for the requests, or the parsing as well? Does this impact writing to a CSV if you have multiple processes running at once?
    Appreciate your content. I feel like my scraper is almost there in terms of scalability and efficiency and I'm really excited.
    (Although I probably need to implement a dataclass at some point)
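
    A common answer to the question above is to fetch concurrently but parse and write sequentially once everything has been gathered, so only one writer ever touches the CSV. A minimal stdlib-only sketch of that pattern (the `fetch` here is a stub, not a real HTTP call; in practice you would swap in an async client such as httpx):

    ```python
    import asyncio
    import csv
    import io

    async def fetch(url: str, sem: asyncio.Semaphore) -> str:
        # Cap concurrency so we never have more than N requests in flight.
        async with sem:
            await asyncio.sleep(0)  # stand-in for a real HTTP request
            return f"<html>{url}</html>"

    def parse(html: str) -> dict:
        # Parsing is ordinary synchronous code; it runs after the awaits finish.
        return {"title": html.removeprefix("<html>").removesuffix("</html>")}

    async def main(urls: list[str]) -> str:
        sem = asyncio.Semaphore(10)  # at most 10 concurrent requests
        pages = await asyncio.gather(*(fetch(u, sem) for u in urls))
        rows = [parse(p) for p in pages]  # gather preserves input order
        # Single writer after everything finishes: no interleaved CSV lines.
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["title"])
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()

    csv_text = asyncio.run(main([f"https://example.com/{i}" for i in range(3)]))
    ```

    Because the one CSV write happens after `asyncio.gather` returns, no locking is needed; if you must stream rows as they arrive, route them through a single writer task via an `asyncio.Queue` instead.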

  • 1 year ago +3

    What do you think of scraping the Google cache? It might speed things up too when you don't have the JS stuff to download

    • @JohnWatsonRooney · 1 year ago +1

      That's not something I've tried, actually. Interesting idea, though

  • @JulienDeneuville · 1 year ago +1

    Hey John, thanks for this video. I see you recommend httpx over requests for async: what about the AsyncHTMLSession from requests-html?

    • @JohnWatsonRooney · 1 year ago +1

      It went unmaintained for a while, so I moved away from it. It's got new maintainers now, so I'm hopeful it gets a few issues fixed and comes back

  • @FabioRBelotto · 11 months ago

    I would love a video showing async and threading when scraping using Playwright!

  • @christiandeantana1149 · 5 months ago

    Can I use async too if the website has a rate limit? For example: 429 Too Many Requests
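
    Async still works under a rate limit, but the client has to slow down when the server answers 429. One common approach is retry with exponential backoff. A stdlib-only sketch, with a fake server (`fake_get` is a stand-in I made up) that rejects the first two attempts; in real code you would check the status code returned by your HTTP client:

    ```python
    import asyncio

    attempts = {"count": 0}

    async def fake_get(url: str) -> int:
        # Stand-in server: answers 429 twice, then 200.
        attempts["count"] += 1
        return 429 if attempts["count"] <= 2 else 200

    async def get_with_backoff(url: str, retries: int = 5) -> int:
        delay = 0.01  # would start around a second in real code
        for _ in range(retries):
            status = await fake_get(url)
            if status != 429:
                return status
            await asyncio.sleep(delay)  # wait before trying again
            delay *= 2                  # exponential backoff
        raise RuntimeError("still rate limited after all retries")

    status = asyncio.run(get_with_backoff("https://example.com"))
    ```

    Combining this with a semaphore to cap concurrency usually keeps you under the server's limit in the first place, so the backoff path rarely fires.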

  • @adarshjamwal3448 · 1 year ago

    great video

  • @return_1101 · 7 months ago

    Awesome!

  • @djangodeveloper07 · 11 months ago

    Async code makes things messy. I love to keep class-based code, and it's hard to handle async that way. For speedy things I use threading, which works fine. If you have any video with async in a class structure, I would love to check it out.
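
    Async does fit class-based code: keep the shared state on the instance and expose an async entry point. A hypothetical stdlib-only sketch (the class and method names are made up for illustration, and `fetch` is a stub rather than a real request):

    ```python
    import asyncio

    class Scraper:
        """Class-based async scraper sketch; fetch is a stub, not a real request."""

        def __init__(self, urls: list[str], concurrency: int = 5) -> None:
            self.urls = urls
            self.concurrency = concurrency

        async def fetch(self, url: str) -> str:
            async with self.sem:        # limit in-flight requests
                await asyncio.sleep(0)  # stand-in for a real HTTP call
                return f"data from {url}"

        async def run(self) -> list[str]:
            # Create the semaphore inside the running event loop.
            self.sem = asyncio.Semaphore(self.concurrency)
            return list(await asyncio.gather(*map(self.fetch, self.urls)))

    results = asyncio.run(Scraper([f"https://example.com/{i}" for i in range(3)]).run())
    ```

    The single `asyncio.run` call at the edge keeps the rest of the class ordinary-looking; only the methods that actually wait on the network need to be `async`.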

  • @yacinehechmi6012 · 1 year ago +1

    I ran into an issue with aiohttp while requesting a bunch of URLs at the same time; I don't know if it's a problem on my end or the server is not happy with me. Putting a limit on how many TCP connections are made seems to have solved the issue. Anyway, I'm beginning to consider httpx as an alternative.

    • @JohnWatsonRooney · 1 year ago +1

      I have a video coming soon that will help. I like aiohttp; I think it's unlikely that's the issue. HTTPX is good because you get a requests-like API for easy use, as well as the async capabilities when you want them

    • @yacinehechmi6012 · 1 year ago

      Well, lucky me, excited about the video. aiohttp is working fine after the fix; maybe it was a server limit.

  • @srikanthkoltur6911 · 1 year ago

    Is it legal to scrape data from foreign countries? Like, making thousands of requests might crash their website 😅

    • @1337shadow · 1 year ago

      Hhhhhhh

    • @AmodeusR · 1 year ago +2

      If it's a problem, they'll block it. If they don't block it, then do as you want; there's no law against collecting data massively.