How to Make 2500 HTTP Requests in 2 Seconds with Async & Await

  • Published Sep 10, 2024
  • DISCORD (NEW): / discord
    This is a comparison of using async/await and asyncio with aiohttp in Python versus using threads and concurrent.futures, to understand how we can make several thousand HTTP requests in just a few seconds. Learning how to do this and understanding how it works will help you when it comes to running your own servers and web services, and stress testing any API environments you offer.
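    A minimal sketch of the aiohttp pattern the video covers (the URL and request count here are placeholders, not the video's exact code):

        import asyncio
        import aiohttp

        async def fetch(session, url):
            # control returns to the event loop while we wait on network I/O
            async with session.get(url) as resp:
                return resp.status

        async def main():
            url = "https://example.com"  # placeholder target
            async with aiohttp.ClientSession() as session:
                results = await asyncio.gather(*(fetch(session, url) for _ in range(2500)))
            print(len(results), "responses")

        asyncio.run(main())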
    github.com/jhn...
    text articles from: realpython.com/
    Support Me:
    Patreon: / johnwatsonrooney (NEW)
    Scraper API: www.scrapingbe...
    Proxies: proxyscrape.co...
    Amazon UK: amzn.to/2OYuMwo
    Hosting: Digital Ocean: m.do.co/c/c7c9...
    Gear Used: jhnwr.com/gear/ (NEW)
    -------------------------------------
    Disclaimer: These are affiliate links and as an Amazon Associate I earn from qualifying purchases
    -------------------------------------

Comments • 112

  • @jmoz
    @jmoz 1 year ago +23

    You do not need to explicitly create a task on line 18

    • @dpm_07
      @dpm_07 7 months ago

      Why so?

    • @PixelThorn
      @PixelThorn 5 months ago

      Gather already accepts coroutines, so any async function can be supplied directly
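      In other words, both forms below work; gather wraps bare coroutines in tasks itself. A minimal sketch (not the video's code):

          import asyncio

          async def work(n):
              await asyncio.sleep(0.1)
              return n

          async def main():
              # explicit task creation is fine...
              tasks = [asyncio.create_task(work(i)) for i in range(3)]
              print(await asyncio.gather(*tasks))
              # ...but gather also accepts coroutines directly
              print(await asyncio.gather(*(work(i) for i in range(3))))

          asyncio.run(main())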

  • @victorhaynes508
    @victorhaynes508 10 months ago +5

    I can't believe how fast you got to the point. Thank you for reading the room

  • @swordlion294
    @swordlion294 1 year ago +19

    ThreadPoolExecutor is slower than Thread but saves memory. You don't need to use asyncio for waiting: you can use joins, barriers, wait groups and mutex locks with Thread to achieve the same. It comes down to preference, although asyncio is more streamlined, while raw Thread is better suited to manual optimization that requires the utmost speed and care.

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago +5

      Thanks for the info. I still have a lot to learn about threads!
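    For example, the raw-Thread-plus-join pattern mentioned above looks roughly like this (a minimal sketch; scrape() and the URL list are placeholders, not the video's code):

        import threading
        import requests

        def scrape(url):
            requests.get(url)  # placeholder work

        urls = ["https://example.com"] * 10  # placeholder URLs
        threads = [threading.Thread(target=scrape, args=(u,)) for u in urls]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # block until every thread has finished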

  • @bobong4248
    @bobong4248 2 years ago +3

    Hi good sir, thanks for the vid and, most importantly, for actively engaging with people in the comment section. Cheers

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      Thanks I’m glad you enjoyed the vid 👍

  • @ErikS-
    @ErikS- 1 year ago +73

    "how to build a ddos attacker..."

    • @dpm_07
      @dpm_07 7 months ago +2

      Right 😂😂

    • @amosrocha6793
      @amosrocha6793 2 months ago

      Kkkkk

  • @ihateorangecat
    @ihateorangecat 2 years ago +9

    Hey sir 👋
    As a self-taught dev you are one of my inspirations 🙌🙌🙌
    I started learning web scraping recently. I downloaded some of your scrapy and BeautifulSoup tutorial videos,
    followed along with them, and found them pretty comprehensive and clear; I learned better from your videos.
    I hope for more and more tutorial videos!!!
    Thanks a million times! 🙏

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +1

      Thank you that is very kind!

    • @RickeDz
      @RickeDz 10 months ago +1

      I am scraping a lot of things on the web, and this is just a pretty thing to do. I like it so much; it's so beautiful to see everything being "gathered" ahaha. Are you still doing this?

  • @fmanca100
    @fmanca100 4 months ago +1

    John, your channel is absolutely fantastic! Congratulations!

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 months ago +1

      Thank you, very kind

    • @fmanca100
      @fmanca100 4 months ago

      @@JohnWatsonRooney I don't say this lightly: best educational content I have seen in years. Straight to the point, rich and very well explained! (ok, and now I stop :-)

  • @bakasenpaidesu
    @bakasenpaidesu 1 year ago +6

    You can speed up threading by setting max_workers:
    with concurrent.futures.ThreadPoolExecutor(max_workers=10_000) as executor:
        executor.map(scraper_sub, links)
    max_workers=10_000 means up to 10k worker threads can run at the same time (not a single worker doing 10k jobs at once)

  • @artemfagradyan3890
    @artemfagradyan3890 1 year ago +1

    Thanks for your video!! I tried to send multiple async requests to an API, but my previous code was inefficient and your example helped me!!

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago

      Great stuff, glad it was useful for you, thanks!

  • @eduardocasanova-personal3064
    @eduardocasanova-personal3064 1 year ago +1

    Thanks for the video, makes me appreciate go routines and their simplicity even more :)

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago +1

      I couldn’t believe how easy it was when I started learning go

    • @eduardocasanova-personal3064
      @eduardocasanova-personal3064 1 year ago +1

      @@JohnWatsonRooney preaching to the choir. Even with its rudimentary error handling

  • @vincentdigiusto9429
    @vincentdigiusto9429 2 years ago +1

    Very instructive, thank you John. I think scrapy uses async requests; that's why some scraping jobs can be impressively quick with scrapy

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      Yes, scrapy uses Twisted, which is asynchronous

  • @aprioriprogrammer9100
    @aprioriprogrammer9100 2 years ago +1

    THIS IS FREAKING AWESOME, THANKS MAN, NOW I CAN DDOS FBI SERVERS WITH BILLIONS OF REQUESTS

  • @BringMe_Back
    @BringMe_Back 2 years ago +1

    Awesome man, I was working with these things today, I'll try this one too ♥️♥️♥️🙏

  • @aabmets
    @aabmets 2 years ago

    Thanks, I'm gonna implement this in my "yfrake" package (PyPI).

  • @Christian-mn8dh
    @Christian-mn8dh 2 years ago +2

    your videos are as efficient as your code

  • @fyazmanknojiya2298
    @fyazmanknojiya2298 4 days ago

    How do you overcome the socket port limit? We only have 65536 local ports…
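    One common answer, assuming aiohttp is in play: cap simultaneous connections with the session's connector so the pool, not the OS port table, is the bottleneck (a sketch; the limit value is arbitrary):

        import asyncio
        import aiohttp

        async def main():
            # limit= caps concurrent connections (aiohttp's default is 100),
            # which also bounds how many local ports are open at once
            connector = aiohttp.TCPConnector(limit=500)
            async with aiohttp.ClientSession(connector=connector) as session:
                ...  # make requests with this session

        asyncio.run(main())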

  • @androidmod183
    @androidmod183 2 years ago +6

    Nicely done. A proxy or SOCKS5 (which I believe works with requests) will do the trick for bypassing the traffic limit. But how can I implement it in this scenario? Thanks John.

    • @erenc8377
      @erenc8377 1 year ago

      proxies=proxies :)
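      Note the keyword differs by library: requests takes a proxies dict, while aiohttp takes a per-request proxy URL. A sketch (the proxy address is a placeholder):

          import asyncio
          import aiohttp
          import requests

          PROXY = "http://user:pass@proxy.example.com:8080"  # placeholder

          # requests: a dict mapping scheme to proxy URL
          requests.get("https://example.com", proxies={"http": PROXY, "https": PROXY})

          async def main():
              async with aiohttp.ClientSession() as session:
                  # aiohttp: a single proxy= URL per request
                  async with session.get("https://example.com", proxy=PROXY) as resp:
                      print(resp.status)

          asyncio.run(main())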

  • @hristijansaveski4231
    @hristijansaveski4231 2 years ago

    Thank you good sir, you are a master at this!! You have helped me land and keep my data scraping job!! Thank you so much, truly an inspiration :))

  • @davidl3383
    @davidl3383 11 months ago +1

    Thank you John

  • @djangodeveloper07
    @djangodeveloper07 9 months ago

    For me, ThreadPoolExecutor is the best way to go. I've been using it for the last few years and it always gives the best results. It's easy to handle in standalone Python scripts, or even in Python websites with Celery.

  • @terrascape
    @terrascape 2 years ago

    Hi John (or anyone who knows), I keep getting the error "UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 28855-28856: invalid continuation byte". Do you have a workaround for this? The ones I found on Google I can't seem to implement correctly. Much thanks!

  • @maxjackson6616
    @maxjackson6616 2 years ago +2

    On Yahoo finance, a company's income statement has table rows which need to be clicked on to expand and show the data. Do you know of a way to scrape such rows that doesn't involve using selenium to click on the button to expand the row?

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +1

      I haven't looked for a while but I believe there was a Python package for working with Yahoo Finance, yfinance I think? That made it very easy to get all the data

  • @mrjt6404
    @mrjt6404 2 years ago +1

    Hey thanks, good info.
    How do I run 'subprocess' asynchronously? I don't want to use 'requests.content' to download:
    subprocess.run(["yt-dlp", download_link, "-o", f"{output_dir}/{episode_name}.%(ext)s"], shell=True, stdout=PIPE)
    Also, what would you prefer in this case, multi-threading or async?
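    asyncio ships its own subprocess API that suits this case; a minimal sketch (the yt-dlp arguments are copied from the comment above, the variables are placeholders):

        import asyncio

        async def download(download_link, output_dir, episode_name):
            proc = await asyncio.create_subprocess_exec(
                "yt-dlp", download_link,
                "-o", f"{output_dir}/{episode_name}.%(ext)s",
                stdout=asyncio.subprocess.PIPE,
            )
            await proc.communicate()  # yields to the event loop while yt-dlp runs
            return proc.returncode

    Several downloads can then run concurrently with asyncio.gather(); since this is I/O-bound work in external processes, async is a reasonable fit.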

  • @miguellopez7089
    @miguellopez7089 2 years ago +1

    Awesome vid bro!

  • @saranpun7192
    @saranpun7192 8 months ago

    While using executors, if Tomcat's capacity to process is set to only 200, what happens after 200 requests? Will we have to wait till those 200 requests are processed by the server? Or will it pick up more requests when a call passes from the server to the DB?

  • @Feel_Sorry
    @Feel_Sorry 5 months ago

    How can we apply the same thing in PHP? Currently I am using cURL multi, but it drives up my server utilization. Is there an alternative way to do the same in PHP?

  • @acatisfinetoo3018
    @acatisfinetoo3018 7 months ago

    I was looking for a good explanation of the difference between async and multithreading... so threading is for doing things in parallel, and async is for waiting for future tasks to complete without stalling the current program?

  • @DeepakGupta-qv1yc
    @DeepakGupta-qv1yc 11 months ago

    Very well explained

  • @k98killer
    @k98killer 1 month ago +1

    Threads do not actually compute simultaneously in CPython because of the GIL. The core dev team is experimenting with removing the GIL, but it will be years before it becomes a production-ready option. IIRC, MicroPython does not have a GIL, so it is able to actually execute threads simultaneously.

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 month ago +1

      Thanks for the clarification

    • @mianashhad9802
      @mianashhad9802 21 days ago +1

      How does using it with 4 workers speed up my script by almost 4 times then? Genuinely curious and trying to fill in gaps in my knowledge.

    • @k98killer
      @k98killer 20 days ago +1

      @@mianashhad9802 what does your script do? If it handles I/O with network interfaces or files, those operations can be run concurrently without violating the GIL afaik. It is also possible that code using only local variables with no nonlocal dict lookups could run in parallel, though I am not 100% sure of that.

    • @mianashhad9802
      @mianashhad9802 20 days ago +1

      @@k98killer Yeah it uses threads to send GET requests to the web server. I assume, in that case, it runs truly in parallel?

    • @k98killer
      @k98killer 20 days ago

      @@mianashhad9802 yes, any IO-bound operations can release the GIL and then reacquire it after the IO is finished. I bet you would get the same or possibly even better performance by using async instead. Also, I looked it up, and any pure Python code running in a thread has to acquire the GIL, but C libraries like Numpy will release the GIL; even local-only code requires the GIL.

  • @Rbm726
    @Rbm726 1 year ago +1

    Thanks!

  • @abdullahsiddique7787
    @abdullahsiddique7787 1 year ago +1

    In this example, when we send 2500 async requests, does each request use the same session, or does each API call have its own session?

    • @dpm_07
      @dpm_07 7 months ago +1

      No, they share one session; `aiohttp` maintains a pool of connections!
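      That is, one session is created once and shared, and its pool reuses connections across all 2500 calls. A short sketch (placeholder URL, not the video's exact code):

          import asyncio
          import aiohttp

          async def fetch(session):
              async with session.get("https://example.com") as resp:  # placeholder URL
                  return resp.status

          async def main():
              # a single session: its connection pool keeps sockets
              # alive and reuses them across every request
              async with aiohttp.ClientSession() as session:
                  return await asyncio.gather(*(fetch(session) for _ in range(2500)))

          asyncio.run(main())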

  • @ahmedelsayed3133
    @ahmedelsayed3133 11 months ago +1

    This method puts a lot of pressure on the machine. For my use case I want to send more than 170 thousand requests, and when I use ThreadPoolExecutor I can divide the requests into smaller groups using the max_workers argument. Can that be done using async & await?

    • @JohnWatsonRooney
      @JohnWatsonRooney  11 months ago +1

      It sounds like maybe Python isn't going to be the right language; something like Go or Rust might be better for you, as they have much better built-in concurrency models

    • @ahmedelsayed3133
      @ahmedelsayed3133 11 months ago

      @@JohnWatsonRooney Which is easier to learn?
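      For the original question: yes, async code can cap itself the way max_workers does, usually with a semaphore that limits how many requests are in flight (a hedged sketch; the limit and URL are placeholders):

          import asyncio
          import aiohttp

          async def fetch(session, sem, url):
              async with sem:  # only 100 coroutines pass this point at once
                  async with session.get(url) as resp:
                      return resp.status

          async def main():
              sem = asyncio.Semaphore(100)  # plays the role of max_workers
              urls = ["https://example.com"] * 1000  # placeholder URL list
              async with aiohttp.ClientSession() as session:
                  await asyncio.gather(*(fetch(session, sem, u) for u in urls))

          asyncio.run(main())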

  • @ericxls93
    @ericxls93 2 years ago +1

    Very good vid!! I just finished making use of concurrent futures (based on your previous vid) and it sped up my code considerably! Looks like I have the potential to speed it up further 😀. Will making a lot of requests at the same time slow down the source server, thus passing the waiting time to the server?

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +1

      It should be fine. Obviously you can overload it (DDoS attacks work this way), but as we are just trying to maximise our efficiency we can stay within any rate limits or server limitations set. It's much more useful for getting data from many places at the same time rather than just one

  • @eccentricOrange
    @eccentricOrange 2 years ago +3

    Hey, wanted to point something out: I've often timed my own code too, and I find that print() consistently causes a large delay. So if you're analysing these delays, I would suggest not including a print statement there.
    Love your content though!! Helped me with a lot of stuff.

  • @82NeXus
    @82NeXus 9 months ago

    Why does the threaded version take so much longer than the async one, when it should also be sending all the requests simultaneously? The synchronous one, on the other hand, waits for each response before it sends the next request. You could have a version that doesn't use any parallel programming facilities in Python but still works in parallel, by having the async/parallel work handled by a Python package or an underlying C library. E.g. if you used the (much lower level) socket module and just repeatedly called send(), your requests would all go into the send buffer and the OS would send them while your Python program is waiting or doing something else.

    • @ADITYAKumar-xi1zt
      @ADITYAKumar-xi1zt 2 months ago

      Apparently there is something called the GIL in Python; you might wanna read about that.

  • @ZenoModiff
    @ZenoModiff 2 years ago

    Hello John, can you make a video on scraping a world population data website please? I tried but failed because the span tag is constantly changing

  • @andreotako2020
    @andreotako2020 1 year ago

    Hello,
    I get this error:
    Event loop is closed
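    This "Event loop is closed" message at shutdown is a known quirk of aiohttp on Windows' default proactor event loop. A commonly suggested workaround, assuming Windows and Python 3.8+, is to switch the loop policy before asyncio.run():

        import sys
        import asyncio

        if sys.platform == "win32":
            # the selector loop avoids the teardown error the proactor loop can trigger
            asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())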

  • @bunnihilator
    @bunnihilator 6 months ago

    Is the normal requests library a blocking library?

  • @karthikb.s.k.4486
    @karthikb.s.k.4486 2 years ago +1

    Thank you for the video. What VS Code theme are you using? Please let me know

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +1

      Hey thanks, this is PyCharm but the theme is gruvbox which is available for vs code too

  • @igiveupfine
    @igiveupfine 1 year ago

    I wish this used a real networked example, as I have a real networked API I'm trying to speed up. I'm already using async and it's literally not any faster, so I don't know if aiohttp will be any better.

  • @varunvijaywargi5497
    @varunvijaywargi5497 9 months ago

    Can you please help with how asyncio would work in AWS Lambda?

  • @redaoutarid6465
    @redaoutarid6465 2 years ago

    Thanks for your helpful videos.
    Please, do you have any idea how to avoid dat*ad*ome protection?

  • @oparpax
    @oparpax 2 years ago +1

    What if you need to render the content? What would be the best approach in that case?

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +1

      Requests-html I believe can do that but I’d have to check

    • @oparpax
      @oparpax 2 years ago

      @@JohnWatsonRooney Well, that would be a nice topic for a new vid, wouldn't it? :D

    • @danielgarcia1428
      @danielgarcia1428 2 years ago

      @@oparpax He already did a video on that topic! It's called "Slow Web Scraper? Try this with ASYNC and Requests-html"

  • @jamest4027
    @jamest4027 2 years ago

    Hi John, I want to use asyncio and threading in combination: asyncio for making requests and threading for making calculations. What do you think?

    • @sebastiangudino9377
      @sebastiangudino9377 1 year ago +1

      You could make a thread that runs the main loop (inside of which the async stuff happens) while other threads do your calculations, so yeah, sure, they can both work.
      But why though? If you are already making a bunch of threads, why not also use threads for your network requests? You are adding complexity to your code base by using both techniques when it seems your use case would be perfectly fine with just one of them, which would make your code slightly easier to maintain.
      So yeah, you can use both of them, but do think about what you are trying to achieve, and whether this will actually achieve that
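      If both are genuinely needed, the usual bridge is to push blocking calculations onto a thread from inside the event loop (a sketch, assuming Python 3.9+ for asyncio.to_thread; cpu_work and the URL are placeholders):

          import asyncio
          import aiohttp

          def cpu_work(data):  # placeholder blocking calculation
              return len(data)

          async def fetch_and_compute(session, url):
              async with session.get(url) as resp:
                  data = await resp.read()
              # hand blocking work to a worker thread so the loop stays free
              return await asyncio.to_thread(cpu_work, data)

          async def main():
              async with aiohttp.ClientSession() as session:
                  await asyncio.gather(*(fetch_and_compute(session, "https://example.com")
                                         for _ in range(10)))

          asyncio.run(main())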

  • @user-jj2bx7kt4d
    @user-jj2bx7kt4d 2 months ago

    How to do the same thing using a GPU?

  • @kushagraagrawal7292
    @kushagraagrawal7292 2 years ago +1

    Hey, I really like your color scheme! Could you please share the theme and color schemes used? Thanks

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      Sure, this is Gruvbox Material and I'm using PyCharm Community Edition

  • @maxjackson6616
    @maxjackson6616 2 years ago +1

    Also, I'm curious, what's your day job? Is it related to web scraping?

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      It's not directly, no, but I use Python every day for data extraction over our APIs

  • @rezanami3925
    @rezanami3925 1 year ago

    Dear Ron,
    I am going to send unlimited fetch requests via JavaScript code from the console to the server.
    In the backend or frontend there is a policy that each user must space requests 300 milliseconds apart.
    If I send multiple requests in a second, I am blocked.
    Is there a solution for this issue from your point of view?

  • @volin_d
    @volin_d 2 years ago +1

    That's a very neat trick! However, is it safe to send so many requests in such a short period of time? I'm learning web scraping, and from what I've read your IP can be banned by the website if it detects that all those requests are scripted. That's why in my web scraper I use a sleep function to wait from 0 to 2 seconds (randomly) between each request so it resembles human behavior more. But I guess it's not the best solution, since it takes about 15 minutes to scrape roughly 1000 webpages.

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +1

      Absolutely; if you are working on something that needs to spread out the requests, then do so. This is about sending as many requests as possible: if you have 1000 different URLs you want some data from, then having to wait one by one would be painful, so we can use async to make them quicker. Also, learning async technology is definitely worth it as you progress as a developer

    • @volin_d
      @volin_d 2 years ago

      @@JohnWatsonRooney For sure. While I might not use async in this particular case it's good to know about it for the future

  • @techbystorm
    @techbystorm 2 years ago +1

    We use asyncio to scrape around 300+ web pages of a site. Because of res = await asyncio.gather(*tasks), res gets heavier and the script stops there. We changed the logic to process 50 pages at a time using asyncio.
    Also, I see you have not used loop = asyncio.get_event_loop(); does this affect the performance?
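    The chunking workaround described above looks roughly like this (a sketch; fetch() and the URL list are placeholders). As for loop = asyncio.get_event_loop(): asyncio.run() creates and closes the loop itself, so omitting the explicit call doesn't change performance.

        import asyncio
        import aiohttp

        CHUNK = 50

        async def fetch(session, url):
            async with session.get(url) as resp:
                return await resp.text()

        async def main():
            urls = ["https://example.com"] * 300  # placeholder pages
            results = []
            async with aiohttp.ClientSession() as session:
                for i in range(0, len(urls), CHUNK):
                    batch = urls[i:i + CHUNK]
                    # only 50 requests are in flight at any moment
                    results.extend(await asyncio.gather(*(fetch(session, u) for u in batch)))
            return results

        asyncio.run(main())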

  • @user-dm9vr6ln3b
    @user-dm9vr6ln3b 6 months ago

    Can I do the same thing with Flask?

  • @ByteShadow
    @ByteShadow 2 years ago

    What about running async bots that in turn run threads? 🤔

  • @thyagorcarvalho
    @thyagorcarvalho 2 years ago

    That's so great! How to do this with a dynamic payload in a POST request?

  • @MichaelSchellerwayne
    @MichaelSchellerwayne 2 years ago

    Hey John!!
    Could you make a video about handling cookies with aiohttp?
    With the ClientSession I can send cookies, but they are shared with all instances. I just don't get how to pass a cookie that is only shared with one website, and how to retrieve them. Cookie handling with requests is much easier, but I really want to use aiohttp because it is SO FAST! :D
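    One approach: a separate ClientSession per site. Every session gets its own CookieJar by default, so cookies never leak between them (a hedged sketch; the URLs are placeholders):

        import asyncio
        import aiohttp

        async def main():
            # each session has its own cookie jar, so cookies set by
            # one site are never sent to the other
            async with aiohttp.ClientSession() as site_a, \
                       aiohttp.ClientSession() as site_b:
                async with site_a.get("https://a.example.com"):  # placeholder URLs
                    pass
                async with site_b.get("https://b.example.com"):
                    pass
                print(list(site_a.cookie_jar), list(site_b.cookie_jar))

        asyncio.run(main())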

  • @jambalaya974
    @jambalaya974 4 months ago

    Every DevOps developer's nightmare.

  • @mehdinouri5530
    @mehdinouri5530 2 years ago

    Can you please make a video on how to run a webdriver with headers on Google Colab? Or share any tips you have.
    Thank you,
    I love your vids,
    they really help a lot

  • @marakeeuh
    @marakeeuh 2 years ago

    Hi Ron! I'm a business engineering student from Belgium currently writing my thesis. As part of my research I need to build my own dataset by scraping Zalando. Would you be open to assisting me with this one-to-one? Of course, when publishing my work I would properly reference you and your help. I have tried some things using your very helpful videos, but I'm still mostly very stuck. I look forward to your answer

    • @renancatan4788
      @renancatan4788 1 year ago

      Still trying to do it? It's possible to scrape this website in a very easy way. I'll be glad to help =]

  • @tomy7258
    @tomy7258 2 years ago

    How to log in to "jumia"?

  • @sunjayjangam
    @sunjayjangam 1 year ago

    How to send a million requests?

  • @kirubababu9255
    @kirubababu9255 1 month ago

    Anybody summarize here, please?

  • @saurabhjain2437
    @saurabhjain2437 3 months ago

    An async for loop would have been more readable…

  • @osogrande4999
    @osogrande4999 11 months ago

    Work in parallel with Python threads? Nope: GIL.

  • @oneofthechannelsofalltime
    @oneofthechannelsofalltime 1 year ago

    More polite title: How to do 2500 handshakes in 2 seconds and disappear.
    Next video: How to make 2500 people shake hands with each other in 2 seconds, for fun!
    require 'popcorn' // btw

  • @jhonatantechh
    @jhonatantechh 1 year ago

    Here is a good tutorial on how to destroy someone's backend in seconds :)

  • @kotslike
    @kotslike 1 year ago

    503 incoming

  • @randyjd3706
    @randyjd3706 2 years ago +1

    First!!

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +2

      Nice 👍

    • @randyjd3706
      @randyjd3706 2 years ago

      Really appreciate the videos! Cheers from Australia 🇦🇺

  • @MynamedidntFitDonkey
    @MynamedidntFitDonkey 10 months ago

    That's 2499 requests