Always Check for the Hidden API when Web Scraping

  • Published 31 Jul 2021
  • DISCORD (NEW): / discord
    If this method is available, it's the best way to scrape data from a site. I will show you how to find the API endpoint that we can use to directly get the JSON data being sent from the server, before JavaScript gets its mucky paws on it and makes it look like what we see in our browser. It's quick and simple, and with just a few extra tips and techniques it can transform your web scraping.
    Scraper API: www.scrapingbee.com/?fpr=jhnwr
    Patreon: / johnwatsonrooney
    Proxies: proxyscrape.com/?ref=jhnwr
    Hosting: Digital Ocean: m.do.co/c/c7c90f161ff6
    Gear I use: www.amazon.co.uk/shop/johnwat...
  • Science & Technology
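An editor's sketch of the technique in Python with the `requests` library. The endpoint URL, headers, query parameter, and JSON keys below are placeholder assumptions, not taken from the video; substitute whatever your own browser's Network tab shows for the site you are scraping.

```python
import requests

# Placeholder endpoint and headers: copy the real ones from the
# XHR/fetch request you find in your browser's dev tools.
API_URL = "https://www.example.com/api/products"
HEADERS = {"user-agent": "Mozilla/5.0", "accept": "application/json"}

def fetch_page(page):
    """Hit the hidden JSON API directly, skipping the rendered HTML."""
    resp = requests.get(API_URL, params={"page": page},
                        headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.json()  # the same data the site's JavaScript receives

def parse_products(payload):
    """Keep only the fields we care about from one page of JSON."""
    return [{"name": p.get("name"), "price": p.get("price")}
            for p in payload.get("products", [])]
```

Because the JSON arrives already structured, there is no HTML parsing at all; `parse_products` is plain dictionary access.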

Comments • 530

  • @hectord.7107
    @hectord.7107 2 years ago +384

    I've been doing this for years as a self-taught programmer; there are some little tricks you did here that I didn't know. Thank you for the video.

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +14

      Glad it was helpful!

    • @sworatex1683
      @sworatex1683 1 year ago +9

      It's my first year in programming and there was nothing new, actually. I don't even think the pain was worth it; I'd just make the scraper in JS and have it return a JSON string.

    • @sworatex1683
      @sworatex1683 1 year ago +3

      But I guess that would be useless for bigger projects. I'd just do it in js if I want like an actual product list like this.

    • @gesuchter
      @gesuchter 1 year ago +4

      @D R lol

    • @mattshu
      @mattshu 1 year ago +5

      @D R how do you “block” scraping??

  • @Davidca12
    @Davidca12 1 year ago +20

    This single-handedly cut the running time of my program from literal hours to a couple of minutes, cannot thank you enough!

  • @abc.2924
    @abc.2924 2 years ago +76

    I've been using this trick for a while now, and I've learned it from you, so thanks. Amazing work man

  • @leow.2162
    @leow.2162 1 year ago +9

    I'm not a very experienced programmer, I've been doing it recreationally for like 2 years on and off, but I did a lot of web scraping and this is just a really neat piece of knowledge that I wouldn't have come across on my own. Thanks.

  • @angestellterderantifagmbh
    @angestellterderantifagmbh 8 months ago +3

    My man, this is EXACTLY what I was looking for. Had to do some extra steps, but a little trial and error and a basic understanding of HTTP was enough to solve my problem. Thank you!

  • @Kralnor
    @Kralnor 2 years ago +3

    This is a true gold nugget. Thanks for demonstrating how to easily view the request in Insomnia and auto-generate code!

  • @shivan2418
    @shivan2418 1 year ago +3

    This is, no joke, the most useful video I ever saw on YouTube!

  • @zenon1903
    @zenon1903 2 years ago +3

    Please ignore my first comment. I checked out your first video in this series and learned about using scrapy shell to test each line of code. With that I found the bug in my code. The code worked PERFECTLY as advertised. You're the man! Much thanks!

  • @huisopperman
    @huisopperman 5 months ago

    Thanks for sharing! This has helped me a lot. After struggling for weeks with Selenium, I was able to apply this technique fairly quickly, and am now using it as a source to scrape ETF-composition data to feed directly into a PowerBI dataset. Much appreciated!

  • @drkskwlkr
    @drkskwlkr 1 year ago +22

    Loved everything about this video! Great delivery style, production quality and an interesting topic for me. First-time visitor to this channel and not a Python user (thanks, YouTube, for your weird but helpful predictive algorithms).

  • @ERol-du3rd
    @ERol-du3rd 1 year ago +5

    Awesome advice, a lot of people skip checking the requests when building scrapers but it can save a lot of time when it works

  • @tikendraw
    @tikendraw 2 years ago +2

    I just want you to never stop creating such informative videos. For god's sake.

  • @wp4297
    @wp4297 1 year ago +1

    HUGE! I've been looking for this info for 2 days. 12 mins of your video better than anything else, by far. Thumbs up and thank you so much

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago

      Thank you !!

    • @wp4297
      @wp4297 1 year ago

      @@JohnWatsonRooney you saved me a lot of time. I'm new to the topic; in the next days I'll take a look at your channel

  • @fuad471
    @fuad471 1 year ago +2

    Really nice and helpful tips on a relevant topic, with pleasing recording quality. Thank you for your time and effort.

  • @mujeebishaque
    @mujeebishaque 2 years ago +2

    I love you, John. You're awesome! Thanks for being unique and producing quality content.

  • @user-fl8fy4wn1k
    @user-fl8fy4wn1k 2 months ago

    There is always something new to learn.
    I've been spending hours grinding out this kind of information by hand-writing the whole program to get my result ;D
    Thanks!

  • @sheikhakbar2067
    @sheikhakbar2067 2 years ago +2

    I always come to your channel for these excellent time-saving tips and tricks! Thank you!

  • @krahkyn
    @krahkyn 6 months ago

    This is such useful content that shows how much value experience gives - thank you for the straightforward and realistic tutorial!

  • @anduamlaktadesse9284
    @anduamlaktadesse9284 2 years ago +1

    The hidden API is by far the easiest way to scrape a website!!!! Thanks bro!!! Big clapping for you!!!!! I've followed all your procedures & finally I did it.

  • @sajayagopi
    @sajayagopi 2 years ago +1

    I was struggling with Selenium to extract a table from a JavaScript website. This video saved so much time. Thank you

  • @BIO5Error
    @BIO5Error 2 years ago +4

    Very, very interesting - I'm going to give this a go myself. Cheers for another great video John.

  • @ScottCov
    @ScottCov 2 years ago +1

    John, great video... Thanks for taking the time to do this!!!

  • @klabauter-ev4ix
    @klabauter-ev4ix 11 months ago +1

    That was incredibly helpful and exactly what I needed today. Your presentation is very clear. Thank you!

  • @marlinhicks
    @marlinhicks 7 days ago

    Been using Python for a couple of years now as a picked-up language, and I really appreciate getting to see how someone experienced approaches these problems

  • @gleysonoliveira802
    @gleysonoliveira802 2 years ago +8

    This video was the answer to my prayers!
    The next best option was to watch a one-hour video and hope they would teach what you taught... in 10 minutes!!! 👏👏👏

  • @GLo-qc8rz
    @GLo-qc8rz 4 months ago

    OMG man, was searching for 3 hrs how to extract javascript data w/o complicated rendering and your vid gave a 3 second solution. thank you so much man

  • @Oiympus
    @Oiympus 1 year ago +1

    nice tips, it's always fun to poke around and look at what data webpages are using

  • @joshuakb2
    @joshuakb2 1 year ago +5

    This video came into my feed just a couple days after I used exactly this method to collect some data from a website. Very good info! This is much easier than web scraping. Unfortunately, in my case, the data I could get out of the API was incomplete, and each item in the response contained a URL to a page with the rest of the info I needed, so I had to write some code to fetch each of those pages and scrape the info I needed. But much easier than having to scrape the initial list, as well.

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago +1

      Thanks! I’m glad it helped in some way. I often find that a combination of many different methods is needed to get the end result
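The hybrid approach described in the comment above (list items from the API, then fetch each item's page for the missing fields) might look like this; the `items` and `url` keys, and the BeautifulSoup step, are editor's assumptions rather than anything shown in the video:

```python
def detail_urls(listing, key="url"):
    """Pull the per-item detail-page URLs out of an API listing response."""
    return [item[key] for item in listing.get("items", []) if key in item]

# The detail pages themselves still need scraping, e.g. (sketch only):
# import requests
# from bs4 import BeautifulSoup
# for url in detail_urls(api_response):
#     soup = BeautifulSoup(requests.get(url).text, "html.parser")
#     # ...pick out the fields the API response didn't include
```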

  • @ThijmenCodes
    @ThijmenCodes 1 year ago +1

    Nice video! Used a similar method to collect European Court of Human Rights case documents since there is no official API. Glad to see such methods gaining popularity online, it’s so useful!

  • @phoenixflower1225
    @phoenixflower1225 1 year ago +1

    Thank you so much - this is so insightful and educational. Really helped me understand so many things in so little time.

  • @mrklopp1029
    @mrklopp1029 2 years ago +3

    Thank you for those videos. They're extremely helpful. Keep up the good work! 🙂

  • @Moiludi
    @Moiludi 6 months ago

    Thank you! It provided a new way of thinking about the problem of collecting data. 🙏

  • @Josh-kw7zk
    @Josh-kw7zk 5 months ago +1

    Thank you so much for this tutorial. It helped me a lot on my project, and I learned a lot of new things that I didn't know. Thank you!

  • @mattimhiyasmith
    @mattimhiyasmith 1 year ago +1

    I have used the inspect-with-network method but wasn't aware of the copy-as-cURL method; thanks for that tip, it will save me a lot of time!

  • @judgewooden
    @judgewooden 1 year ago +1

    I like how you regularly start sentences with 'you might think', assuming we are all idiots. I approve; glad smart people like you make time to explain to us plebs how the world works. Appreciated.

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago

      Hey, thanks. I do my best to explain things how I would have wanted to be taught

  • @transatlant1c
    @transatlant1c 1 year ago +118

    Nice video. It's worth noting as well that many APIs will paginate, so rather than checking how many total results exist and manually iterating over them, you just check whether the 'next page url' or equivalent key exists in the results and, if so, get that too until it doesn't exist anymore, merging/appending each time until the dataset is complete 👍

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago +15

      Yes you’re right thank you!

    • @ianroberts6531
      @ianroberts6531 1 year ago +8

      In fact you can see at 05:33 that this particular API does just that - there's "nextPageURL" and "totalPages" at the end of the response JSON.
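The next-page pattern from this thread can be sketched as follows. The `nextPageURL` key matches what the reply above points out in the response at 05:33; the `products` key and the `fetch` callable (any function mapping a URL to decoded JSON) are editor's assumptions:

```python
def collect_all_pages(fetch, first_url):
    """Follow a paginated API via its 'nextPageURL' key until it runs out."""
    results, url = [], first_url
    while url:
        page = fetch(url)
        results.extend(page.get("products", []))  # merge this page's items
        url = page.get("nextPageURL")             # missing/None ends the loop
    return results

# With requests, fetch could be e.g.:
#   fetch = lambda u: requests.get(u, headers=HEADERS).json()
```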

  • @Jiloh5
    @Jiloh5 1 year ago +1

    It worked like a charm! I really needed this. Thanks

  • @davida99
    @davida99 2 years ago +1

    Wow! I just found a gem of a channel! Love your content!

  • @nadyamoscow2461
    @nadyamoscow2461 2 years ago +2

    Clear and helpful as usual. Thanks a lot!!

  • @muhammadrehan3030
    @muhammadrehan3030 2 years ago +1

    Thank you for such wonderful videos. I learned a lot from you. BTW your video quality and background are always very beautiful.

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      Thanks! It's nice of you to mention video quality and background, I do my best!

  • @freekdl6648
    @freekdl6648 2 years ago +138

    I rarely praise anything, but this tutorial was SO good! Well explained, no filler. In 7 or 8 minutes you guided me through finding the hidden information I needed, which tools I need to use and how to automate it. This tutorial gave me enough confidence to try to write my first Python script! Within hours I built a scraper that can pull all metadata for a full NFT collection from a marketplace. Without this video it would have taken days/weeks to discover all of this

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +12

      That's awesome! Thank you, very kind!

    • @channul4887
      @channul4887 2 years ago +1

      "In 7 or 8 minutes"
      More like 11

    • @freekdl6648
      @freekdl6648 2 years ago +10

      @@channul4887 Nope! I had different goals so no need to follow the full tutorial

    • @MelonPython
      @MelonPython 1 year ago +2

      I even added it to my playlist. Great video. Definitely starting to love APIs more and more.

    • @danielcardin9241
      @danielcardin9241 1 year ago +1

      Because of this video, I was able to start my own rockets and satellites company. In only four hours, I started the company, launched thousands of rockets, and now I have my own interplanetary wireless intranet from which I can control the entire galaxy! Thanks again!

  • @rameshks5281
    @rameshks5281 2 years ago +2

    Easy to understand and very neat & clean narration. Keep it up 🙂

  • @vinny723
    @vinny723 2 years ago +3

    Great tutorial. My screen-scraping job went from 4.5 hours to 8 minutes!!!!!

  • @BasitAIi
    @BasitAIi 9 months ago +2

    This video is really amazing. I learned web scraping from your videos.
    Thanks

  • @brockobama257
    @brockobama257 1 year ago +2

    Bro, you're a game changer and I love you. If I ever see you in person I'll offer to buy you a beer, or lunch, coffee, whatever

  • @Jacob-gy9bg
    @Jacob-gy9bg 1 year ago +4

    Wow, thanks for this excellent tutorial! I just spent all this time writing cumbersome Selenium code, when it turns out all the data I was looking for was already right there!

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago

      Great! That’s exactly what I was hoping to achieve with this video

  • @edgarasben
    @edgarasben 1 year ago +1

    This is amazing! So many things I didn’t know.

  • @phoenixflower1225
    @phoenixflower1225 1 year ago +1

    This is seriously high level content right here

  • @unknownuser993
    @unknownuser993 1 year ago +1

    Wow that ‘generate code’ feature is super useful. Thanks!

  • @inspecteurbane5666
    @inspecteurbane5666 1 year ago +1

    Thanks a lot, very interesting video. I learned so many things that I didn't know. I will come back for sure!

  • @voinywolnyprod3046
    @voinywolnyprod3046 1 year ago +1

    Quite interesting! Thank you so much for showing such nice tricks, gonna get familiar with Insomnia.

  • @davidl3383
    @davidl3383 1 year ago +1

    Brilliant. I've started doing that and it's very effective. Good channel and good job. Thank you John

  • @bigstupidtree3771
    @bigstupidtree3771 1 year ago +1

    This has saved me hours today.
    Genuinely, thank you. 🙇‍♂️

  • @ninjahkz4078
    @ninjahkz4078 1 year ago +1

    Lol, I hadn't thought of the possibility of getting an API like that until now haha. Thanks a lot!

  • @pascal831
    @pascal831 8 months ago +2

    Thanks John! You are a lifesaver sir!

  • @aidanomalley8607
    @aidanomalley8607 1 year ago +1

    Thank you, your videos have automated my job. All I need now is an AI cam of myself

  • @lucasmoratoaraujo8433
    @lucasmoratoaraujo8433 11 months ago +1

    Greetings from Brazil! Thank you! I just had to adjust some of the quote marks in the header: there were some 'chained' double quotes (like ""windows""), making some of the header's strings be interpreted by Python as code, not text. I just had to change the inner double quotes to single quotes (e.g. "'windows'") and it worked perfectly! Can't wait to try your other tutorials. Once more, thank you very much!

    • @JohnWatsonRooney
      @JohnWatsonRooney  11 months ago

      Hey! Thank you! I’m glad you managed to get it work
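The quoting problem described in this thread comes from header values that themselves contain literal double quotes (common in `sec-ch-ua`-style headers copied from the browser). A quick illustration of the fix; the specific values here are made-up examples:

```python
# A value copied from dev tools often contains literal double quotes.
# Wrapping it in double quotes ("..."Windows"...") terminates the Python
# string early; wrap the whole value in single quotes instead so the
# inner double quotes survive as part of the string.
headers = {
    "sec-ch-ua-platform": '"Windows"',
    "sec-ch-ua": '"Chromium";v="118", "Not=A?Brand";v="99"',
}
```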

  • @eakerz5642
    @eakerz5642 1 year ago +1

    Tnx :) Went from 1 hour of scraping with Selenium to 1 minute just getting the JSONs.

  • @lokeshchowdary7487
    @lokeshchowdary7487 1 year ago +1

    Thank you for making this awesome tutorial.

  • @tubelessHuma
    @tubelessHuma 2 years ago +1

    This is very useful trick John. 💖

  • @karthikshaindia
    @karthikshaindia 2 years ago +1

    Very nice... I did the same earlier on another site; that was a bit tricky. This is a very straightforward site. Meanwhile, Insomnia reduced the work ;). Thanks for another great video

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +1

      That's great. Yes, I picked this one because it was very easy; I think it helps people understand the core concepts better

  • @irfanshaikh262
    @irfanshaikh262 1 year ago +1

    John, you make scraping interesting and motivating simultaneously.
    Good that I found your channel.
    P.S. I lost it at 0:10 😂

  • @JeanOfmArc
    @JeanOfmArc 1 year ago +1

    You have shown me the light. Thank you for stopping me from making more web scripts that load up web pages in browsers to click buttons.

    • @JeanOfmArc
      @JeanOfmArc 1 year ago

      I have tried this method, but sadly the site I am trying to scrape from returns "error": "invalid_client", "error_description": "The client credentials provided were invalid, the request is unauthorized." Am I out of luck?

  • @vinubalank
    @vinubalank 2 years ago

    This has been a great help for a beginner like me.. Thanks a lot..

  • @tnssajivasudevan1601
    @tnssajivasudevan1601 2 years ago +1

    Great information Sir, really helpful.

  • @philippe6724
    @philippe6724 1 year ago

    Super well taught! Thank you Sir!

  • @giovannimilana6428
    @giovannimilana6428 1 year ago +1

    Huge thanks this video was a game changer for me!

  • @elliotnyberg9332
    @elliotnyberg9332 1 year ago +1

    This is amazing and will help me a lot. Thank you!!

  • @tobias2688
    @tobias2688 2 years ago +1

    Hi John, I loved the video so much that I had to join Patreon to subscribe to you there. Thanks!

  • @RS-Amsterdam
    @RS-Amsterdam 2 years ago +3

    You make stepping into scraping and developing easy and fun.
    Thanks for sharing!!

  • @randyallen8610
    @randyallen8610 1 year ago +1

    Great content. Thanks for this video

  • @ericbwertz
    @ericbwertz 1 year ago +2

    Nice video -- perfect level of detail.

  • @just_O
    @just_O 1 year ago +7

    Nice tutorial on scraping; some tricks I have been using myself, and some others I'd never heard of until now. Thx for sharing!!!
    Small adjustments if I may (please don't take this as criticism): I think you don't need to loop over each product to copy it to your results, you can use extend instead; also, I think the header didn't change, so you can take it out of the loop over pages

  • @SkySesshomaru
    @SkySesshomaru 2 years ago +1

    This is gold man, thank you!
    Just WOW.

  • @user-eu6vy8sz4w
    @user-eu6vy8sz4w 6 months ago

    Great Stuff!

  • @TypicallyThomas
    @TypicallyThomas 1 year ago +1

    Thanks so much. This makes things a lot easier

  • @estring69
    @estring69 2 years ago

    Thanks for the video. Not new to Python, pandas, or APIs, but I do need to start scraping pages for the API, as some are not published or not documented well. Thanks.

  • @lagu1ful
    @lagu1ful 1 year ago +1

    thank you for the information that you have explained, this is very helpful for the research I am doing

    • @lagu1ful
      @lagu1ful 1 year ago

      thank you so much

  • @abrammarba
    @abrammarba 4 months ago

    Awesome! Thank you! 🙂

  • @rizkomao649
    @rizkomao649 2 years ago +1

    Thank you, this is so inspiring.

  • @ahmedelsaid8368
    @ahmedelsaid8368 7 months ago +1

    Amazing one, thanks a lot

  • @thecodegod2434
    @thecodegod2434 1 year ago +1

    Thanks a lot John!!!

  • @michelramon5786
    @michelramon5786 5 months ago +1

    I was like "hm, okay, yeah" to "HOLY SHIT, THATS THE DOPEST SHIT I'VE EVER SEEN"
    I'm starting to get into this niche and I intend to learn more Python and SQL (you know, Data Analysis stuff/jobs) and I'm doing a project to scrape NBA statistics but there are always some errors and it ends up taking a long time.
    BUT THIS IS GOLD CONTENT, KEEP IT UP

  • @BringMe_Back
    @BringMe_Back 2 years ago +1

    Amazing ♥️, super helpful

  • @MikeD-qx1kr
    @MikeD-qx1kr 7 months ago +1

    John, a specific video about how to scrape React websites would be nice. They use a mix of HTML and JSON data on pages... just an idea. Keep up the good work, loving it.

  • @cloudedcaesar
    @cloudedcaesar 8 months ago +1

    excellent video. Subscribed!

  • @gauravpainuly1800
    @gauravpainuly1800 2 years ago +1

    Love your content

  • @yannikzeyer4847
    @yannikzeyer4847 1 year ago +1

    Great video!

  • @bronxandbrenx
    @bronxandbrenx 2 years ago +1

    My master in data extraction.

  • @HazelChikara
    @HazelChikara 2 years ago +1

    You are the best sir!

  • @dmccalldds
    @dmccalldds 4 months ago

    Incredible!

  • @Rob-ky1ob
    @Rob-ky1ob 1 year ago

    Instead of looping over the list and appending each individual item, you can do list().extend(list()), which extends the list with the new list. The result is one list of dictionaries (basically an identical result to how you did it) but with less and cleaner code.
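The suggestion above in a few lines (sample data is made up for illustration):

```python
page_items = [{"id": 1}, {"id": 2}]

# Item-by-item append, as in the video:
results_a = []
for item in page_items:
    results_a.append(item)

# Equivalent and shorter: extend merges the whole list in one call.
results_b = []
results_b.extend(page_items)

assert results_a == results_b
```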

  • @FireFly969
    @FireFly969 2 years ago

    Amazing content and tricks

  • @mrhide5690
    @mrhide5690 1 year ago +1

    Best! Thank you!

  • @codydabest
    @codydabest 1 year ago +1

    This was nearly exactly my job back in 2014/2015 for a giant e-com shoe company. It was always nice when you'd come across a brand that included their inventory count in their API.
    But yes, Selenium/Watir all day lol

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago

      I’m often quite surprised how much info you can easily get!

  • @chadgray1745
    @chadgray1745 2 years ago +3

    Thanks for this - and other - videos, John. Super helpful! Regarding the cookie expiring, can you suggest a way to use Playwright to programmatically generate the cookie used on the API request? I am assuming that cookie isn't the same as the cookie used for the request of the HTML, but maybe that's wrong?
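One way to approach the question above (an editor's sketch, not something shown in the video): drive a real browser with Playwright once to obtain fresh cookies, then hand them to `requests` for the fast API calls. The browser step is left as comments since it needs installed browsers; only the cookie-conversion helper is live code.

```python
def cookie_dict(cookie_list):
    """Convert Playwright's context.cookies() list of dicts into the
    flat name->value mapping that requests expects."""
    return {c["name"]: c["value"] for c in cookie_list}

# Sketch of the browser step (requires `pip install playwright` and
# `playwright install`); the target URL is a placeholder:
# from playwright.sync_api import sync_playwright
# import requests
# with sync_playwright() as p:
#     browser = p.chromium.launch()
#     context = browser.new_context()
#     context.new_page().goto("https://www.example.com")
#     cookies = cookie_dict(context.cookies())
#     browser.close()
# session = requests.Session()
# session.cookies.update(cookies)  # reuse for the hidden-API requests
```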

  • @swaidaslam8266
    @swaidaslam8266 1 year ago +1

    Wow, just did not know something like that existed. Thanks :)

  • @denissavast
    @denissavast 1 year ago +1

    Great job! Thank you very much!

  • @TheEkkas
    @TheEkkas 1 year ago +1

    Such a nice vid; if there was a VPN ad, I didn't even notice it!

  • @ed7590
    @ed7590 1 year ago +1

    Brilliant video, thank you mate