Web Scraping Weather Data with Python

  • Published on 5 Sep 2024
  • Here's a beginner-level web scraping tutorial for you: scraping weather data from Google. I am using requests-html with Python. This is my preferred HTML parsing library, as it gives a simple-to-use syntax and access to CSS selectors, which makes extracting elements very easy.
    At the end of the video I talk a bit about why you should always use an API if there is one available for things like this; however, scraping data this way for personal projects is just fine.

Comments • 75

  • @KhalilYasser 2 years ago +13

    Thank you very much. I had an issue getting the results in English, and I solved it by adding this to the headers: "Accept-Language": "en-US,en;q=0.9,ar;q=0.8"

    • @JohnWatsonRooney 2 years ago +3

      Great tip thanks for sharing 👍

    • @AnimGraphLab 2 years ago +3

      You can also change the language by adding query parameters to the URL: "gl": "us", "hl": "en", which translates to this: www.google.com/search?q=new+york+weather&gl=us&hl=en
      gl -> country
      hl -> language

    • @KhalilYasser 2 years ago +2

      @AnimGraphLab Thank you very much for the tip. I tried us but got the result in Fahrenheit, so I changed to uk and it worked well. Thanks a lot for the tip.

    • @AnimGraphLab 2 years ago +1

      @KhalilYasser That's great! That was expected behavior: the USA uses Fahrenheit, so gl=us defaults to that unit instead of Celsius :)
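
The two tips in this thread (the Accept-Language header and the gl/hl query parameters) can be combined. What follows is a minimal sketch, not the code from the video: the function names are illustrative, the selector is left as a parameter because Google's markup can change, and the fetch step assumes requests-html is installed.

```python
from urllib.parse import urlencode

# Header suggested in the thread to ask for English-language results.
HEADERS = {"Accept-Language": "en-US,en;q=0.9"}


def build_weather_url(city, country="us", language="en"):
    """Build a Google search URL with the gl (country) and hl (language) parameters."""
    params = {"q": f"{city} weather", "gl": country, "hl": language}
    return "https://www.google.com/search?" + urlencode(params)


def fetch_text(url, selector):
    """Return the text of the first element matching selector, or None.

    Assumes requests-html is installed; it is imported here so the URL
    helper above still works without it.
    """
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url, headers=HEADERS)
    element = response.html.find(selector, first=True)
    return element.text if element is not None else None


print(build_weather_url("new york"))
print(build_weather_url("london", country="uk"))
```

As noted in the thread, gl=us tends to give Fahrenheit, while gl=uk gave Celsius for the commenter.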

  • @xilllllix 2 years ago +13

    i am becoming so good at python and web scraping thanks to all your videos! please don't stop making them. your channel is literally the only one on YT dedicated to python and web scraping!!!

    • @JohnWatsonRooney 2 years ago +2

      Thanks not planning on stopping any time soon!

  • @thelife8836 1 year ago +3

    The best and most underrated web scraping channel on YouTube

  • @vasudev16180 2 years ago +6

    Hello John! We are so lucky to have you in the python community. Keep going, John👍

  • @Old_SDC 2 years ago +2

    Works great! Thank you so much! I’m making an AI for my room and this is perfect. I’m using Python 3.9.0, for anyone wanting to know if it still works as of that version

  • @xZzzit 2 years ago +2

    Great videos mate! I like how you keep it short and snappy, keep up the good work.

  • @decromax 2 years ago +2

    Nice, back to VS Code then!
    My Raspberry pi weather station has had a spider crawling the met office twice a day on a CRON job for around 18 months now.
    Use it on a dashboard for comparison with the station data.

    • @JohnWatsonRooney 2 years ago

      That’s a great example for scraping and a Pi. Yeah, VS Code… I prefer PyCharm, but VS Code is great for things like this

  • @loganfairbairn4605 1 year ago +1

    This was a fantastic intro into webscraping, thanks a ton!

  • @igordc16 2 years ago +2

    Thank you John! Your video is clear and interesting to watch as always.

  • @sunghin 2 years ago +1

    7:41 John says the selector is too vague, but actually ids are unique on the page, so no need to chain selectors there.

  • @Jucrisr 11 months ago +1

    Hi, it worked fine the first time on Colab, but when I ran the same code again, it's showing the temperature in Fahrenheit. Nothing has changed in the code. I am in London

    • @JohnWatsonRooney 11 months ago +1

      I expect that’s to do with Colab. I think it’s changing the IP and you’ve ended up getting a US version of the site, giving you Fahrenheit. I always recommend writing and running code locally on your own computer
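
If the page still comes back in Fahrenheit (for example because the request went out from a US IP address), one option is to convert the value locally. A minimal sketch; the function name is illustrative, and it assumes the scraped temperature is a plain number or numeric string.

```python
def fahrenheit_to_celsius(value):
    """Convert a Fahrenheit reading (number or numeric string) to Celsius."""
    return (float(value) - 32) * 5 / 9


print(round(fahrenheit_to_celsius("68"), 1))  # 20.0
```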

  • @bartoszjarczynski438 2 years ago +1

    Wow, found everything I needed thanks, keep videos coming

  • @davidwisemantel5041 2 years ago +2

    Nice one John. Would be cool to see how you'd put those locations in a list and iterate in a for loop and save your results :)
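
One way to do what this comment asks is sketched below: loop over a list of locations and write one CSV row per city. This is an illustration rather than the author's code; fetch_weather is a hypothetical callable standing in for the scraping step, and the "temp" and "desc" keys are assumptions.

```python
import csv
import io


def scrape_locations(cities, fetch_weather, out_file):
    """Call fetch_weather(city) for each city and write the rows as CSV.

    fetch_weather is a hypothetical callable that returns a dict with
    "temp" and "desc" keys; swap in the real scraping function.
    """
    writer = csv.DictWriter(out_file, fieldnames=["city", "temp", "desc"])
    writer.writeheader()
    for city in cities:
        try:
            result = fetch_weather(city)
        except Exception as exc:  # keep going if one city fails
            print(f"skipping {city}: {exc}")
            continue
        writer.writerow({"city": city, "temp": result["temp"], "desc": result["desc"]})


def fake_fetch(city):
    """Stand-in fetcher so the example runs without network access."""
    return {"temp": "18", "desc": "Cloudy"}


buffer = io.StringIO()
scrape_locations(["London", "Paris"], fake_fetch, buffer)
print(buffer.getvalue())
```

To save to disk instead, pass a file opened with open("weather.csv", "w", newline="") in place of the in-memory buffer.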

  • @maybenew7293 1 year ago +1

    Smooth and simple.

  • @bisratgetachew8373 2 years ago +2

    Thanks John! Great video once again!!

  • @redaoutarid6465 2 years ago +1

    How can I get around the API request limit?
    I use a VPN and proxies, but I want to scrape 36k pages, so I need at least 13k IPs.
    Can you share the best solution for this case?

  • @tiffaniejohnson352 7 days ago

    Trying this in 2024 using Python 3.12.5, and I’m getting the error "lxml.html.clean module is now a separate project lxml_html_clean". Does this mean requests_html doesn’t work anymore?
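
The error message itself points at the likely fix: recent lxml releases ship lxml.html.clean as the separate lxml_html_clean project, so installing that package alongside requests-html is the usual remedy. The sketch below only checks whether the modules can be found. The helper name is illustrative. Note also that the pip name is requests-html (hyphen) while the import name is requests_html (underscore, with an "s"), which is behind several "No module named" comments on this page.

```python
import importlib.util

# Import name -> name to pass to pip.
REQUIRED = {
    "requests_html": "requests-html",
    "lxml_html_clean": "lxml_html_clean",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of any required modules that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Try: pip install " + " ".join(missing))
else:
    print("All required modules are installed.")
```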

  • @randlyce 1 year ago +1

    is there a reason why you do the first one with span and then the second one with div class? can't you just use span for every single one? or maybe just use div for the first one as well?

    • @JohnWatsonRooney 1 year ago

      yes you can use whichever selectors you find easiest/quickest to write

  • @bextla3737 6 months ago

    Thanks for the tutorial. Is it possible to get past data, not just the current data?
    Could you share a link to the code, or the code as text?

  • @fhkdhkdyidyhfufufh9011 1 year ago

    Are there sites that can't be scraped?
    I tried to scrape Google News to get all the news, but it gives me empty brackets

  • @datag1199 2 years ago +1

    Great tutorial. Thank you

  • @nrg-8044 2 years ago +1

    Look, instead of typing a particular city name, just put city=("current location") so it searches automatically, rather than editing or inputting the city=" " value

  • @z4ar 1 year ago +1

    How do I scrape a temperature from a certain day of the week? All the div names are the same, though "data-wob-di"s values differ but I don't know how to tell html.find that

    • @JohnWatsonRooney 1 year ago +1

      You could use find_all() which returns a list, then index that list

    • @z4ar 1 year ago

      @JohnWatsonRooney Where do I put this? After html.find?
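
For this question, two approaches are sketched below, assuming html is the r.html object from a requests-html response. In requests-html, find() already returns a list of matches unless first=True is passed, so there is no separate find_all() call to add. The data-wob-di attribute name comes from the question above; the div tag, the function names, and what each index means are assumptions.

```python
def day_selector(day_index):
    """Build a CSS attribute selector for one forecast day.

    The data-wob-di attribute comes from the question above; the div tag
    and the meaning of the index are assumptions.
    """
    return f'div[data-wob-di="{day_index}"]'


def find_day(html, day_index):
    """Return the first element for the given day, or None if absent."""
    return html.find(day_selector(day_index), first=True)


def nth_match(html, selector, position):
    """Alternative: take the list of all matches and index into it."""
    matches = html.find(selector)
    return matches[position] if position < len(matches) else None


print(day_selector(2))  # div[data-wob-di="2"]
```

Either helper replaces the single html.find(...) call in the script; the returned element's .text attribute then holds the value.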

  • @utkarshpandya3155 2 years ago

    Hi John. Great video as usual. Looking forward to seeing your solution to the use case I emailed you. Thanks for the online Python lessons.

  • @mollywelsh3897 2 years ago

    What's up with the error: 'r is not defined' then importing requests as r, then 'requests module has no s attribute' ?

  • @oliaskrfa8588 1 year ago

    Why does it say no module named 'request_html' when I go to run it?

  • @stanTrX 3 months ago

    Thanks good man

  • @ferilukmansyah3037 2 years ago +1

    thanks for great tutorial

  • @juanignaciolopezlopez45 2 years ago +2

    Hi John, I've been following for a long time. You do a great job. I've been trying to scrape alibaba in different ways. Could you make a video about alibaba?

    • @JohnWatsonRooney 2 years ago +2

      Thanks! I will look into it. It’s not something I’ve tried before!

  • @user-uf8vm5mu9s 2 years ago

    Hi! I am a student from Taiwan. I have a question about web scraping. Why can't my VS Code script pick up the correct date? It always returns "None". Could you help me?

  • @haithemamir223 2 years ago

    Hi John, I have a question please: how do I show the Python output in HTML? I couldn't find out how. Thanks

  • @Loygebaguio 1 year ago

    I can't import HTMLSession in PyCharm. Does anyone know how to?

  • @raghavkerur5983 2 years ago +1

    The code is saying I don't have the request_html module. Help with this, please

    • @JohnWatsonRooney 2 years ago

      Sure, did you pip install it? pip install requests-html

    • @raghavkerur5983 2 years ago

      @JohnWatsonRooney yes sir! Done!

  • @cuneytozkurt4867 2 years ago

    Hi, it's really cool. How can you scrape gold prices on Google? I guess you'd use a different method.🤔

  • @yurimatos1894 2 years ago +2

    Hi John, first of all, i want to thank you for all the awesome content you make, i've been watching a lot of your videos recently and they improved my code so much.
    So, recently i've been trying to scrape a website to get the links for all the images in it, the thing is, the website is mostly dynamic, and the images are loaded using JS. Then i found your video on API endpoint, its a real game changer, i found the request that the website makes to the server to get all the links back at once, but the request-url has a different '?key=' each time i load it. Is there any way to bypass this?
    Thanks again!

    • @JohnWatsonRooney 2 years ago +1

      Hey Yuri! Thanks very kind. I’m not sure I’ve seen the key thing before that you mention- but as it’s after the “?” It’s an api parameter - does it work without? If not you’d need to find out where it’s coming from to try to replicate it

    • @yurimatos1894 2 years ago

      @John Watson Rooney yes, it's after the '?' at the end; without it all I get is response code 500. Do you have any tips on how I could find where it's coming from?
      For now what I've been able to do is use selenium-wire to capture the request headers and copy the key from there. But it's kind of pointless, since I can get the response doing the same thing. My goal was to not use selenium at all.
      Once again, your videos are really awesome!

  • @Slver009 2 years ago +1

    what theme for VS Code is john using in this video?

    • @JohnWatsonRooney 2 years ago +1

      This is Gruvbox Material I think

    • @Slver009 2 years ago +1

      @JohnWatsonRooney I found it, it's similar to the Night Shift mode on iPhones and I like it!!!
      Thanks John and keep up the great content!

  • @Pradeep_prasad 2 years ago

    Sir, please tell me how to avoid the "press and load button" robot-or-not check 😐😐😐

  • @weslensilva2647 2 years ago +1

    Very good

  • @sriramkasu7511 2 years ago

    Hey John can u make a video on how to add Auth Proxy to Firefox using selenium python please

  • @rahulkumarvarma6178 2 years ago

    Can I use that for a website which I will host?

  • @mr.strange7002 2 years ago

    Hey John, your videos are really great and you teach web scraping really well, but I need your help with one thing. I have a URL to send a POST request to. I used requests and set up the required headers, but when I run my code I get a 403 response, and printing its text I noticed that Cloudflare is blocking my request. Is there any way to fix it? The URL is from an application, which I got using a MITM attack. I need this because I want some automation on my account inside that app.

  • @sriramkasu7511 2 years ago

    hey John, could you help me in solving this error during web scraping a website,
    Access denied | "website" used Cloudflare to restrict access

    • @sriramkasu7511 2 years ago

      I tried adding headers, and also tried using proxies, but nothing worked

  • @manikandanmanickam9433 2 years ago

    Hi John, you are amazing. Can you teach me how to scrape data from Delta Airlines?

  • @kaxar6954 1 year ago

    Do you provide this as a service? Like to WP bloggers

  • @kaustubhmokal1053 2 years ago

    No module named requests_html

    • @jesperingo 2 years ago

      run this in cmd: pip3 install requests-html

  • @wylde780 2 years ago +1

    I love f-string

  • @Saiyan412 13 days ago

    Didn't work

  • @sujayvikramgs8588 1 year ago

    Google are you a bot thing