Python Web Scraping: JSON in SCRIPT tags

  • Published on 14 Oct 2024

Comments • 87

  • @boeingpete 3 years ago +6

    Great video. Clear and concise. Been searching high and low, for ages, for a tutorial showing how to find my way down the tree to the data I wanted in a JSON object. Learnt a lot from this. Superb!! Deserves way more 'Likes' than it has.

  • @malwaredev33 1 year ago +3

    Nice content and an amazing video, John. I learned multiple things from it, especially getting JSON data from a script tag. Thanks for making this kind of fruitful content.

  • @KickAssRedBeard 3 years ago +46

    Just in case anyone else has the same issue I did: sometimes using '.text' to gather the text returns no text at all. Instead you must use '.string', and you can still use '.strip()' with that if needed. I'm not a professional, so I have no idea why that happens, but there you go :)

    • @antonylester4485 3 years ago +2

      Thanks. I had exactly that issue. I was lucky: the data I needed was contained in the page text (soup.get_text().split()).

    • @bagia1000 3 years ago +3

      Wow, thanks for the tip! I have been looking for the reason why '.text' returns no text at all.

    • @suyash.01 3 years ago +1

      you saved me big time

    • @coleramos977 3 years ago +1

      Lifesaver

    • @doggo104 3 years ago +1

      Saved my life
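
A minimal sketch of the '.string' approach described in this thread, assuming Beautiful Soup 4 with the built-in html.parser. The sample HTML and the helper name script_source are made up for illustration. A likely explanation for the empty '.text' is version-dependent behavior: the Beautiful Soup changelog describes get_text() returning nothing for script tags in releases 4.9.0 up to 4.11.0, though that has not been checked against the commenters' setups.

```python
from bs4 import BeautifulSoup

# Hypothetical page with one external script and one inline script.
html = """
<html><body>
<script src="app.js"></script>
<script>
  var data = {"id": 1, "name": "Widget"};
</script>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")


def script_source(tag):
    """Return the stripped contents of a <script> tag, or '' if it has none."""
    # .string is the tag's only child string; it is None for an empty tag,
    # so fall back to '' before calling .strip().
    return (tag.string or "").strip()


sources = [script_source(tag) for tag in soup.find_all("script")]
print(sources)
```

The `or ""` fallback matters because an external script such as the first one has no inline contents, and calling .strip() on None would raise an AttributeError.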

  • @hachinyesom1343 2 years ago +2

    Thanks a lot, John. I spent hours getting errors until I saw your video. You gained a subscriber.

  • @suyash.01 3 years ago +4

    THANKS MAN!! Finally something that is straightforward and works!!

  • @martpagente7587 4 years ago +1

    Thank you so much, sir. I don't know how to thank you. I've been waiting for this video for so long. A big fan from the Philippines ❤❤❤. Please keep on creating great content like this.

  • @kennyBrodriguez 4 years ago +4

    Helped me through a challenging project. Thank you

  • @dhyeyadesai7773 3 years ago +2

    6:00 You can also use the replace method and replace the whole 58-character string with "". That would have saved you the effort of counting :)
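
The suggestion above can be sketched as follows. The script contents and the prefix `window.__DATA__ = ` are hypothetical stand-ins, since the thread does not show the actual 58-character string from the video.

```python
import json

# Hypothetical script contents: a JavaScript assignment wrapping a JSON object.
raw = 'window.__DATA__ = {"product": {"name": "Widget", "price": 9.99}};'

# The known, fixed text that precedes the JSON (an assumption for this sketch).
prefix = "window.__DATA__ = "

# Remove the prefix by value instead of by position (count=1 touches only the
# first occurrence), then drop trailing whitespace and the closing semicolon.
payload = raw.replace(prefix, "", 1).rstrip().rstrip(";")

data = json.loads(payload)
print(data["product"]["name"])
```

Removing the prefix by value avoids counting characters, but it still depends on the site keeping that exact prefix text.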

  • @nomcognom2332 2 years ago +1

    It really helped me a lot. Thanks for posting it.

  • @TO-il3vc 4 years ago +5

    Thanks for a great video! Can you think of another way to delete the useless data by using something other than [58:-1] ? This method definitely works, but it seems like a fragile solution that could fail if a future website code update changes this amount of spacing after the script header.
    Also, using .text did not work for me but .string did (for anyone having trouble).

    • @JohnWatsonRooney 4 years ago +3

      Thanks for your comment! Yes, you are right, it isn't the best solution. Another way would be to use a regex and match the string that way. It is more complicated and not necessarily proof against code changes either, but more so than the method I used. Thanks for pointing out that .string worked for you!

    • @briankiefer6162 4 years ago +1

      Thank you so much. I was getting a blank response before I saw this comment. This should be pinned.

    • @stelvio5213 3 years ago

      @JohnWatsonRooney Thank you for the ".string" suggestion, because I had the same problem.
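
One alternative to the fixed slice [58:-1] discussed in this thread, besides the regex mentioned in the reply, is to let the standard library's JSON decoder find where the object ends. This is only a sketch: the helper name extract_first_json_object and the sample text are made up, and it assumes the first "{" in the script is the start of the JSON object.

```python
import json


def extract_first_json_object(script_text):
    """Decode the JSON object starting at the first '{' in script_text."""
    start = script_text.find("{")
    if start == -1:
        raise ValueError("no '{' found in script text")
    # raw_decode parses one JSON value beginning at index `start` and returns
    # it with the index where parsing stopped, so trailing text is ignored.
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


# Hypothetical script contents with leading whitespace and a trailing semicolon.
raw = '\n    window.__STATE__ = {"items": [{"id": 1}, {"id": 2}]};\n  '
state = extract_first_json_object(raw)
print(state["items"])
```

This does not depend on how many characters precede or follow the object, but it fails if JavaScript code containing a brace comes before the JSON.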

  • @mgd8867 4 years ago +1

    Thanks mate, saved me from messing up my uni project

  • @ghaithmoe9573 4 years ago +1

    This video helped me a lot. Thanks John I appreciate it.

  • @whatisahandleeee 3 years ago +1

    You are a gentleman and a scholar.

  • @KIMIK0ful 1 year ago +1

    Thank you so much. You saved me. Really helpful video!! 🤩

  • @SamKsl-yq8kn 1 year ago +1

    Thanks for the tutorial. Very helpful!

  • @dabisbesh 3 years ago

    You've just saved me lots of time - and a huge headache. Thank you.

  • @lesterwood4347 3 years ago +1

    Thank you so much! This was exactly what I was looking for.

  • @rehmanyz9660 4 years ago +3

    Problem: when I write " script = soup.find_all('script')[4] " it works and shows the JSON data, but when I write " script = soup.find_all('script')[5].text.strip() " nothing is shown. What is the problem?
    Thanks for the great tutorial.

    • @JohnWatsonRooney 4 years ago +1

      Using the index [5] will look for the next script tag in order, and if there is no data there you won't get anything back. In my example all the data was in one set of script tags.

    • @rehmanyz9660 4 years ago

      @JohnWatsonRooney Sorry, I made a mistake in the first comment: the index is 4 in both cases.
      This one, " script = soup.find_all('script')[4] ", fetches the entire script tag,
      and this one, " script = soup.find_all('script')[4].text.strip() ", doesn't show anything.

    • @stewielol 4 years ago +2

      If you're still struggling with this, instead of text, use string. I tried this method because I had the same issue as you

    • @aminebenlazreg 4 years ago +1

      @stewielol Thanks, man! You saved my project.

    • @nabilkhan487 3 years ago +1

      @stewielol Thank you, this helped me, sir!

  • @KhalilYasser 3 years ago +3

    I have a weird case when trying this line: `script = soup.find_all('script')[4].text`. If I omit the `.text`, I get the script correctly, but when I add `.text`, I don't get any output and the script string is empty. Any ideas?
    ** I could solve that by using split like this: `script = str(soup.find_all('script')[4]).split('initialReduxState = ')[1][:-15]`, and that worked well.

    • @JohnWatsonRooney 3 years ago +1

      That's a great workaround. Since using this method more in some of my personal scripts, I have noticed it requires a bit more effort to get it to work properly for you!

    • @KhalilYasser 3 years ago

      @JohnWatsonRooney You are the MASTER, my bro. Thanks a lot for all the tutorials you offered. Can I have your email? Do you have a website?

    • @maxbezel 3 years ago +1

      Thanks dude. This finally solved my frustration!
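
The split workaround in this thread still relies on a fixed tail length ([:-15]). A variant that cuts at the marker and at the closing tag by name could look like the sketch below. The sample tag is invented to resemble the one in the comment, and it assumes the JSON is the last thing before `</script>`.

```python
import json

# Hypothetical stringified tag, shaped like the one in the comment above.
tag_html = '<script>window.initialReduxState = {"user": {"id": 7}};</script>'

marker = "initialReduxState = "

# partition splits once: text before the marker, the marker, text after it.
_before, found, after = tag_html.partition(marker)
if not found:
    raise ValueError(f"marker {marker!r} not present")

# Cut at the closing tag by name rather than by a fixed character count,
# then drop surrounding whitespace and the trailing semicolon.
body = after.rpartition("</script>")[0].strip().rstrip(";")

state = json.loads(body)
print(state["user"]["id"])
```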

  • @hasithaeranga5589 2 years ago +1

    Very helpful and very clear.

  • @eldino 1 year ago +1

    Hey, I worked with a similar website recently.
    Do you perhaps know WHY some websites embed the JSON content in a script tag instead of dynamically populating the page from an API call?

    • @JohnWatsonRooney 1 year ago +1

      Hi. I never actually stopped to look it up. I figured it was a design pattern for one of the front-end frameworks. I'm not into front end so much, so I don't actually know why!

    • @eldino 1 year ago

      @JohnWatsonRooney Thank you for your kind answer! ☺️

  • @DittoRahmat 2 years ago

    Hi John,
    Is it possible to get this JSON script without loading the whole page? Or maybe even get it from elsewhere?
    My requests sometimes come back with 403 errors when scraping.

  • @tanzimat2039 2 years ago

    Thank you for the content. Quite explanatory as always. Would you mind elaborating further on how we can iterate over the elements in the JSON object?
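
As a rough illustration of the question above: once json.loads has parsed the script contents, JSON objects are Python dicts and JSON arrays are Python lists, so ordinary loops apply. The payload and the helper walk are invented for this sketch and are not taken from the video.

```python
import json

# Hypothetical payload, standing in for the JSON pulled from a script tag.
payload = (
    '{"products": [{"name": "Widget", "price": 9.99},'
    ' {"name": "Gadget", "price": 24.5}], "page": 1}'
)
data = json.loads(payload)

# A JSON object becomes a dict: iterate over its key/value pairs.
for key, value in data.items():
    print(key, type(value).__name__)

# A JSON array becomes a list: iterate over its elements.
names = [product["name"] for product in data["products"]]


def walk(node, path=""):
    """Yield (path, value) for every leaf value in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, f"{path}[{index}]")
    else:
        yield path, node


leaves = list(walk(data))
```

The recursive walk is handy for exploring an unfamiliar structure before deciding which keys to index directly.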

  • @gitgosc7075 2 years ago

    One of the best videos. I had the same problem, thanks a lot!

  • @mdfaiz4583 2 years ago

    Loved it, bro. Your explanation is so intuitive. Thanks a ton, man.

  • @MuhammadHassan-sm1bf 2 years ago +1

    Best explained 👍

  • @reymartpagente9800 4 years ago

    Hello John, can you share your thoughts on the pros and cons of web scraping with tools like ParseHub, Octoparse, etc. versus coding manually with Beautiful Soup in Python?

  • @thedestroyer7326 3 years ago +1

    Thank you very much. It was very useful.

  • @miss_tech 3 years ago +2

    Project-saving video. Ty

  • @sreekanthreddy1803 1 year ago +1

    How can I get the index of the script tag dynamically? In my case the required text turns up in a different script tag from day to day.

    • @JohnWatsonRooney 1 year ago

      I would probably get each script tag and check whether the first characters match what I was expecting, or just try to force parsing them using try and except.
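
Both ideas in the reply above (checking the first characters, or attempting to parse inside try/except) can be combined in one loop. This is a sketch under assumptions: the function name find_json_script and its marker parameter are hypothetical, and it works on plain strings so that it runs without a live page.

```python
import json


def find_json_script(script_texts, marker=None):
    """Return the parsed JSON from the first script text that qualifies.

    If marker is given, only texts starting with it are tried, and the
    marker is removed before parsing. Returns None when nothing parses.
    """
    for text in script_texts:
        text = (text or "").strip()
        if marker is not None:
            if not text.startswith(marker):
                continue
            text = text[len(marker):]
        try:
            return json.loads(text.rstrip(";"))
        except json.JSONDecodeError:
            continue  # not JSON, so try the next script
    return None


# With Beautiful Soup the input could be built as:
#   script_texts = [tag.string for tag in soup.find_all("script")]
script_texts = [None, "console.log('hi');", '{"sku": "A1", "stock": 3}']
result = find_json_script(script_texts)
print(result)
```

Because the search no longer depends on the tag's position, it keeps working when the site adds or reorders script tags.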

  • @fathan6602 4 years ago +1

    Thank you so much, this is what I was looking for!

  • @atsource3143 2 years ago

    Hi John, just wanted to know: is there any way to scrape hidden div tags/elements using Playwright, Beautiful Soup, etc.?
    Thanks

  • @hirensolanki6581 2 years ago +1

    Very useful, thank you, sir.

  • @АлександрКадыров-ч1ф 3 years ago +1

    Thanks! Very useful!

  • @Topvickygaming 3 years ago +1

    What should I do if the script tag has an id or other attributes, e.g. script id='xxxxx' type='yyyy'?

    • @JohnWatsonRooney 3 years ago

      You can use CSS selectors to get the correct tag: # is for ID. I did a video on chompjs that shows this better!
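
A short sketch of the CSS-selector approach from the reply above, assuming Beautiful Soup 4 with its bundled selector support. The id `page-data` and the type `application/json` are placeholder values, not taken from any real site.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical page: the id and type values are placeholders.
html = """
<html><head>
<script src="app.js"></script>
<script id="page-data" type="application/json">{"title": "Example", "tags": ["a", "b"]}</script>
</head></html>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector: '#' selects by id, [attr="value"] selects by attribute.
tag = soup.select_one('script#page-data[type="application/json"]')

# Equivalent lookup using find() with attribute filters.
same_tag = soup.find("script", id="page-data", type="application/json")

data = json.loads(tag.string)
print(data["title"])
```

Selecting by id or type avoids depending on the script tag's position in the page, unlike find_all('script')[4].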

  • @kelaskita6765 3 years ago

    Great tutorial, but on a certain website that I tried I got this error when running print(r.status_code): "SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1125)')))"
    Could you please figure out why this happens?

  • @virtualcodeguy8730 4 years ago

    Hi, thanks for the video. Can you tell me how I can extract data from script 1 in this video? It is similar to what I am trying to do, but I always get ("session: ... , element:.....")

  • @PhilDodici 2 years ago +1

    Thank you!

  • @himanshusingh7509 1 year ago

    At timestamp 7:35 I load my JSON file and get this error:
    'Type' object is not subscriptable
    Please tell me the solution for this.

  • @adeelaashraf4843 3 years ago

    Hi, thanks for such an informative tutorial. I am trying to scrape a website. However, my Python code is giving the error HTTP Error 522: Origin Connection Time-out. It was perfectly fine a couple of days ago but is now showing this error.

  • @fabriciogama5401 2 years ago

    Is this possible with a script type='text/javascript' tag that contains $(document).ready(function()?

  • @valeriejackson3502 1 year ago

    Hi! How can I do this with a site that doesn't have the script tag with the JSON? Do you have an email I can write to? I have a problem scraping the h1 with Python because it is inside a JSON.

  • @shahzaddangrach5371 4 years ago

    When getting the text within the script tag, the program result is empty.
    Libraries imported: Beautiful Soup, json, requests
    Code:
    all_script = soup.find_all('script')
    main_script = all_script[142]
    print(main_script.text)
    Result:
    empty

  • @informationtoallj 1 year ago +1

    Thanks

  • @aminebenlazreg 4 years ago +1

    Thank you so much!

  • @w4s100 3 years ago +1

    nice. thanks!

  • @KhalilYasser 3 years ago

    Amazing and wonderful as usual. Can you share the code or at least the url?

  • @maksgruv 4 years ago +1

    Nice job!!!

  • @adityapandit7344 3 years ago +1

    Can we do this task using Scrapy?

    • @JohnWatsonRooney 3 years ago +1

      Absolutely, check out my channel for some Scrapy videos.

    • @adityapandit7344 3 years ago

      @JohnWatsonRooney Thanks for your reply. I am trying to find the video where you scrape the data from a script tag using Scrapy, but I am not able to find it. Can you please share the video link here? Thanks once again.

    • @adityapandit7344 3 years ago

      @JohnWatsonRooney This is done using Beautiful Soup, but I want to do it using the Scrapy framework for Python. Please share the video link here.

  • @damnnigha1764 3 years ago +3

    Bro, I use regular expressions to simplify this task.

  • @daddy_eddy 2 years ago

    Thanks.
    Could you put the website in the description under the video?

  • @killbane1000 1 year ago

    Top G, thank you, but I found a slightly better solution.
    Example:
    brand = re.search(r'"THENAMEOFSTRING":\s*"(\w+)"', script).group(1)
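
A hedged variation on the regex above: the helper name json_string_value and the sample script text are invented for illustration. It swaps `\w+` for `[^"]*` so that values containing spaces or hyphens also match, and it returns None instead of raising when the key is absent.

```python
import re

# Hypothetical script contents; the keys and values are placeholders.
script = 'var product = {"brand": "Acme Tools", "sku": "AT-100", "price": 19.5};'


def json_string_value(text, key):
    """Return the first string value stored under key, or None if absent."""
    # [^"]* accepts spaces and punctuation, which \w+ would not match.
    # re.escape guards against keys that contain regex metacharacters.
    pattern = r'"' + re.escape(key) + r'":\s*"([^"]*)"'
    match = re.search(pattern, text)
    return match.group(1) if match else None


brand = json_string_value(script, "brand")
print(brand)
```

This only handles plain string values; values with escaped quotes, numbers, or nested objects are better served by parsing the JSON itself.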

  • @siyam.88 4 years ago +1

    Thanks a lot, sir. It helped me a lot

  • @HouseofCM23 4 years ago +1

    Thanks so much!