How to Parse HTML Tables to JSON With Python

  • Published Dec 31, 2024

Comments •

  • @samibenhssan3121
    @samibenhssan3121 2 years ago +8

    This channel is the best by far for learning web scraping, but I think it would be better to focus only on the Scrapy framework. Surely it is the best tool by far. It is complete, powerful and easy to learn.

  • @Kylbigel
    @Kylbigel 2 years ago +1

    John, thank you for keeping the channel blessed with the micro lessons! It takes a lot to consistently put this level of quality out here. I appreciate you!

  • @janisvelbergs6394
    @janisvelbergs6394 2 years ago +4

    Nice explanation of list comprehension. From your videos I have learned a lot of new things about web scraping and also Python. Thanks for your content.

  • @androidmod183
    @androidmod183 2 years ago +2

    Great content. The last two months I kept learning Python and web scraping thanks to you John, I learned a lot and I am advancing day by day.
    Thank you and do keep at it

  • @r12bzh18
    @r12bzh18 2 years ago +2

    So good to see how you do it! I am reading a book on web scraping, « Web scraping with Python » from Ryan Mitchell, and it’s great to watch your videos on the back of it, it all becomes clearer and alive. Thank you!

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +2

      Thanks, I’ve not seen that book before, I’ll check it out

    • @r12bzh18
      @r12bzh18 2 years ago +1

      @JohnWatsonRooney which one would you recommend? Any personal favorite of yours?

  • @sandrasoniec1895
    @sandrasoniec1895 1 year ago +1

    Thanks for sharing. Just started with web scraping and the data I need is typically presented in tables.

  • @rickhehe
    @rickhehe 2 years ago +1

    dict(zip(list_a, list)) is neat. Thanks John!
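
The dict(zip(...)) pattern the comment refers to pairs a list of column headers with the cell values from one table row. A minimal sketch, using made-up header and row data:

```python
# Pair a list of table headers with one row of cell values.
# Note: zip() stops at the shorter list, so mismatched lengths silently
# drop trailing items -- worth checking when scraping real tables.
headers = ["Team", "Played", "Won"]
row = ["Arsenal", "38", "26"]

record = dict(zip(headers, row))
print(record)  # {'Team': 'Arsenal', 'Played': '38', 'Won': '26'}
```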

  • @haidernadeem
    @haidernadeem 2 years ago +1

    Hi John, I was wondering if you could make a video on how to bypass Cloudflare when scraping a website? I've tried using the correct headers/cookies and continuously changing the user agent, but I still can't seem to get past certain websites which use Cloudflare

  • @silientlips
    @silientlips 1 year ago +1

    WTF! This tutorial is clear and easy to follow. I have subscribed and liked the video. By the way, what are the uses of dict() and zip()? I love list comprehension.

  • @gisleberge4363
    @gisleberge4363 2 years ago +1

    Very neat and clean code...and as always, well explained 🙂

  • @iamkian
    @iamkian 2 years ago +1

    Funny. I have just started experimenting with JSON in Python. Your video will help me a lot.
    Thank you for sharing.

  • @AndreSpecker
    @AndreSpecker 1 year ago

    Hello, how could I add three replace() calls for columns 0, 2 and 3?

  • @franky12
    @franky12 1 year ago +1

    Unfortunately the package "requests_html" seems to be not maintained anymore, no bugfixes or updates, last activity in the repo was 3 years ago... 😢😢😢

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago +1

      Yeah I’ve moved on to httpx or requests, and selectolax to parse

  • @ZenoModiff
    @ZenoModiff 2 years ago

    Hello John, can you make a video on scraping a world population data website please? I tried but failed because the span tag is constantly changing

  • @lalitchowdhary4238
    @lalitchowdhary4238 2 years ago

    Can we create a project like the WP Automatic plugin or like Scrapes? These are WordPress plugins that automatically scrape content and post it on a website. They also have a scheduling option

  • @artmania1383
    @artmania1383 1 year ago

    But this module is not fetching HTML files from local disk. How can I do that?

  • @tetricko
    @tetricko 2 years ago

    When I use pandas my th is not becoming cols, instead it's generating 0, 1, 2 for cols and putting th and td as data rows.
    Why is it not putting th as headers?

  • @harryhindsight9845
    @harryhindsight9845 2 years ago

    Great channel.
    May I ask - if you were interested in closely monitoring infrequent changes to a website (e.g. the "company news" page of a company you own stock in), do you have a gut feel as to the best way to go about it? I anticipate newly released news would have its own "page". Perhaps the simplest approach is to crawl the website every X hours, list all of the "links", and check if the list has changed.
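
The crawl-and-diff approach the comment describes can be sketched with set arithmetic: keep the set of links from the previous crawl and compare it with the current one. The link paths here are hypothetical stand-ins for whatever the scraper would extract:

```python
# Links seen on the "company news" page during the previous crawl
# (in practice, persist these to a file or database between runs).
previous_links = {"/news/q1-results", "/news/new-ceo"}

# Links extracted on the current crawl.
current_links = {"/news/q1-results", "/news/new-ceo", "/news/dividend"}

# Set difference gives exactly the newly published pages.
new_links = current_links - previous_links
print(sorted(new_links))  # ['/news/dividend']
```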

  • @amazingmechskills
    @amazingmechskills 2 years ago +2

    Thanks a lot John.

  • @simonknights7619
    @simonknights7619 2 years ago +1

    Nice. Is it possible to somehow iterate/enumerate through each league and create various JSON outputs as leaguename.json? All in one .py rather than, say, multiple queries. Thank you :)

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      Sure, we can use a list of league names and loop through them, using the name variable as the output file name
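
The reply's suggestion can be sketched as a loop over a list of hypothetical league names, reusing each name as the output file name; the scraping step is stubbed out with placeholder data:

```python
import json

# Hypothetical league names; in a real script each name would also
# drive the URL that gets scraped.
leagues = ["premier-league", "la-liga", "serie-a"]

for name in leagues:
    table_rows = [{"league": name}]  # placeholder for the scraped table rows
    # Write one JSON file per league, named after the league.
    with open(f"{name}.json", "w") as f:
        json.dump(table_rows, f, indent=2)
```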

  • @gauthiervigouroux982
    @gauthiervigouroux982 2 years ago

    Hi John, nice video and, more generally, channel :) I have tried to adapt and reproduce your project on another website. It was a dynamic one so I used render(), but when I try to get the absolute links of the web page it returns 'set()' as many times as there are links in the page. Did you ever have this problem?
    Have a good day :)

  • @javierjdaza
    @javierjdaza 2 years ago +1

    Hi John, big fan here. I've got a question, why did you stop using requests + bs4?

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      I do still use bs4, but I just prefer the way requests-html works, and my preference is CSS selectors, which are the standard. Although you can use them in bs4 - it just comes down to preference! Use whichever you like best

  • @heq3160
    @heq3160 2 years ago

    Hey, I want to check when a change is made to a webpage, and I was wondering:
    am I supposed to perform a request every x seconds, or is there a way to check when a change is made?

  • @Stelaninja
    @Stelaninja 2 years ago

    So basically, use requests_html to scrape and save the data. Then create another project where you read it with pandas, if you want to analyze the data.

  • @shoebshaikh6310
    @shoebshaikh6310 2 years ago +1

    Great video👍

  • @ferilukmansyah_dev
    @ferilukmansyah_dev 2 years ago +1

    nice tutorial

  • @saadachab8425
    @saadachab8425 2 years ago +1

    Very interesting🙏

  • @djyems1021
    @djyems1021 2 years ago +1

    Hi John, why are you deleting my comments??

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      Hey, it’s not me - if you are posting a link YouTube will delete it automatically I’m afraid

  • @tetricko
    @tetricko 2 years ago

    this line gives an error:
    res = [dict(zip(tableheader,t)) for t in tabledata]
    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    Cell In [69], line 11
    8 tabledata = [[c.text for c in row('td')] for row in table('tr')]
    9 tableheader = [[c.text for c in row('th')] for row in table('tr')]
    ---> 11 result = [dict(zip(tableheader,t)) for t in tabledata]
    Cell In [69], line 11, in (.0)
    8 tabledata = [[c.text for c in row('td')] for row in table('tr')]
    9 tableheader = [[c.text for c in row('th')] for row in table('tr')]
    ---> 11 result = [dict(zip(tableheader,t)) for t in tabledata]
    TypeError: unhashable type: 'list'
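
The TypeError in the paste above happens because tableheader is built with a nested comprehension, producing one list per tr row (and an empty list for rows with no th cells); zip() then yields those inner lists as dictionary keys, and lists are unhashable. A sketch of the fix, using stand-in data in place of the parsed table:

```python
# Stand-ins for the parsed rows: only the first <tr> carries <th> cells.
rows_th = [["Team", "Points"], [], []]                 # per-row <th> text
rows_td = [[], ["Arsenal", "84"], ["Chelsea", "71"]]   # per-row <td> text

# Buggy version: zip(rows_th, row) pairs each header *list* with a cell
# value, and dict() rejects lists as keys -> TypeError: unhashable type.

# Fix: take the header cells from the first row only (a flat list of
# strings), and skip rows that have no <td> cells.
header = rows_th[0]
result = [dict(zip(header, row)) for row in rows_td if row]
print(result)
# [{'Team': 'Arsenal', 'Points': '84'}, {'Team': 'Chelsea', 'Points': '71'}]
```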

  • @sandeepahuja2203
    @sandeepahuja2203 2 years ago +1

    Thank you very much for taking the effort to post such short & crisp videos on web scraping. Nowadays, Twitter is catching up with loads of open source communities helping each other. However, I couldn't find you over there.
    Would you be kind enough to share your Twitter handle please?