How to Parse HTML Tables to JSON With Python

  • Published Dec 31, 2024

Comments •

  • @samibenhssan3121
    @samibenhssan3121 2 years ago +8

    This channel is the best by far for learning web scraping, but I think it would be better to focus only on the Scrapy framework. Surely it is the best tool by far. It is complete, powerful and easy to learn.

  • @Kylbigel
    @Kylbigel 2 years ago +1

    John, thank you for keeping the channel blessed with the micro lessons! It takes a lot to consistently put this level of quality out here. I appreciate you!

  • @janisvelbergs6394
    @janisvelbergs6394 2 years ago +4

    Nice explanation of list comprehension. From your videos I have learned a lot of new things about web scraping and also Python. Thanks for your content.

  • @androidmod183
    @androidmod183 2 years ago +2

    Great content. The last two months I kept learning Python and web scraping thanks to you John, I learned a lot and I am advancing day by day.
    Thank you and do keep at it

  • @r12bzh18
    @r12bzh18 2 years ago +2

    So good to see how you do it! I am reading a book on web scraping, « Web scraping with Python » from Ryan Mitchell, and it’s great to watch your videos on the back of it, it all becomes clearer and alive. Thank you!

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +2

      Thanks, I’ve not seen that book before, I’ll check it out

    • @r12bzh18
      @r12bzh18 2 years ago +1

      @JohnWatsonRooney which one would you recommend? Any personal favorite of yours?

  • @sandrasoniec1895
    @sandrasoniec1895 1 year ago +1

    Thanks for sharing. Just started with web scraping and the data I need is typically presented in tables.

  • @rickhehe
    @rickhehe 2 years ago +1

    dict(zip(list_a, list)) is neat. Thanks John!
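
The dict(zip(...)) pattern the comment refers to pairs a list of column headers with the cell values from one table row. A minimal sketch, using made-up header and row data:

```python
# Pair a list of table headers with one row of cell values.
# Note: zip() stops at the shorter list, so mismatched lengths silently
# drop trailing items -- worth checking when scraping real tables.
headers = ["Team", "Played", "Won"]
row = ["Arsenal", "38", "26"]

record = dict(zip(headers, row))
print(record)  # {'Team': 'Arsenal', 'Played': '38', 'Won': '26'}
```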

  • @haidernadeem
    @haidernadeem 2 years ago +1

    Hi John, I was wondering if you could make a video on how to bypass Cloudflare when scraping a website? I've tried using the correct headers/cookies and continuously changing the user agent, but I still can't seem to get past certain websites which use Cloudflare

  • @silientlips
    @silientlips 1 year ago +1

    WTF! This tutorial is clear and easy to follow. I have subscribed and liked the video. By the way, what are the uses of dict() and zip()? I love list comprehension.

  • @gisleberge4363
    @gisleberge4363 2 years ago +1

    Very neat and clean code...and as always, well explained 🙂

  • @iamkian
    @iamkian 2 years ago +1

    Funny. I have just started experimenting with JSON in Python. Your video will help me a lot.
    Thank you for sharing.

  • @AndreSpecker
    @AndreSpecker 1 year ago

    Hello, how could I add three replace() calls for columns 0, 2 and 3?

  • @franky12
    @franky12 1 year ago +1

    Unfortunately the package "requests_html" seems to be not maintained anymore, no bugfixes or updates, last activity in the repo was 3 years ago... 😢😢😢

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago +1

      Yeah I’ve moved on to httpx or requests, and selectolax to parse

  • @ZenoModiff
    @ZenoModiff 2 years ago

    Hello John, can you make a video on scraping a world population data website please? I tried but failed because the span tag is constantly changing

  • @lalitchowdhary4238
    @lalitchowdhary4238 2 years ago

    Can we create a project like the WP Automatic plugin or like Scrapes? These are WordPress plugins that automatically scrape content and post it on a website. They also have a scheduling option

  • @artmania1383
    @artmania1383 1 year ago

    But this module is not fetching HTML files from local disk. How can I do that?

  • @tetricko
    @tetricko 2 years ago

    When I use pandas my th is not becoming cols, instead it's generating 0, 1, 2 for cols and putting th and td as data rows.
    Why is it not putting th as headers?

  • @harryhindsight9845
    @harryhindsight9845 2 years ago

    Great channel.
    May I ask - if you were interested in closely monitoring infrequent changes to a website (e.g. the "company news" page of a company you own stock in), do you have a gut feel as to the best way to go about it? I anticipate newly released news would have its own "page". Perhaps the simplest approach is to crawl the website every X hours, list all of the "links", and check if the list has changed.
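
The crawl-and-diff approach the comment describes can be sketched with set arithmetic: keep the set of links from the previous crawl and compare it with the current one. The link paths here are hypothetical stand-ins for whatever the scraper would extract:

```python
# Links seen on the "company news" page during the previous crawl
# (in practice, persist these to a file or database between runs).
previous_links = {"/news/q1-results", "/news/new-ceo"}

# Links extracted on the current crawl.
current_links = {"/news/q1-results", "/news/new-ceo", "/news/dividend"}

# Set difference gives exactly the newly published pages.
new_links = current_links - previous_links
print(sorted(new_links))  # ['/news/dividend']
```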

  • @amazingmechskills
    @amazingmechskills 2 years ago +2

    Thanks a lot John.

  • @simonknights7619
    @simonknights7619 2 years ago +1

    Nice. Is it possible to somehow iterate/enumerate through each league and create various JSON outputs as leaguename.json? All in one .py rather than, say, multiple queries. Thank you :)

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      Sure, we can use a list of league names and loop through them, using the name variable as the output file name
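
The reply's suggestion can be sketched as a loop over a list of hypothetical league names, reusing each name as the output file name; the scraping step is stubbed out with placeholder data:

```python
import json

# Hypothetical league names; in a real script each name would also
# drive the URL that gets scraped.
leagues = ["premier-league", "la-liga", "serie-a"]

for name in leagues:
    table_rows = [{"league": name}]  # placeholder for the scraped table rows
    # Write one JSON file per league, named after the league.
    with open(f"{name}.json", "w") as f:
        json.dump(table_rows, f, indent=2)
```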

  • @gauthiervigouroux982
    @gauthiervigouroux982 2 years ago

    Hi John, nice video and, more generally, channel :) I have tried to adapt and reproduce your project on another website. It was a dynamic one so I used render(), but when I try to get the absolute links of the web page it returns 'set()' as many times as there are links in the page. Did you ever have this problem?
    Have a good day :)

  • @javierjdaza
    @javierjdaza 2 years ago +1

    Hi John, big fan here. I've got a question, why did you stop using requests + bs4?

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      I do still use bs4, but I just prefer the way requests-html works, and my preference is CSS selectors, which are the standard. Although you can use them in bs4 - it just comes down to preference! Use whichever you like best

  • @heq3160
    @heq3160 2 years ago

    Hey, I want to check when a change is made to a webpage, and I was wondering:
    am I supposed to perform a request every x seconds, or is there a way to check when a change is made?

  • @Stelaninja
    @Stelaninja 2 years ago

    So basically, use requests_html to scrape and save the data. Then create another project where you read it with pandas, if you want to analyze the data.

  • @shoebshaikh6310
    @shoebshaikh6310 2 years ago +1

    Great video👍

  • @ferilukmansyah_dev
    @ferilukmansyah_dev 2 years ago +1

    nice tutorial

  • @saadachab8425
    @saadachab8425 2 years ago +1

    Very interesting🙏

  • @djyems1021
    @djyems1021 2 years ago +1

    Hi John, why are you deleting my comments??

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      Hey, it’s not me - if you are posting a link YouTube will delete it automatically I’m afraid

  • @tetricko
    @tetricko 2 years ago

    this line gives an error:
    res = [dict(zip(tableheader,t)) for t in tabledata]
    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    Cell In [69], line 11
    8 tabledata = [[c.text for c in row('td')] for row in table('tr')]
    9 tableheader = [[c.text for c in row('th')] for row in table('tr')]
    ---> 11 result = [dict(zip(tableheader,t)) for t in tabledata]
    Cell In [69], line 11, in (.0)
    8 tabledata = [[c.text for c in row('td')] for row in table('tr')]
    9 tableheader = [[c.text for c in row('th')] for row in table('tr')]
    ---> 11 result = [dict(zip(tableheader,t)) for t in tabledata]
    TypeError: unhashable type: 'list'
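
The TypeError in the paste above happens because tableheader is built with a nested comprehension, producing one list per tr row (and an empty list for rows with no th cells); zip() then yields those inner lists as dictionary keys, and lists are unhashable. A sketch of the fix, using stand-in data in place of the parsed table:

```python
# Stand-ins for the parsed rows: only the first <tr> carries <th> cells.
rows_th = [["Team", "Points"], [], []]                 # per-row <th> text
rows_td = [[], ["Arsenal", "84"], ["Chelsea", "71"]]   # per-row <td> text

# Buggy version: zip(rows_th, row) pairs each header *list* with a cell
# value, and dict() rejects lists as keys -> TypeError: unhashable type.

# Fix: take the header cells from the first row only (a flat list of
# strings), and skip rows that have no <td> cells.
header = rows_th[0]
result = [dict(zip(header, row)) for row in rows_td if row]
print(result)
# [{'Team': 'Arsenal', 'Points': '84'}, {'Team': 'Chelsea', 'Points': '71'}]
```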

  • @sandeepahuja2203
    @sandeepahuja2203 2 years ago +1

    Thank you very much for taking the effort to post such short & crisp videos on web scraping. Nowadays, Twitter is catching up with loads of open source communities helping each other. However, I couldn't find you over there.
    Would you be kind enough to share your Twitter handle please?