This channel is the best by far for learning web scraping, but I think it would be better to focus only on the Scrapy framework. It is surely the best tool by far: complete, powerful, and easy to learn.
John, thank you for keeping the channel blessed with the micro lessons! It takes a lot to consistently put this level of quality out here. I appreciate you!
Thank you, very kind
Nice explanation of list comprehensions. From your videos I have learned a lot of new things about web scraping and also Python. Thanks for your content.
Hey thanks, I’m glad it’s helped
Great content. For the last two months I've been learning Python and web scraping thanks to you, John. I've learned a lot and I'm advancing day by day.
Thank you and do keep at it
So good to see how you do it! I am reading a book on web scraping, "Web Scraping with Python" by Ryan Mitchell, and it's great to watch your videos on the back of it; it all becomes clearer and comes alive. Thank you!
Thanks, I’ve not seen that book before; I’ll check it out
@@JohnWatsonRooney which one would you recommend? Any personal favorite of yours?
Thanks for sharing. Just started with web scraping and the data I need is typically presented in tables.
dict(zip(list_a, list_b)) is neat. Thanks John!
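For anyone wondering what that pattern does: zip() pairs items from two lists positionally, and dict() turns those pairs into keys and values. A minimal sketch with invented headers and row values:

```python
# zip() yields (header, value) pairs; dict() collects them into a record.
headers = ["name", "price", "stock"]  # hypothetical table headers
row = ["widget", "9.99", "12"]        # one hypothetical row of cells

record = dict(zip(headers, row))
print(record)  # {'name': 'widget', 'price': '9.99', 'stock': '12'}
```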
Hi John, I was wondering if you could make a video on how to bypass Cloudflare when scraping a website? I've tried using the correct headers/cookies and continuously changing the user agent, but I still can't seem to get past certain websites that use Cloudflare.
WTF! This tutorial is clear and easy to follow. I have subscribed and liked the video. By the way, what are the uses of dict() and zip()? I love list comprehension.
Very neat and clean code...and as always, well explained 🙂
Funny. I have just started experimenting with JSON in Python. Your video will help me a lot.
Thank you for sharing.
That’s great, I’m glad it helped
Hello, how could I add three replace() calls, for columns 0, 2, and 3?
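If the frame has pandas' default integer column labels, one way to read that question is one str.replace per target column. A hedged sketch; the data and the patterns here are invented examples:

```python
import pandas as pd

# Hypothetical frame with default integer column labels (0..3).
df = pd.DataFrame([["£1,000", "a", "2.5%", "n/a"],
                   ["£2,500", "b", "3.1%", "n/a"]])

# One replace per target column; the patterns are invented examples.
df[0] = df[0].str.replace("£", "", regex=False)
df[2] = df[2].str.replace("%", "", regex=False)
df[3] = df[3].str.replace("n/a", "unknown", regex=False)
print(df)
```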
Unfortunately the package "requests_html" seems to be unmaintained: no bugfixes or updates, and the last activity in the repo was 3 years ago... 😢😢😢
Yeah I’ve moved on to httpx or requests, and selectolax to parse
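For anyone curious what that stack looks like, a minimal sketch; the URL and CSS selector are placeholders, not from the video:

```python
import httpx
from selectolax.parser import HTMLParser

# Fetch with httpx, parse with selectolax.
resp = httpx.get("https://example.com", timeout=10)
resp.raise_for_status()

tree = HTMLParser(resp.text)
for node in tree.css("h1"):  # placeholder selector
    print(node.text(strip=True))
```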
Hello John, can you make a video on scraping a world population data website, please? I tried but failed because the span tag is constantly changing.
Can we create a project like the WP Automatic plugin, or like Scrapes? These are WordPress plugins that automatically scrape content and post it on a website. They also have a scheduling option.
But this module is not fetching HTML files from local disk. How can I do that?
When I use pandas, my th elements are not becoming columns; instead it's generating 0, 1, 2 for the columns and putting the th and td in as data rows.
Why is it not putting the th as headers?
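A common cause is that the th cells sit in the table body rather than a thead, so pandas treats them as data. Assuming the table is loaded with pd.read_html, one hedged fix is telling pandas which row holds the headers (the URL is a placeholder):

```python
import pandas as pd

# header=0 tells pandas the first row of the table is the column labels.
tables = pd.read_html("https://example.com/table-page", header=0)
df = tables[0]
print(df.columns)
```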
Great channel.
May I ask: if you were interested in closely monitoring infrequent changes to a website (e.g. the "company news" page of a company you own stock in), do you have a gut feel as to the best way to go about it? I anticipate newly released news would have its own "page". Perhaps the simplest approach is to crawl the website every X hours, list all of the "links", and check if the list has changed.
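That poll-and-diff idea works well for static pages. A minimal sketch of it, with a placeholder URL and state file; run it on a schedule (cron or similar) every X hours:

```python
import json
import httpx
from selectolax.parser import HTMLParser

URL = "https://example.com/company-news"  # placeholder news page

# Collect the current set of links on the page.
html = httpx.get(URL, timeout=10).text
links = {a.attributes.get("href")
         for a in HTMLParser(html).css("a")
         if a.attributes.get("href")}

# Compare against the set saved on the previous run.
try:
    with open("seen_links.json") as f:
        seen = set(json.load(f))
except FileNotFoundError:
    seen = set()

new = links - seen
if new:
    print("New pages:", new)

with open("seen_links.json", "w") as f:
    json.dump(sorted(links), f)
```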
Thanks a lot, John.
Nice. Is it possible to somehow iterate/enumerate through each league and create various JSON outputs such as leaguename.json? All in one .py rather than, say, multiple queries. Thank you :)
Sure, we can use a list of league names and loop through them, using the name variable as the output file name
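A minimal sketch of that loop; the league names and the scrape_league() helper are hypothetical stand-ins for the video's per-league request and parsing:

```python
import json

leagues = ["premier-league", "la-liga", "serie-a"]  # hypothetical names

def scrape_league(name: str) -> list[dict]:
    # Placeholder for the actual fetch/parse of one league's table.
    return [{"league": name, "team": "example", "points": 0}]

# One JSON file per league, named after the league.
for name in leagues:
    with open(f"{name}.json", "w") as f:
        json.dump(scrape_league(name), f, indent=2)
```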
Hi John, nice video and, more generally, channel :) I have tried to adapt and reproduce your project on another website. It was a dynamic one, so I used render(), but when I try to get the absolute links of the web page it returns 'set()' as many times as there are links on the page. Did you ever have this problem?
Have a good day :)
Hi John, big fan here. I've got a question: why did you stop using requests + bs4?
I do still use bs4, but I just prefer the way requests-html works, and my preference is CSS selectors, which are the standard. Although you can use them in bs4 too - it just comes down to preference! Use whichever you like best.
Hey, I want to check when a change is made to a webpage, and I was wondering:
am I supposed to perform a request every x seconds, or is there a way to be notified when a change is made?
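Plain HTTP has no push notifications, so polling is the usual answer; some sites also expose an RSS feed or Last-Modified/ETag headers that make checks cheaper. A minimal polling sketch that hashes the body to detect changes (URL and interval are placeholders):

```python
import hashlib
import time
import httpx

URL = "https://example.com"  # placeholder

last_hash = None
while True:
    body = httpx.get(URL, timeout=10).content
    digest = hashlib.sha256(body).hexdigest()
    if last_hash is not None and digest != last_hash:
        print("Page changed!")
    last_hash = digest
    time.sleep(60)  # your "every x seconds"
```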
So basically, use requests_html to scrape and save the data. Then create another project where you read it with pandas, if you want to analyze the data.
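The second half of that workflow might look like this; the file name and fields are placeholders for whatever the scraper saved:

```python
import json
import pandas as pd

# Load the records the scraper saved, then analyse them with pandas.
with open("scraped_data.json") as f:
    records = json.load(f)

df = pd.DataFrame(records)
print(df.head())
```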
Great video👍
Thanks!
nice tutorial
Very interesting🙏
Thanks for watching!
Hi John, why are you deleting my comments??
Hey, it’s not me - if you are posting a link YouTube will delete it automatically I’m afraid
this line gives an error:
result = [dict(zip(tableheader,t)) for t in tabledata]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [69], line 11
8 tabledata = [[c.text for c in row('td')] for row in table('tr')]
9 tableheader = [[c.text for c in row('th')] for row in table('tr')]
---> 11 result = [dict(zip(tableheader,t)) for t in tabledata]
Cell In [69], line 11, in <listcomp>(.0)
8 tabledata = [[c.text for c in row('td')] for row in table('tr')]
9 tableheader = [[c.text for c in row('th')] for row in table('tr')]
---> 11 result = [dict(zip(tableheader,t)) for t in tabledata]
TypeError: unhashable type: 'list'
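The error comes from tableheader being a list of lists: that comprehension builds one list per tr (an empty one for rows with no th), so zip() pairs each data cell with a whole list, and a list can't be a dict key. Building the header as a flat list of strings from the first row fixes it. A sketch keeping the bs4-style calls, where table stands for the parsed table element:

```python
# Flat list of header strings from the first row's <th> cells.
tableheader = [c.text for c in table('tr')[0]('th')]
tabledata = [[c.text for c in row('td')] for row in table('tr')]

# Skip the header row itself (it has no <td> cells, so its list is empty).
result = [dict(zip(tableheader, t)) for t in tabledata if t]
```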
Thank you very much for taking the effort to post such short & crisp videos on web scraping. Nowadays, Twitter is catching up, with loads of open source communities helping each other there. However, I couldn't find you over there.
Would you be kind to share your twitter handle please?
Sure it’s @jhnwr