Great example. Just one thought. You start with a known named page. For me, it would be more useful to use the web site index and loop through all pages.
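A minimal sketch of that idea, not taken from the video: collect every link on an index page, then loop over the resulting URLs. The sample markup, the `example.com` address, and the names `LinkCollector` and `page_urls` are all hypothetical, and the standard library parser is used here only so the sketch runs without extra installs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def page_urls(index_html, base_url):
    """Return absolute, de-duplicated page URLs found in an index page."""
    collector = LinkCollector()
    collector.feed(index_html)
    seen = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # turn relative links into full URLs
        if url not in seen:
            seen.append(url)
    return seen


# Hypothetical index markup; a real site index would be downloaded first.
sample_index = """
<ul>
  <li><a href="/movie/titanic">Titanic</a></li>
  <li><a href="/movie/alien">Alien</a></li>
  <li><a href="/movie/titanic">Titanic (duplicate link)</a></li>
</ul>
"""

urls = page_urls(sample_index, "https://example.com/index")
for url in urls:
    # Each URL would be fetched and scraped the same way as the single page.
    print(url)
```

In a real run, the body of the loop would hold the same scraping steps the tutorial applies to the single known page.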
Thank you Frank for your beautiful lesson
This is a well planned out guide Frank! Well done. Can I ask you to create a video on scraping multiple google search queries, taking the first result link from each query? I can't find anything like this anywhere. Then you could also write the results to a DF / CSV.
Thanks! That seems like something you can easily accomplish with Selenium. You'd have to use the send_keys() method to type each query into Google and then scrape the results.
Check out my Selenium tutorial. I don't think I use the send_keys() method there, but it can still help.
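A rough sketch of how that could look, under several assumptions: Selenium 4 and a Chrome driver are installed, Google's search box is still named "q", and result titles are h3 elements inside links. Google's markup changes often and it may show consent or CAPTCHA pages, so the selectors here are guesses. The function names `first_result_links` and `save_rows` are hypothetical.

```python
import csv


def first_result_links(queries):
    """Search each query on Google and return (query, first link) pairs."""
    # Imported here so the CSV helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    rows = []
    try:
        for query in queries:
            driver.get("https://www.google.com")
            box = driver.find_element(By.NAME, "q")  # assumed search box name
            box.send_keys(query, Keys.ENTER)  # type the query and submit it
            # Assumed selector: result titles are <h3> elements inside links.
            links = WebDriverWait(driver, 10).until(
                lambda d: d.find_elements(By.XPATH, "//a[h3]")
            )
            rows.append((query, links[0].get_attribute("href")))
    finally:
        driver.quit()
    return rows


def save_rows(rows, path):
    """Write (query, first link) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(["query", "first_link"])
        writer.writerows(rows)


# Usage (opens a real browser window):
# results = first_result_links(["python web scraping", "selenium send_keys"])
# save_rows(results, "first_links.csv")
```

The resulting CSV could then be loaded into a DataFrame with pandas if that is more convenient.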
Thanks Frank for the Cheat Sheet PDF, I really need it to learn Python in depth.
Thanks Frank, that is a great video! However, it didn't work for me at first: I needed to add 'encoding="utf-8"' in 'with open(f'{title}.txt', 'w', encoding="utf-8") as file:' to be able to write to the file, not sure why ;)
It has to do with the characters used in the transcript. Not all of them are in the standard English alphabet, so you need to specify utf-8 as the encoding to write them to the file. That's the most popular encoding on the internet.
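For anyone who wants to try it in isolation, here is a small sketch of the fix. The `title` and `transcript` values are hypothetical stand-ins for what the tutorial scrapes, and the file is written to the temp folder only to keep the example tidy.

```python
import os
import tempfile

# Hypothetical stand-ins for the scraped values in the tutorial.
title = "sample_transcript"
transcript = "The ship \ufb02oats."  # contains the "fl" ligature U+FB02

path = os.path.join(tempfile.gettempdir(), f"{title}.txt")

# Passing encoding explicitly avoids depending on the platform default.
with open(path, "w", encoding="utf-8") as file:
    file.write(transcript)

# Read it back with the same encoding to confirm nothing was lost.
with open(path, "r", encoding="utf-8") as file:
    round_trip = file.read()

print(round_trip == transcript)
```

Without the encoding argument, open() falls back to the system's preferred encoding, which on many Windows setups is cp1252 and cannot represent that ligature.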
Thanks to you for this tutorial, learned a lot.
You’re welcome!
OMG haha, I love the fact that you have the same video in English and Spanish.
Thanks boys..Huge fan of your content
Which video about scraping do you recommend watching first? This one or the previous one? Thx!
Good question. My recommendation is to start with Beautiful Soup then Selenium and then Scrapy. I’m going to create a playlist soon to make things easier for all of you!
@ThePyCoach Thank you!
Hi, thanks for the video. Can you suggest an approach to follow to learn Data Science?
I’m about to publish on Medium a Python for Data Science roadmap. Stay tuned!
Have you any video on email scraping?
Amazing bro
How can I rewrite this data before I upload it to my website? Please record a video on how we can rewrite the data after scraping it.
It seems the Python Cheat Sheet on Web Scraping is no longer available!!
why are you parsing the website with xml?
nice
what IDE is this ?
It’s PyCharm, but feel free to use any Python text editor or IDE you want.
@ThePyCoach do you recommend any that also has autocomplete?
please start from how to install python
Hey team - I had a few errors when running the code:
Traceback (most recent call last):
  File "/titanic.py", line 18, in <module>
    file.write(transcript)
  File "cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufb02' in position 31207: character maps to <undefined>
This charmap error has to do with the encoding. Check the other answers or my GitHub to see how to deal with it
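The error can be reproduced without any scraping, which may help show where it comes from. This is only an illustrative sketch: it encodes the character named in the traceback with cp1252 (the codec shown in the error) and then with utf-8. The variable names are made up for the example.

```python
ligature = "\ufb02"  # the character named in the traceback ("fl" ligature)

try:
    ligature.encode("cp1252")  # the codec from the traceback
    cp1252_ok = True
except UnicodeEncodeError as error:
    cp1252_ok = False
    print(error)  # same kind of 'charmap' message as in the traceback

# utf-8 can represent any Unicode character, so this succeeds.
utf8_bytes = ligature.encode("utf-8")
print(cp1252_ok, utf8_bytes)
```

cp1252 has no slot for that character, so file.write() fails whenever the file was opened with that encoding.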
Just write this --> encoding='utf8' inside the open() call of the with statement.