Honestly, I want to say you are the best. You explained it in a very simple way for those who have no in-depth knowledge in scripting. Thanks very much for this
This is amazing. As a newbie to R, videos like these motivate me to learn more.
Thank you! I'm glad to hear that 😊
Why use XPath instead of CSS selectors for classes? Seems a bit tedious.
Thank you so much! You just saved my butt for a class project! And excellent job explaining everything! As for further analysis, it'd be cool to see a video on extracting common words in reviews. Or extract user sentiment from comments on a social media site.
Woohoo! I'm happy to hear that! Thank you for the feedback. That is a pretty cool idea. I'd love to incorporate more NLP in my videos, and your idea could be a start. Great suggestion :)
Great video! I'm having issues though. I am using a different link and a different CSS element, but the same format. When I run the code I get some review dates, but I also get other info like the city of the restaurant and a photo count (e.g., [8]). I'm kind of stumped as to why I am getting this other random info.
Hi Chris, thank you for the feedback. I am currently working on an update video which will eliminate these issues. Stay tuned for it!
@SamerHijjazi You're awesome. Subbed to your channel and looking forward to more of your videos!
It could be interesting to run a tf-idf using the ratings as documents, and get the most meaningful words per rating.
Thank you -- this is super helpful. By chance, do you have the code posted anywhere?
Hi Chris, the code is not posted anywhere. I'm thinking of making an update video on this. It seems like there is a better way of scraping this data. Once I share that, I'll be sure to post the code for it as well.
@SamerHijjazi Thank you! I have been getting by with the Yelp Academic data set for now, but I do need to work on my scraping skills so I can answer some specific research questions in the future.
This is amazing. Thank you!
This is amazing. I have a question: why didn't you write all the code for i == 0?
Because the &*10-20...70 part only starts with the second page. He could of course just create the sequence starting from 1 (times 10 = 10), but that's an OK approach. IMO this could have been easier with CSS selectors, and there were some extra str_* functions, but all in all, great tutorial.
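To make the pagination point above concrete, here is a minimal sketch of building all page addresses up front so the first page needs no special i == 0 branch. The base URL and the `start` parameter name are assumptions for illustration only; check the real address bar of the site you are scraping.

```r
# Hypothetical base URL; not taken from the video.
base_url <- "https://www.example.com/biz/some-restaurant"

# Offsets 0, 10, ..., 70: page 1 has no offset, page 2 starts at 10, and so on.
offsets <- seq(0, 70, by = 10)

# Assumed query parameter name "start"; page 1 keeps the bare address.
page_urls <- ifelse(
  offsets == 0,
  base_url,
  paste0(base_url, "?start=", offsets)
)

print(page_urls)
```

With the addresses in one vector, the scraping loop can treat every page the same way.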
Thank you so much! How can I get this script?
Thanks a million.
Web scraping: you are the best.
Thank you for your excellent video 👌👌
What do we do if clicking on the page numbers doesn't change the address?
Thank you! I'm not sure I understand your question.
@SamerHijjazi Sometimes, on some websites, we click on the page number and the content changes, but the page address is fixed and does not change. That is, the address is not a function of the page number. What should we do in this situation?
@aminroshani This is when a package like RSelenium comes in handy, as it allows you to select and click the next button.
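A rough sketch of that idea, assuming an RSelenium remote driver is already running (for example via `rD <- RSelenium::rsDriver(browser = "firefox"); remDr <- rD$client`). The helper name `collect_pages`, the selector, and the wait time are hypothetical, and this is not the code from the video.

```r
# Click through pages whose address never changes and keep each page's HTML.
# `remDr` is an RSelenium remoteDriver client; `next_selector` is a CSS
# selector for the "next" button (site-specific, so an assumption here).
collect_pages <- function(remDr, url, next_selector, max_pages = 5, wait = 2) {
  remDr$navigate(url)
  pages <- character(0)
  for (i in seq_len(max_pages)) {
    # getPageSource() returns a list; the HTML string is its first element.
    pages <- c(pages, remDr$getPageSource()[[1]])
    # findElement() raises an error when nothing matches, so treat that
    # as "no more pages".
    next_btn <- tryCatch(
      remDr$findElement(using = "css selector", value = next_selector),
      error = function(e) NULL
    )
    if (is.null(next_btn)) break
    next_btn$clickElement()
    Sys.sleep(wait)  # give the new content time to load
  }
  pages
}
```

Each element of the result can then be parsed with `rvest::read_html()` just like a normally downloaded page.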
Kindly share the code repo.
Great tutorial! Everything went well but Yelp suddenly banned my IP address. Any advice on how to remedy that?
Also, for the ratings, the number of star ratings was more than the text and dates. So I couldn't really consolidate everything in the df framework. :(
Thank you for the feedback! I'd suggest putting pauses in your code to prevent Yelp from banning your IP address. If you're still banned, change your IP via a VPN while running the script. As for the ratings, that's the tricky part: sometimes there are extra ratings, such as ones from ads, that make the number of reviews and dates differ from the number of star ratings. Even replies from the owner to reviews can create another imbalance. Scraping from Yelp was certainly tricky. I'd like to revisit it and see if I can provide a better scraping solution.
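One way the "pauses" suggestion might look in practice is sketched below. The helper name `scrape_with_pauses` and the wait range are assumptions, not something taken from the video, and a pause reduces the chance of a ban rather than guaranteeing anything.

```r
# Fetch each URL in turn, sleeping a random number of seconds between requests.
# `fetch_page` is any function that takes one URL, e.g. rvest::read_html.
scrape_with_pauses <- function(urls, fetch_page, min_wait = 2, max_wait = 5) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    results[[i]] <- fetch_page(urls[i])
    # No need to wait after the last page.
    if (i < length(urls)) {
      Sys.sleep(runif(1, min_wait, max_wait))
    }
  }
  results
}

# Example call (not run here):
# pages <- scrape_with_pauses(page_urls, rvest::read_html)
```

Randomizing the delay is a design choice to avoid a perfectly regular request pattern; a fixed `Sys.sleep()` between requests would also work.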
You can find the updated tutorial here: th-cam.com/video/UlBNf8g1wI8/w-d-xo.html
Can you do this for LinkedIn too?