How I Scrape Amazon Reviews using Python, Requests & BeautifulSoup

  • Published 12 Sep 2024
  • Another fun project today where we look at scraping product reviews from Amazon. This could be a useful project for someone, or just a learning exercise. We use Splash to render the page for us and return the raw HTML, which we can then parse with BeautifulSoup to extract the information we want (a short sketch of this flow follows the links below).
    Code: github.com/jhn...
    Splash Video: • Scrape Javascript with...
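
    A minimal sketch of the flow described above, assuming a local Splash instance on port 8050 (for example run via Docker) and the data-hook attributes Amazon used at the time of the video; the ASIN and the selectors are placeholders and may need updating:

    import requests
    from bs4 import BeautifulSoup

    def get_soup(url):
        # ask Splash to render the page and hand back the raw HTML
        r = requests.get('http://localhost:8050/render.html',
                         params={'url': url, 'wait': 2})
        return BeautifulSoup(r.text, 'html.parser')

    soup = get_soup('https://www.amazon.co.uk/product-reviews/EXAMPLE_ASIN')  # placeholder ASIN
    for item in soup.find_all('div', {'data-hook': 'review'}):
        title = item.find('a', {'data-hook': 'review-title'})
        body = item.find('span', {'data-hook': 'review-body'})
        print(title.text.strip() if title else None)
        print(body.text.strip() if body else None)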
    -------------------------------------
    Disclaimer: These are affiliate links and as an Amazon Associate I earn from qualifying purchases
    -------------------------------------
    Sound like me:
    microphone: amzn.to/36TbaAW
    mic arm: amzn.to/33NJI5v
    audio interface: amzn.to/2FlnfU0
    -------------------------------------
    Video like me:
    webcam: amzn.to/2SJHopS
    camera: amzn.to/3iVIJol
    lights: amzn.to/2GN7INg
    -------------------------------------
    PC Stuff:
    case: amzn.to/3dEz6Jw
    psu: amzn.to/3kc7SfB
    cpu: amzn.to/2ILxGSh
    mobo: amzn.to/3lWmxw4
    ram: amzn.to/31muxPc
    gfx card: amzn.to/2SKYraW
    27" monitor: amzn.to/2GAH4r9
    24" monitor (vertical): amzn.to/3jIFamt
    dual monitor arm: amzn.to/3lyFS6s
    mouse: amzn.to/2SH1ssK
    keyboard: amzn.to/2SKrjQA

Comments • 80

  • @Oracle643
    @Oracle643 18 days ago +1

    Wow, this was incredibly helpful! It took me three days, but I finally figured it out. I'm not sure why, but even when I follow a YouTube tutorial step by step, I sometimes end up with a different result at first.

  • @adamnowicki1425
    @adamnowicki1425 3 years ago +6

    Hi John, I hope your channel grows as you are really good; I like how you go straight into the subject. I am learning so I can try to get a software tester job, and you have really good tutorials for BeautifulSoup (and many other things). All the best :)

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago +1

      Thanks Adam appreciate that!

    • @_hindu_warriors
      @_hindu_warriors 1 year ago

      @@JohnWatsonRooney
      Sir, can you help me please?
      I'm getting output like:
      Getting page: 1
      0
      Getting page: 2
      0
      Getting page: 3
      0

    • @VipinKumaarr
      @VipinKumaarr 1 year ago +1

      @@JohnWatsonRooney It doesn't pick up the reviews from the last page, because it finds the disabled next button and breaks. Is there an edit you'd suggest so those final-page reviews don't get skipped?
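
    A sketch of one possible fix for the question above, not the exact code from the video: collect the current page before checking the pagination state, so the final page is still captured even though the disabled "Next" button triggers the break. It reuses the get_soup() helper from the sketch under the description, and assumes the "a-disabled a-last" class still marks the greyed-out Next button:

    base_url = 'https://www.amazon.co.uk/product-reviews/EXAMPLE_ASIN?pageNumber='  # placeholder

    reviewlist = []
    for page in range(1, 50):
        soup = get_soup(base_url + str(page))
        for item in soup.find_all('div', {'data-hook': 'review'}):
            body = item.find('span', {'data-hook': 'review-body'})
            reviewlist.append(body.text.strip() if body else None)   # collect this page first
        if soup.find('li', {'class': 'a-disabled a-last'}):           # disabled Next button = last page,
            break                                                      # but this page is already collected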

  • @invinci8032
    @invinci8032 3 years ago +7

    Hi John, your videos are amazingly instructive and at the same time your videos motivate me to learn something new. Thank you!

  • @rand5858
    @rand5858 3 years ago +2

    I looked for a "proper" explanation of the loop for web scraping for many days; thanks! You hit it out of the park; cheers!

  • @03-bhavsarshivani54
    @03-bhavsarshivani54 1 year ago +1

    Hey... the way you explain everything makes it really easy and awesome... it's really helpful to me. Thanks!!!! :)

  • @chloesong1792
    @chloesong1792 2 years ago +1

    Wow, thank you so much!!! You are my idol! The best teacher! I was struggling for a few days. Thank you for your videos! You saved me ^~^ I am gonna watch all of your videos. 🌹

  • @huzaifaameer8223
    @huzaifaameer8223 3 years ago +1

    Quality content worth watching! 💚 Kindly keep going, and please also make a video on how to set up a cron job for a scraping script on a live server, and how to handle the script if it breaks due to an internet issue, etc.!

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago +1

      Thank you! Yes, I am working on a new project, the end of which will be running the script on a remote server via a cron job! All coming soon

    • @huzaifaameer8223
      @huzaifaameer8223 3 years ago

      @@JohnWatsonRooney Appreciate your concern man, really appreciated! 💚
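
    For the second half of the question above (a script breaking because the connection drops), one common pattern, not from the video, is to wrap each fetch in a small retry loop; the cron side is then just a normal scheduled entry that runs the script:

    import time
    import requests

    def fetch_with_retry(url, retries=3, delay=10):
        # retry the request instead of letting a dropped connection kill the script
        for attempt in range(1, retries + 1):
            try:
                return requests.get(url, timeout=30)
            except requests.exceptions.RequestException as exc:
                print(f'Attempt {attempt} failed: {exc}')
                time.sleep(delay)
        return None   # caller decides what to do if every attempt failed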

  • @Agung-yk7hr
    @Agung-yk7hr 2 years ago +1

    Learned a lot from your channel, man 😁

  • @thulasirao9139
    @thulasirao9139 3 years ago +1

    Thank you John, a simple way of explaining. This is what I was looking for. I was stuck getting all the pages and kept getting an error. Helpful.

  • @KumarS-tc6dl
    @KumarS-tc6dl 1 year ago +1

    Thanks for the video with useful information

  • @tubelessHuma
    @tubelessHuma 3 years ago +1

    Really helpful John. Thanks

  • @tahirullah4786
    @tahirullah4786 3 years ago +1

    Hi John, your videos are amazing, but sir, I have one question! How can I extract specific review comments for a product, for example collecting the comments from people who left the Amazon app for particular reasons (the interface is not good, too many bugs, or too many ads)? Please guide me, or share any way to contact you?
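
    A rough sketch for the question above, not from the video: once the reviews are collected into a list of dicts (the structure used in the video), keep only the ones whose body mentions particular complaints. The sample data and keyword list are just examples:

    reviews = [
        {'title': 'Ok',    'body': 'Too many ads and the interface is clunky'},
        {'title': 'Great', 'body': 'Works perfectly for me'},
    ]
    keywords = ['interface', 'bug', 'ads']

    complaints = [r for r in reviews
                  if any(k in r['body'].lower() for k in keywords)]
    print(complaints)   # only the first review matches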

  • @quynhnlp5086
    @quynhnlp5086 3 years ago +1

    Just hitting like is not enough, thank you for the great work =))

  • @wafaal-dyani8997
    @wafaal-dyani8997 2 years ago +1

    Thanks brother, your explanation is awesome

  • @gabijavaisvilaite8272
    @gabijavaisvilaite8272 2 years ago +1

    Hi! Could you explain in more depth why in both cases you got fewer reviews than there actually were on Amazon? Does it have something to do with the reviews being in a different language? Also, how do you deal with that situation?

  • @SuperAless112
    @SuperAless112 3 years ago +1

    How can I track how many requests I am sending? Is it one request per title, per product, per item I am extracting, or is it one request per search-page URL I open?
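
    Roughly, it is one request per page URL you open (each render.html call); pulling the individual titles or items out of the already-downloaded HTML sends nothing extra. A small counter, not from the video, makes this easy to verify:

    import requests
    from bs4 import BeautifulSoup

    request_count = 0

    def get_soup(url):
        global request_count
        request_count += 1                      # one Splash render per page URL
        r = requests.get('http://localhost:8050/render.html',
                         params={'url': url, 'wait': 2})
        return BeautifulSoup(r.text, 'html.parser')

    # after the scraping loop:
    # print(f'Sent {request_count} requests')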

  • @ahmedshamoon12
    @ahmedshamoon12 3 years ago +1

    One of the best videos... I love it

  • @overfitted
    @overfitted 1 year ago

    You mentioned a text processing tutorial in this video but I don't see it in your video list. Did you end up making one for the review text body?

  • @edcoughlan5742
    @edcoughlan5742 3 years ago +1

    Another great video!

  • @Kirikira0798
    @Kirikira0798 3 years ago +1

    Hey John, it's really cool stuff you just made. When will you make the video analysing this data?

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago

      Hey! Thanks - I am a bit behind on doing that, I am working on it though!

    • @Kirikira0798
      @Kirikira0798 3 years ago

      @@JohnWatsonRooney yea awesome

    • @Kirikira0798
      @Kirikira0798 3 years ago

      @@JohnWatsonRooney Hey, can you help me with my college project? It's about sentiment analysis of customer reviews... only if you can... great content by the way

  • @manojraj9996
    @manojraj9996 3 years ago +1

    Hi John, I'm a sincere follower of yours; your videos are absolutely instructive and amazing.
    With reference to this video,
    can you please tell me how we can scrape the date of the review along with the other data?
    Thanks in advance!
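
    A sketch for the date question above, assuming Amazon still marks the date with data-hook="review-date" (the selectors may have changed since the video); soup is a parsed review page as in the earlier sketches:

    def extract_reviews(soup):
        reviews = []
        for item in soup.find_all('div', {'data-hook': 'review'}):
            title_el = item.find('a', {'data-hook': 'review-title'})
            date_el = item.find('span', {'data-hook': 'review-date'})
            reviews.append({
                'title': title_el.text.strip() if title_el else None,
                'date': date_el.text.strip() if date_el else None,   # e.g. "Reviewed in ... on 1 January 2021"
            })
        return reviews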

  • @pypypy4228
    @pypypy4228 1 year ago

    Amazingly awesome!!! I am a huge fan of your vids! Is there a way to run it in Google Colab?

  • @billynoyes6388
    @billynoyes6388 3 years ago +1

    Hey bro, really good video! How could I modify this so that I can see when a new review is posted?
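
    Not from the video: one way to tell whether a new review has appeared is to keep the identifiers from the previous run on disk (titles here, though a review ID pulled from the HTML would be more robust) and diff against the latest scrape:

    import json
    from pathlib import Path

    SEEN_FILE = Path('seen_reviews.json')

    def find_new(reviews):
        # reviews: list of dicts with a 'title' key, as scraped in the video
        seen = set(json.loads(SEEN_FILE.read_text())) if SEEN_FILE.exists() else set()
        new = [r for r in reviews if r['title'] not in seen]
        SEEN_FILE.write_text(json.dumps(sorted(seen | {r['title'] for r in reviews})))
        return new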

  • @GuidoOlijslager
    @GuidoOlijslager 3 years ago +1

    Hi John, another very instructive video. Are you also considering videos on web scraping with JavaScript and Puppeteer?

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago

      Hi Guido - I’d like to branch out into JavaScript but the truth is I don’t know it very well so eventually yes, but for now just python

  • @AsifaAkter
    @AsifaAkter 1 year ago

    Hi, this code is not working due to permission issues. Can you make a new tutorial on this?

  • @husainakareem3700
    @husainakareem3700 3 years ago +1

    Hi John, this is such an amazing scraping tutorial. I also watched your proxy-setting tutorial. Can you please give me an idea of how to set up a proxy server with this tutorial? (I am new to Python and my coding skills are very basic.)
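
    A sketch for the proxy question above, not code from the video. With plain requests you pass a proxies dict; when rendering through Splash, the page is fetched by Splash itself, so the proxy is given as a render argument instead (see the Splash HTTP API docs). The proxy URL below is a placeholder:

    import requests

    proxies = {
        'http':  'http://user:pass@proxy.example.com:8080',
        'https': 'http://user:pass@proxy.example.com:8080',
    }
    r = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=30)
    print(r.text)   # should show the proxy's IP, not yours

    # Through Splash, roughly:
    # requests.get('http://localhost:8050/render.html',
    #              params={'url': url, 'proxy': 'http://user:pass@proxy.example.com:8080'})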

  • @neves0001
    @neves0001 1 year ago

    Thank you for the tutorial. I have a question: I am working on my own Yelp reviews and I keep getting an output of None. I have tried all the tags and classes. I can read the URL, but when I do find_all and then print, I get an output of None.
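
    A few quick checks, not from the video, for the "output is None" situation above: find() returns None when nothing matches and find_all() returns an empty list, and the usual causes are a blocked request or content that only appears after JavaScript runs. The URL below is a placeholder:

    import requests
    from bs4 import BeautifulSoup

    r = requests.get('https://www.yelp.com/biz/some-business',
                     headers={'User-Agent': 'Mozilla/5.0'})
    print(r.status_code)            # 200, or 403/503 if the request was blocked
    soup = BeautifulSoup(r.text, 'html.parser')
    print(r.text[:500])             # is the content in the HTML at all, or rendered by JavaScript?
    print(len(soup.find_all('p')))  # a broad selector to confirm parsing works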

  • @smilynnzhang9859
    @smilynnzhang9859 3 years ago +1

    Thanks for another great video, John. But when I did it, I could get all the review contents without using Splash, just by using requests and BeautifulSoup. Is it supposed to be like this now on Amazon? Much appreciated. Cheers.

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago

      Really? That's good to know. I used Splash because when I tried, I wasn't able to get all the data I wanted without it.

    • @abhishekg066
      @abhishekg066 3 years ago

      Can you please help me or share the code?
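
    A sketch of the no-Splash variant mentioned in this thread, not the exact code from the video: plain requests with a browser-like User-Agent, parsed with BeautifulSoup. Amazon's markup and anti-bot checks change often, so this may or may not return reviews at any given time; the ASIN is a placeholder:

    import requests
    from bs4 import BeautifulSoup

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    url = 'https://www.amazon.co.uk/product-reviews/EXAMPLE_ASIN'   # placeholder ASIN

    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    for item in soup.find_all('div', {'data-hook': 'review'}):
        body = item.find('span', {'data-hook': 'review-body'})
        if body:
            print(body.text.strip())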

  • @gamingsociety5370
    @gamingsociety5370 3 years ago +1

    Interesting. Are you also considering videos on scraping through graph APIs and proxies?

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago

      Yes I am, I want to do more API stuff. For proxies, I have a video where I test some free ones

  • @neelpatel6364
    @neelpatel6364 2 years ago

    I tried your same code on a different product but it can't get the other pages' reviews; it just took the page 1 data. Is there any solution?

  • @johnburks4702
    @johnburks4702 3 years ago +1

    Hi John, great work

  • @bhagyashreemourya7071
    @bhagyashreemourya7071 2 years ago

    Hey hi John, you explained it so nicely. While running the code I got stuck where it was returning (Getting page: 1, 0). I don't understand where I have gone wrong. Can you help me out, please?

  • @Adaeze_ifeanyi
    @Adaeze_ifeanyi 1 year ago

    Mine keeps saying reviewlist is not defined. What do I do?
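
    "name 'reviewlist' is not defined" usually means the list was never created before the extraction function appends to it. In the video it is initialised once near the top of the script, roughly like this:

    reviewlist = []   # create the empty list before any function appends to it

    def extract_reviews(soup):
        for item in soup.find_all('div', {'data-hook': 'review'}):
            title_el = item.find('a', {'data-hook': 'review-title'})
            reviewlist.append({'title': title_el.text.strip() if title_el else None})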

  • @poojabasutiya4225
    @poojabasutiya4225 2 years ago +1

    Very useful video

  • @alisiraydemir
    @alisiraydemir 2 years ago

    mmmh, that's good! talking about your videos...

  • @AjiPirjian
    @AjiPirjian 3 months ago

    🔥🔥🔥

  • @vishnu.a.p881
    @vishnu.a.p881 3 years ago +1

    very nice explanations

  • @سیدمهدیهاشمی-ل1ط
    @سیدمهدیهاشمی-ل1ط 3 years ago

    Great job, really useful videos, but can you do a full tutorial on how to figure out what a site's defences are and how to find a way around them??

  • @nagehan519
    @nagehan519 2 years ago

    Why do we use Splash? What is it for? Ty ♥

  • @manss2107
    @manss2107 3 years ago

    Hello! I'd like to say I'm grateful for finding this video because you helped my project so much. But I got an error
    on print(f'Getting page: {x}'); it said invalid syntax. Could you please help me resolve this problem?
    Thank you 🙏🏻
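
    "invalid syntax" pointing at an f-string almost always means the script is running under a Python version older than 3.6; checking the version, or falling back to str.format(), usually resolves it. A quick check, not from the video:

    import sys
    print(sys.version)                      # f-strings need Python 3.6 or newer

    x = 1
    print(f'Getting page: {x}')             # Python 3.6+
    print('Getting page: {}'.format(x))     # equivalent on older versions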

  • @justkaws304
    @justkaws304 3 years ago +1

    How come you aren't getting blocked by Amazon here?

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago

      Amazon have changed their website since I did this video, so I don’t think this specific code works. I’ve left the video up as the general idea is still good

  • @ManishKumar-br5sf
    @ManishKumar-br5sf 3 years ago +2

    nice video

  • @regularsizedpanda
    @regularsizedpanda 3 years ago +1

    Nice mannnnn

  • @vrajsavani
    @vrajsavani 3 years ago +1

    Nice Video

  • @tahirullah4786
    @tahirullah4786 3 years ago

    Or can you recommend any of your videos using Python for this?

  • @ayguncalskan5532
    @ayguncalskan5532 3 years ago

    Hi John, can I only get comments from the country I'm in?
    For example, comments from the Turkish site amazon.com.tr

  • @bizonkids9396
    @bizonkids9396 1 year ago

    Love the tutorial, I've used it to scrape over 150k reviews now. The problem is, Splash is very fast at first but starts to slow down and time out after a while. It looks like the web pages are piling up and it can't keep up. Do you have a solution for this?

    • @_hindu_warriors
      @_hindu_warriors 1 year ago

      Bro, can you help me please?
      I'm getting output like:
      Getting page: 1
      0
      Getting page: 2
      0
      Getting page: 3
      0

    • @bizonkids9396
      @bizonkids9396 1 year ago

      @@_hindu_warriors Remove the try and except blocks. The scraper will stop after page 1, but you'll see where the error occurs. Let me know what the error message is!

    • @_hindu_warriors
      @_hindu_warriors 1 year ago

      @@bizonkids9396 Thank you bro, the code from the video is running well now after removing try and except, but why was it giving an error earlier?

    • @_hindu_warriors
      @_hindu_warriors 1 year ago

      @@bizonkids9396 Also, I'm trying to scrape another website but it gives me the same error as above (Getting page ... 0), even after removing try and except. What can I do now?

    • @bizonkids9396
      @bizonkids9396 1 year ago

      @@_hindu_warriors You can't just copy-paste code. This code is specifically for Amazon, and you need a basic understanding of Python to adapt parts of it when errors come up. You might want to follow some Python courses first
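
    On the Splash slowdown question at the top of this thread: render.html accepts wait and timeout arguments, and wrapping the call in a retry loop lets the scraper ride out the occasional timed-out render (restarting a long-running Splash container periodically is also a common workaround). A sketch, not from the video:

    import time
    import requests

    def render(url, retries=3):
        for attempt in range(retries):
            try:
                r = requests.get('http://localhost:8050/render.html',
                                 params={'url': url, 'wait': 2, 'timeout': 30},
                                 timeout=60)
                if r.ok:
                    return r.text
            except requests.exceptions.RequestException:
                pass
            time.sleep(5 * (attempt + 1))   # back off a little more each retry
        return None                         # give up on this page after the last attempt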

  • @KendaBeatMaker
    @KendaBeatMaker 3 years ago +1

    first!

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago

      Haha nice

    • @KendaBeatMaker
      @KendaBeatMaker 3 years ago +1

      @@JohnWatsonRooney Hey man, I've been learning a lot from your videos. I love requests-html; it's been really hard for me to move on since learning it two weeks ago, lol.
      So I was able to get the reviews' location with this:
      reviews = r.html.find('div', containing='review')
      Would you say I can keep using requests-html? I really don't mind it being a little slower.
      I'm forever poking at websites with requests-html, it's so easy

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago +1

      Sure, use what works for you! I like to explore different options, but at the end of the day that's what they are, options!

    • @KendaBeatMaker
      @KendaBeatMaker 3 years ago +1

      @@JohnWatsonRooney So, I don't mean to bother you, but this was my issue and how I finally fixed it; maybe it can be something you teach, because I almost gave up.
      I didn't want my scraper running on my computer; I wanted it on something like AWS or Google Cloud Platform (free VM).
      It's all Linux command line, no GUI, and I wasn't getting any option to scrape JavaScript sites. I couldn't install Google Chrome/Selenium (I planned to run it in headless mode). NO LUCK! I saw one complicated thing on Stack Overflow that didn't work for me or the others who tried it.
      What I'm really trying to say is I took bits and pieces of the tutorials you've made so far and came up with my own workflow:
      I render the JavaScript with requests-html, then send it to BeautifulSoup.
      It works perfectly on the cloud with no need to install anything crazy. Yes, it's probably twice as slow compared to the other ways, but I don't mind.
      Here is a link to what my code looks like, not sure if it will show up - i.imgur.com/IDfdr5e.png
      Also, your channel is gonna grow like crazy; you should probably set up a Discord server.
      Thanks for everything you did and the stuff to come.

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago +1

      Hi Ken that’s a great solution! Love it. I am working on some content for running scrapers in the cloud and I will include a solution like this - I’m glad you got it working, nice one! John
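
    A sketch of the workflow @KendaBeatMaker describes above, rendering the JavaScript with requests-html and then parsing the result with BeautifulSoup; note that the first call to render() downloads a Chromium build via pyppeteer, and this is an illustration of the idea rather than their exact code (the URL is a placeholder):

    from requests_html import HTMLSession
    from bs4 import BeautifulSoup

    session = HTMLSession()
    r = session.get('https://www.amazon.co.uk/product-reviews/EXAMPLE_ASIN')
    r.html.render(sleep=2)            # let requests-html run the page's JavaScript
    soup = BeautifulSoup(r.html.html, 'html.parser')
    for item in soup.find_all('div', {'data-hook': 'review'}):
        body = item.find('span', {'data-hook': 'review-body'})
        if body:
            print(body.text.strip())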