Scrape Data from Booking.com using Python - HTML to Excel & CSV

  • Published on 8 Jan 2025

Comments • 47

  • @olumidepeter3456
    @olumidepeter3456 1 year ago +2

    This is a better way than bs4. Good job

    • @AminBoutarfi
      @AminBoutarfi 1 year ago

      I agree! It is better because Playwright provides more functionality: it can simulate basically everything we can do in a browser, and it's lighter than Selenium. Thank you for your interest!

    • @healwhazy
      @healwhazy 3 months ago

      I swearrr this helped me so much! Thank youuuu

  • @unknown35514
    @unknown35514 11 months ago +3

    How can I scrape more data for each hotel? There is a lot more data when you click on each hotel listed in the search results, such as the most popular facilities, the address, etc.

  • @ahassan7270
    @ahassan7270 7 months ago

    Thank you so much for sharing such valuable information. You are a genius. 👏👏

  • @rrvbin6354
    @rrvbin6354 8 months ago

    Your videos are very interesting, so I followed you. It’s a shame that your last video was uploaded 11 months ago. This channel has the potential to grow a lot more!

    • @AminBoutarfi
      @AminBoutarfi 8 months ago

      Thanks a lot, @rrvbin6354! I will be back very soon 🙏

  • @pavlos1016
    @pavlos1016 1 year ago

    Also, you may need to click the accept-cookies button; otherwise the banner will prevent you from scraping some data. This matters especially if you want to go to the next page, because the cookie banner hides the next-page button. If you want the script to run unattended, you must automate clicking the accept button. In Selenium, this is the call:
    driver.find_element(By.XPATH, '//button[contains(text(), "Accept")]').click()
    But you should wait for the page to load fully, so that the cookie banner has popped up, before clicking the accept button.
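
For readers following the video's Playwright script rather than Selenium, a rough equivalent of the call above might look like the sketch below. The button label "Accept" is simply taken from the XPath in the comment and may not match what Booking.com currently shows; `accept_cookies` and `timeout_ms` are names made up for illustration. Playwright's `click` waits for the button to become visible and enabled, which covers the advice about waiting for the banner.

```python
def accept_cookies(page, timeout_ms=5000):
    """Click the cookie banner's accept button if it appears in time.

    `page` is a Playwright sync-API Page. Returns True if a button was
    clicked and False if none could be clicked within `timeout_ms`.
    """
    # Assumption: the button's accessible name contains "Accept".
    button = page.get_by_role("button", name="Accept").first
    try:
        # click() auto-waits until the button is visible and enabled.
        button.click(timeout=timeout_ms)
        return True
    except Exception:
        # Playwright raises its TimeoutError here when the banner never
        # appears; treat that as "nothing to dismiss" and carry on.
        return False
```

Calling `accept_cookies(page)` right after `page.goto(...)`, and before clicking the next-page button, should keep the banner out of the way.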

    • @flaviacittadini5017
      @flaviacittadini5017 1 year ago +1

      I am getting a timeout error with Amin's code, and I think that's why the output files are not being generated, even though it runs and prints the number of results. Would you know why that is? (I'm asking because I noticed you clearly have a better command of this than I do.)

    • @AminBoutarfi
      @AminBoutarfi 1 year ago

      Good observations! I will update the code to handle that.

    • @AminBoutarfi
      @AminBoutarfi 1 year ago

      I will fix the code shortly; I was away from YouTube for a while.

    • @novotododia709
      @novotododia709 1 year ago

      @AminBoutarfi Hello! I just found your video today. Thank you for helping us! Did you already fix this cookie error?

    • @novotododia709
      @novotododia709 1 year ago

      @flaviacittadini5017 I was having this problem, and then I noticed it was because of the check-in and check-out dates: they were in April and we are now in July, so the URL wasn't working. You have to change the dates.
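
One way to avoid stale dates is to compute them when the script runs instead of hard-coding them in the URL. The sketch below assumes the search URL takes `ss`, `checkin` and `checkout` query parameters with ISO dates; check these against the URL in your own copy of the script. `build_search_url` is a made-up helper name.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumption: the search page and its parameter names look like this.
BASE_URL = "https://www.booking.com/searchresults.html"


def build_search_url(city, checkin=None, nights=1):
    """Build a search URL; the check-in date defaults to tomorrow."""
    if checkin is None:
        # Computed at run time, so the default is never an expired date.
        checkin = date.today() + timedelta(days=1)
    checkout = checkin + timedelta(days=nights)
    query = urlencode({
        "ss": city,
        "checkin": checkin.isoformat(),
        "checkout": checkout.isoformat(),
    })
    return f"{BASE_URL}?{query}"


print(build_search_url("Paris", nights=2))
```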

  • @XiangyiZhu-k9m
    @XiangyiZhu-k9m 1 year ago +1

    Hi! This is an amazing tutorial! I have one quick question: why are only 30 hotels scraped?

  • @p.a.8283
    @p.a.8283 9 months ago

    Can you tell us how to scrape the star rating, that is, the number of stars a hotel has?

  • @mohdkhaizurkhairuddin3830
    @mohdkhaizurkhairuddin3830 10 months ago +1

    Can we scrape the reviews?

  • @santiagonegrotto2777
    @santiagonegrotto2777 4 months ago

    Could you do this for searching rental cars?

  • @alexmckinley79
    @alexmckinley79 1 year ago +1

    You legend! Thank you!

  • @Rob.U
    @Rob.U 8 months ago

    Maybe it's because this video is older and they have changed some things, but I'm getting an error when trying to scrape the price. When I comment out the line of code for the price, it works just fine, but of course that is a very important piece. How can I work around this?

  • @amineboussetta9391
    @amineboussetta9391 10 months ago

    The Booking page asks me to log in every time, so the script doesn't work. Any solutions? Thank you!

  • @flaviacittadini5017
    @flaviacittadini5017 1 year ago +1

    Hi! Sorry, I am a REAL beginner. You included a proxy in the comments but never mentioned it in the video. What should I do with it?

    • @AminBoutarfi
      @AminBoutarfi 1 year ago

      Hey, how you implement proxies in your code depends on the provider. Usually you send an access code or the proxy address and port along with the request headers. I will make a dedicated video about it.
      Proxy providers usually have documentation and code examples; check those out.
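
With Playwright specifically, the proxy is normally passed when the browser is launched rather than per request. The sketch below shows that shape under stated assumptions: the host, port and credentials are placeholders that your provider would supply, and `proxy_settings` and `launch_with_proxy` are made-up helper names. It is not taken from the video's code.

```python
def proxy_settings(host, port, username=None, password=None):
    """Build the proxy dict that Playwright's launch() accepts."""
    settings = {"server": f"http://{host}:{port}"}
    if username and password:
        # Only needed when the provider requires authentication.
        settings["username"] = username
        settings["password"] = password
    return settings


def launch_with_proxy(settings, url="https://www.booking.com"):
    """Open `url` through the proxy and return the page title."""
    # Imported here so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(proxy=settings)
        page = browser.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
    return title


# Placeholder values; replace them with the ones from your provider:
# launch_with_proxy(proxy_settings("proxy.example.com", 8080, "user", "pass"))
```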

  • @asuelkellm1515
    @asuelkellm1515 1 year ago

    Hey, really nice tutorial, thanks :)
    PS: How do you scrape several cities at the same time?

    • @AminBoutarfi
      @AminBoutarfi 1 year ago

      I haven't implemented that yet! I will do it in the future for sure. For now, you can add a list of cities to the script and loop over them. The city is currently static in the URL (Paris); you need to make it dynamic.
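
A minimal sketch of making the city dynamic: take the URL the script already uses and swap the city parameter for each entry in a list. It assumes the city lives in a query parameter named `ss`; if your URL uses a different name, pass it as `param`. `with_city` is a made-up helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_city(url, city, param="ss"):
    """Return `url` with its city query parameter replaced by `city`."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Drop the old city and append the new one; urlencode escapes spaces.
    pairs = [(key, value) for key, value in pairs if key != param]
    pairs.append((param, city))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Assumed example URL; use the one from your own script instead.
original_url = "https://www.booking.com/searchresults.html?ss=Paris&checkin=2025-07-01"
for city in ["Paris", "Rome", "New York"]:
    url = with_city(original_url, city)
    print(city, url)  # run the existing scraping code on `url` here
```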

    • @Mangopa94ify
      @Mangopa94ify 11 months ago

      @AminBoutarfi How do you make it dynamic? My goal is to enter a precise location and search for hotels within X km. I'm struggling with that.

  • @itspacenews
    @itspacenews 1 year ago +1

    So how can we scrape more data?

    • @AminBoutarfi
      @AminBoutarfi 1 year ago +1

      You need to go through multiple pages (deal with pagination). You have 2 options:
      1- Tell Playwright to click the next button at the bottom of the page each time (google how to click buttons using Playwright; it's very easy).
      2- As of now, Booking.com uses "&offset=" in the URL for pagination. If you go to page 2, you will find that the URL is the same as for page 1, except that "&offset=25" is added; for page 3 it is "&offset=50", and so on. Since we only need the first URL, loop over multiple pages, adding "&offset=..." each time, and scrape the data.
      Hope it helps!
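
Option 2 can be sketched as a small generator that derives each page's URL from the first one. The step of 25 comes from the offsets quoted above; `paged_urls` and the example URL are made up for illustration, and the offset scheme may change on Booking.com's side.

```python
def paged_urls(first_url, pages, page_size=25):
    """Yield the URL of each results page using the offset parameter."""
    for page_number in range(pages):
        offset = page_number * page_size
        # Page 1 is the plain URL; later pages append &offset=25, 50, ...
        yield first_url if offset == 0 else f"{first_url}&offset={offset}"


first_url = "https://www.booking.com/searchresults.html?ss=Paris"  # assumed
for url in paged_urls(first_url, pages=3):
    print(url)  # load `url` with Playwright and scrape it here
```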

    • @flaviacittadini5017
      @flaviacittadini5017 1 year ago

      Hi! Were you able to make it work? I tried the second option that Amin suggested, but I can only scrape 2 pages at a time and then it times out :/

    • @novotododia709
      @novotododia709 1 year ago

      @AminBoutarfi How can I find out how many pages there are for a specific location, so that I can loop over a specific number of pages?
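
One possible approach, given that pages step by 25 results: read the total number of results from the page and divide by the page size, rounding up. The sketch below assumes the page shows a heading with the total in it; the example heading text is hypothetical, and `page_count` is a made-up helper name.

```python
import math
import re


def page_count(results_heading, page_size=25):
    """Estimate how many result pages exist from the results heading."""
    # Take the first number, allowing thousands separators like "1,234".
    match = re.search(r"\d[\d,]*", results_heading)
    if match is None:
        return 0
    total = int(match.group().replace(",", ""))
    return math.ceil(total / page_size)


# Hypothetical heading text; 1,234 results at 25 per page gives 50 pages.
print(page_count("Paris: 1,234 properties found"))
```

The site may not actually serve every page this estimate suggests, so it is safer to also stop looping as soon as a page returns no hotels.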

  • @SmartTech-m1u
    @SmartTech-m1u 4 months ago

    Brilliant one

  • @EduardStaudinger
    @EduardStaudinger 1 year ago +1

    Hi there, really amazing tutorial, thank you so much for this!
    I've got a bit of an issue, though:
    Whenever I launch the script, it never creates the Excel/CSV files.
    It prints out the number of hotels in the console, though.
    But I think it crashes after that, because it also doesn't close the browser window.
    Do you know what might cause this issue?

  • @jenchendiadeguzman8142
    @jenchendiadeguzman8142 1 year ago

    Hello Amin, can you teach us how to scrape data from a flight booking website to Excel? Thank you! 😊

  • @dr.python4113
    @dr.python4113 1 year ago +1

    I'm the first commenter. I really like this video.

    • @AminBoutarfi
      @AminBoutarfi 1 year ago +1

      Thank you! I really appreciate it!

  • @eldarkadric349
    @eldarkadric349 1 year ago

    How can we scrape all hotel URLs?

    • @AminBoutarfi
      @AminBoutarfi 1 year ago +1

      Not sure if I understood, but this script will get you data from the first page only. You need to put a pagination mechanism in place. I will do that in the future!

    • @motivational-speech-
      @motivational-speech- 1 year ago +1

      @AminBoutarfi I mean scraping the gallery photos inside each hotel, plus other data.