Don't Start Web Scraping without Doing These First

  • Published on 22 Jan 2025

Comments • 61

  • @shoebshaikh6310
    @shoebshaikh6310 3 years ago +21

    By far the best channel on YouTube for web scraping ❤️

    • @amirahmed5905
      @amirahmed5905 1 year ago +1

      agreed

    • @usr829
      @usr829 8 months ago

      agreed

  • @_domdge_687
    @_domdge_687 6 months ago +1

    Been following you for a week and I've learned so many tips. Thank you!

  • @tubelessHuma
    @tubelessHuma 3 years ago +8

    My favorite tip: Parse Locally 👍🌹

  • @khaliqsalawou3092
    @khaliqsalawou3092 2 years ago +3

    Thank you, John, the tips were really helpful, and I would love it if you could share more of this in the future.

  • @balazseduard4016
    @balazseduard4016 3 years ago +5

    You are the best, man. Much respect, keep up the good work. I learn a ton from you as a beginner.

  • @DIY-Investors
    @DIY-Investors 3 years ago +10

    John, that was a really helpful top-down overview. As a visual learner, I almost need a decision tree diagram to take me down the most appropriate route, and so to the right set of tools/routines to use. It's also helpful to have a video in the 7-10 minute range that focuses on the particular topic at hand. 10 out of 10 from me! 👍

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      Thank you I’m glad you enjoyed it 👍

  • @rtxmax8223
    @rtxmax8223 3 years ago +1

    Your channel is too good for us scrapers!!!

  • @TheJdB21
    @TheJdB21 3 years ago +3

    When building my scraper, I love to do it in a Jupyter notebook first so that I can separate the request and parse parts of the program.

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      Yes, that's a great way too. I personally never got into notebooks, but I certainly see the appeal.
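
    The "parse locally" idea in this thread, keeping the request step apart from the parse step, can be sketched roughly as below. This is only an illustration: it sticks to the Python standard library, and the names `download_page`, `TitleParser`, and `parse_title` are made up for the example rather than taken from the video.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


def download_page(url: str, path: Path) -> None:
    """Request step: fetch the page once and save the raw bytes to disk."""
    with urlopen(url) as response:
        path.write_bytes(response.read())


class TitleParser(HTMLParser):
    """Collects the text found inside <title> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_title(path: Path) -> str:
    """Parse step: read the saved file; no network request is made here."""
    parser = TitleParser()
    parser.feed(path.read_text(encoding="utf-8"))
    return parser.title.strip()
```

    With this split, `download_page` runs once per page, and `parse_title` (or any other parsing code) can be rerun against the saved file as often as needed while the selectors are being worked out.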

  • @theinstigatorr
    @theinstigatorr 3 years ago +3

    Thank you I just completed my first scrapy project today

  • @stevefox42
    @stevefox42 3 years ago +3

    Man, I'm having so much fun learning from watching your videos.

  • @nurlansalkinbayev3890
    @nurlansalkinbayev3890 3 years ago +1

    Hello John. Thanks for your tips.

  • @SAJO91
    @SAJO91 3 years ago +5

    I think we need a video where you talk about all the challenges we will face when scraping, like IP blocking or problems caused by sending too many requests.

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +3

      Yes, good idea. I've been thinking about doing a video like that.

  • @higaj
    @higaj 3 years ago +1

    Thank you for the great advice.

  • @l0remipsum991
    @l0remipsum991 3 years ago +1

    Thanks for the tips!

  • @RenatoEsquarcit
    @RenatoEsquarcit 3 years ago +1

    Top content as usual

  • @tnssajivasudevan1601
    @tnssajivasudevan1601 3 years ago +1

    Great video Sir.

  • @techmumus6780
    @techmumus6780 3 years ago +1

    Great video! Thanks!!

  • @RonWaller
    @RonWaller 1 year ago +1

    Thanks John, I have 2 questions. First, how do you download the HTML with requests? I tried looking it up and didn't find the solution. Second, looking at the source, what are we supposed to be seeing? I have done that but I'm not sure what I am looking for. Thanks
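
    A minimal sketch for the first question, assuming the third-party `requests` package is installed. For the second, one common thing to check (an assumption on my part, not necessarily what the video means) is whether a value you can see in the browser is already present in the raw HTML; the helper name `appears_in_source` is hypothetical.

```python
def fetch_html(url: str) -> str:
    """Download a page and return its HTML as text."""
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text


def appears_in_source(html: str, value: str) -> bool:
    """True if a value seen in the browser is present in the raw HTML."""
    return value.casefold() in html.casefold()
```

    If `appears_in_source` returns False for data that is clearly visible on the page, the data is probably loaded by JavaScript after the initial response, and the browser's network tab is the next place to look.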

  • @drac.96
    @drac.96 2 years ago +1

    How would you recommend dealing with IFrames? Any tips to extract data from those easily?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      From previous experience, you need to find the actual URL that the iframe loads and use that; you can usually find it in the source.

    • @drac.96
      @drac.96 2 years ago

      @JohnWatsonRooney Oh okay, so we have to visit the URL to get the content of the iframe, sounds easy enough. Really appreciate the quick reply! I like your videos, they're very informative.
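
    The approach described in this thread, finding the iframe's own URL in the page source and requesting that instead, might look something like the sketch below. It uses only the standard library; `IframeSrcParser` and `iframe_urls` are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collects the src of every <iframe>, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:  # skip iframes that have no src attribute
                self.urls.append(urljoin(self.base_url, src))


def iframe_urls(html, base_url):
    """Return absolute URLs for all iframes found in the given HTML."""
    parser = IframeSrcParser(base_url)
    parser.feed(html)
    return parser.urls
```

    Each returned URL can then be downloaded and parsed like any other page.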

  • @bn_ln
    @bn_ln 2 years ago

    Thanks for the great content, your channel is an excellent learning resource. May I ask for a starting suggestion for a project that involves authentication and downloading CSV and Excel files?

  • @jorgev4656
    @jorgev4656 3 years ago

    Hello John. I would like you to make a video on scraping LinkedIn without Selenium, for job searches. Thanks

  • @Analyse_US
    @Analyse_US 3 years ago

    Gold! Great channel.

  • @chiamaka2885
    @chiamaka2885 3 years ago

    Wonderful videos you have. How can I select the columns I want to scrape? Maybe the information I need is in columns 1, 2 and 4. How do I do that? Thank you
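
    One possible way to keep only certain table columns is sketched below, assuming the data sits in a plain, non-nested HTML table. It uses the standard library only, and `TableParser` and `select_columns` are names made up for this example.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects table rows as lists of cell text (nested tables not handled)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def select_columns(html, columns):
    """Keep only the given 1-based columns from every row."""
    parser = TableParser()
    parser.feed(html)
    return [[row[i - 1] for i in columns if i <= len(row)] for row in parser.rows]
```

    Calling `select_columns(html, [1, 2, 4])` returns every row with only its first, second, and fourth cells.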

  • @eziola
    @eziola 8 months ago

    I'm starting to see #shadow-root elements that I don't know how to get into. Thoughts on these?

  • @ugwuanyiarinze5626
    @ugwuanyiarinze5626 3 years ago

    I'm looking for a marketplace where people hire scrapers.

  • @codetechpro
    @codetechpro 2 years ago +1

    Hey John, can you make a short crash course on PhantomJS?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      I'm afraid my JS skills aren't that good, but I could look into it.

  • @alikorloo8425
    @alikorloo8425 2 years ago +1

    It helped, mate. What lib do you recommend for parsing lxml/HTML? And of course for async request.get (only) and request.post (rarely). Minimal libs, just to get the work done. In one of your vids you talked about selectolax, and requests-html in this one. I only need the two functionalities I mentioned above (parsing, requests). Much appreciated. 🙏🏼

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      Thanks. My go-to now is httpx for requests and selectolax for parsing.

  • @sujatapatil9152
    @sujatapatil9152 3 years ago

    Hi John, can you please help me scrape the reviews from the slicksdeals site for all the sublinks of a product? I have tried but failed to do it... please help me

  • @BeSharpInCSharp
    @BeSharpInCSharp 3 years ago

    Wonderful video. Do you have any on decision trees?

  • @ALANAMUL
    @ALANAMUL 3 years ago

    How do I scrape a site that has a "Load more" or "Show more" button? Please show us an example.

    • @harigovind6706
      @harigovind6706 3 years ago +3

      The "Show more" button will probably have an href with it; you can send a request to that URL.

    • @ALANAMUL
      @ALANAMUL 3 years ago

      @harigovind6706 thanks
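
    The suggestion in this thread, finding the href behind the button and requesting that URL directly, could be sketched as follows. It assumes the button is a plain `<a>` link whose text is "Load more" or "Show more" (many sites use JavaScript instead, in which case this will find nothing). `LinkParser` and `find_more_url` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects (href, link text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_more_url(html, base_url, labels=("load more", "show more")):
    """Return the absolute URL behind the first matching link, or None."""
    parser = LinkParser()
    parser.feed(html)
    for href, text in parser.links:
        if text.casefold() in labels:
            return urljoin(base_url, href)
    return None
```

    A scraper can call `find_more_url` on each downloaded page and keep requesting the returned URL until it gets `None`.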

  • @gwulfwud
    @gwulfwud 3 years ago

    Hey man, I have an e-commerce site I'm trying to scrape, and I found that one section of the page I'm trying to get makes a paginated API POST call. With that said, would it be better to go straight to the API for that part instead of scraping it off the page? Follow-up: should I still use Scrapy, or use it in combination with bs4? One to load and scrape the page and the other just for the POST API call.
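
    If the API route is taken, walking a paginated POST endpoint might look roughly like this. Everything about the API here is an assumption for illustration: the `"page"` request field, the `"items"` response field, and the stop-on-empty rule all need to be replaced with whatever the site's real request shows in the browser's network tab.

```python
import json
from urllib.request import Request, urlopen


def post_json(url, payload):
    """Send a JSON POST with the standard library and decode the JSON reply."""
    request = Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.load(response)


def fetch_all_pages(url, post=post_json, max_pages=100):
    """Request page 1, 2, 3... until the API returns an empty list of items."""
    items = []
    for page in range(1, max_pages + 1):
        reply = post(url, {"page": page})  # "page" is a hypothetical field name
        batch = reply.get("items", [])     # "items" is hypothetical too
        if not batch:
            break
        items.extend(batch)
    return items
```

    The `post` parameter is there so the paging logic can be tried against a fake function before any real request is sent.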

  • @Automatic-show
    @Automatic-show 1 year ago

    Tnx

  • @nimishabhide2950
    @nimishabhide2950 3 years ago

    Why can’t I scrape most amazon sites?

  • @higheringai68
    @higheringai68 3 years ago

    Thanks.

  • @ahmedgamalelkattan2231
    @ahmedgamalelkattan2231 3 years ago

    We urgently need a video about scraping from TripAdvisor using Selenium, please 😀

  • @Lahmeinthehouse
    @Lahmeinthehouse 2 years ago

    Nice video! What do you use for screen recording?

  • @daniel76900
    @daniel76900 3 years ago +1

    Parsing locally... man... that was it!!!

  • @spicer41282
    @spicer41282 3 years ago

    Hey John,
    Just recently sub'd...
    These are great tips!
    How about a separate vid for each one?
    Looking over your shoulder,
    The 1st one:
    What will You be looking for? Keeping an eye out for?
    Listening to your train of thought - while you're going through the motion/ process would be awesome!
    Hope you consider this request.

  • @dnyaneshctech7409
    @dnyaneshctech7409 3 years ago

    Scrape location-wise loaded content... please

  • @TiaDzn
    @TiaDzn 3 years ago +1

    bell gang!

  • @surfcow
    @surfcow 3 years ago

    Valuable advice from 50,000 ft, not the usual 500 ft.
    Don't just start coding. Stop, think, design, look harder.
    Do you really understand the specific details of the problem, or are you guessing?

  • @ankeet7x
    @ankeet7x 3 years ago +1

    bell gang! (2)

  • @JOHNSMITH-ve3rq
    @JOHNSMITH-ve3rq 3 years ago

    Bro, if you're yanking 500k files, saving them all in GitHub is not ideal.

  • @user-zj8id7kc1r
    @user-zj8id7kc1r 2 years ago +1

    Nice video. I use bs4 because a lot of your videos use bs4, and I try to adapt your examples to my projects. Could you do a future video with more complex selectors, please? :) I have a lot of problems adapting to something like that, lol.