What I'd Add FIRST To a new Scrapy Project

  • Published on 2 Nov 2024

Comments • 66

  • @linuxinstalled
    @linuxinstalled 3 years ago +6

    I wish this video had more exposure. I greatly appreciate that you took the time to put this series together. Being able to see these examples of the various mechanics behind scrapy has been hugely helpful. Thank you again.

  • @janekstern
    @janekstern 2 years ago +3

    Your videos helped me understand Scrapy more than any other resource, ty!

  • @davyroger3773
    @davyroger3773 3 years ago +7

    Thanks! The documentation did not go into enough depth, and I'm glad someone made a comprehensive video on it.

  • @shihlun5291
    @shihlun5291 2 years ago +1

    Thanks for the tutorial. After watching it, I have a better understanding of the Scrapy ItemLoader documentation.

  • @victormaia4192
    @victormaia4192 3 years ago +6

    Great tutorial! Very easy to follow, had no problems. About the typos: I'm the worst typer ever, but Tabnine always saves my life.

  • @woldemarkiev
    @woldemarkiev 2 years ago +2

    Great tutorial!! It really helps to understand

  • @RahulYadav-w1v4l
    @RahulYadav-w1v4l 17 days ago

    Amazing Tutorial 😍
    I do have a question: I am trying to get basic information from a shoe website, and the spider is only returning half the items on the website because of the DUPEFILTER setting. Maybe the same link is used for the same shoe in different colors, or multiple items share one link, but if I try to change the filter setting, it goes into an infinite loop. Is there a way around that?

  • @hendrikfeddersen6768
    @hendrikfeddersen6768 3 years ago +3

    Thanks a lot. The videos are very clear. Would you mind explaining, in one of your next videos, the correct folder structure of a Scrapy project and what file goes where and why?

  • @gwulfwud
    @gwulfwud 3 years ago +1

    Thank you! I watched the previous video and then this one, and it feels like I know so much about Scrapy already. Really, really good videos. Keep it up!

  • @amineboutaghou4714
    @amineboutaghou4714 3 years ago +2

    Another great video! Very well done, John 👏🏼

  • @codewithnacho
    @codewithnacho 3 years ago +3

    Awesome vid! It answered my questions with Item Loaders. Docs were confusing me haha

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +2

      I know! The docs are good but also, not so good haha

  • @carinafelnecan7802
    @carinafelnecan7802 2 years ago +1

    Thank you, I learned a lot from this video:)

  • @justinames5439
    @justinames5439 2 years ago +1

    As the others have said, thanks for your time and effort, a great help. The links connecting to Amazon (e.g. the lighting link) are dead, and you might want to update them. On another front, have you added a video on caching? All in all, really well done, and, again, thanks.
    jA

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      Thanks, one of the issues with a lot of the scrapers I wrote is that they don't always age well! I haven't actually done anything on caching yet, no. I'll add it to my list.

  • @JnWayn
    @JnWayn 2 years ago

    Nice to know what the competition is. I got a wisdom tooth. Is it possible with Scrapy to mark a checkbox, then click a button to get to the next page?

  • @mandela_byron
    @mandela_byron 3 years ago +2

    Hello John, could you do a video on how to host Scrapy scripts?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      Hi! Yes, I've been wanting to cover this for a while. Unfortunately, ScrapyD doesn't work with the latest version of Scrapy, so the best alternative I could come up with was hosting the spider on a Linux server and using a cron job to run it every X hours. Would that be of interest?

    • @mandela_byron
      @mandela_byron 3 years ago

      @JohnWatsonRooney Sounds great. Looking forward to that. I've been having challenges working out how best to host my scraping scripts, and I know there are some among us who face the same challenge. Thanks, your efforts are much appreciated.

  • @milank9857
    @milank9857 2 years ago +1

    Great explanation as always, really helpful tutorial

  • @vidproli4231
    @vidproli4231 3 years ago +1

    Great tutorial, it explains the exact thing I was looking for. Thank you!

  • @cosmicblack
    @cosmicblack 2 years ago +1

    Great video. Thanks!!!

  • @nadyamoscow2461
    @nadyamoscow2461 3 years ago +1

    Thanks a lot, what you do is amazing.

  • @user8ZAKC1X6KC
    @user8ZAKC1X6KC 2 years ago

    I am having an issue where it seems like fetch(req) is going a bit too fast, so it's only catching part of the page. Is there a way to slow it down? I can find how to do that for when the crawler is running, but not for when you're scraping in the shell. Thoughts?

  • @Scuurpro
    @Scuurpro 2 years ago

    How would I change a stock field in the item loader? It only returns "In Stock", or " " when things are out of stock. Would I create a function that takes the value and uses an if/else statement?
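    A minimal sketch of the kind of function asked about here, assuming the selector still yields the blank string for sold-out products. The name normalize_stock and the "Out of Stock" label are hypothetical; the function could then be passed to MapCompose in the field's input processor.

```python
def normalize_stock(value):
    """Map a scraped stock string to a fixed label.

    Hypothetical helper: None, empty, or whitespace-only text counts as out of stock.
    """
    text = (value or "").strip()
    return "In Stock" if text.lower() == "in stock" else "Out of Stock"


print(normalize_stock("In Stock"))
print(normalize_stock(" "))
```

    The same idea would cover the sold-out questions further down the thread, as long as the page gives the loader at least one value to process.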

  • @dcevansuk
    @dcevansuk 3 years ago

    Another excellent video!!!
    I have one question: this is working with the parent URL data. Is there a way to also use ItemLoader() with the data scraped from the associated child URL, to end up with one combined yield l.load_item()?
    It could be an interesting video.

  • @yangvictor5349
    @yangvictor5349 1 year ago +1

    thank you for sharing

  • @MohAmuza
    @MohAmuza 3 years ago +2

    I scraped a product, and some items don't have some of the data, so the result is a NoneType, which means None.
    In items.py I created a function to check whether the value is None and substitute something else:
    def check_gift(value):
        if value is None:
            return "No gift"
        else:
            return value
    But it doesn't work. Where is the problem?
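    One possible cause (an assumption, since the loader code is not shown): a function wrapped in MapCompose runs once per extracted value, so when the selector matches nothing it is never called with None and the field is simply left out of the item. A sketch of a workaround is to fill the fallback in after loading. The helper apply_defaults and the field names below are hypothetical, and a plain dict stands in for the loaded item.

```python
def apply_defaults(item, defaults):
    """Return a copy of item with missing or None fields filled from defaults."""
    filled = dict(item)
    for field, fallback in defaults.items():
        if filled.get(field) is None:
            filled[field] = fallback
    return filled


# Stand-in for a loaded item where no "gift" value was extracted.
scraped = {"name": "Example Whisky", "price": "45.00"}
print(apply_defaults(scraped, {"gift": "No gift"}))
```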

  • @Daviuliano
    @Daviuliano 1 year ago +1

    Super nice. However, I am struggling to understand how that would work with a dynamic website where I am following a GET request that returns data in JSON format. I do a bit of working around and convert it to a dictionary, but can't seem to get it to return an item... Any ideas that can help me?

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago +1

      I think you'd still need to parse through the JSON and then load it into the item loader and item, it's been a while since I've done that though so not 100% sure sorry

    • @Daviuliano
      @Daviuliano 1 year ago +1

      @JohnWatsonRooney Thank you... I managed to do it now. Had to yield them all individually. But it's working 👍🏼
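      The approach described in this thread (parse the JSON, then yield each record on its own) can be sketched roughly as follows. The "products", "name" and "price" keys are assumptions for illustration, not the structure of any real endpoint.

```python
import json


def iter_products(body):
    """Yield one flat dict per product found in a JSON response body."""
    data = json.loads(body)
    for product in data.get("products", []):  # "products" is an assumed key
        yield {
            "name": product.get("name"),
            "price": product.get("price"),
        }


sample = '{"products": [{"name": "A", "price": 10}, {"name": "B"}]}'
for item in iter_products(sample):
    print(item)
```

      Inside a spider callback the body would come from response.text, and each yielded dict could instead be fed into an item loader with add_value.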

    • @fatihkarakus6189
      @fatihkarakus6189 1 year ago

      @JohnWatsonRooney When I import items,
      I get an error like this: attempted relative import with no known parent package
      How can I solve this error?

  • @alessandr2
    @alessandr2 3 years ago

    Thanks for the tutorial!! One question: what part of the new code prevents the error from appearing if there is no price info? Thanks in advance!!!

  • @thewheeldeal8439
    @thewheeldeal8439 3 years ago

    This is a great video thanks!
    Question: Can scrapy save item objects to pickle binary files? If so, how? I just find it really convenient to save my scraped data into pickled objects that can be used quickly in other files, but I can't find any doc on that for scrapy...
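    As far as I know, Scrapy's feed exports include a pickle format that writes one pickled object per item, back to back, into a single file. Assuming that layout, a sketch for reading such a file in another script could look like this. The helper read_pickled_items is hypothetical, and the demo writes its own temporary file rather than using real spider output.

```python
import os
import pickle
import tempfile


def read_pickled_items(path):
    """Yield every object stored back to back in a pickle file."""
    with open(path, "rb") as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:
                break  # reached the end of the file


# Demo data standing in for exported items.
items = [{"name": "A", "price": "10.00"}, {"name": "B", "price": "12.50"}]
with tempfile.NamedTemporaryFile("wb", suffix=".pickle", delete=False) as tmp:
    for item in items:
        pickle.dump(item, tmp)

loaded = list(read_pickled_items(tmp.name))
os.remove(tmp.name)
print(loaded)
```

    Only unpickle files you produced yourself, since loading a pickle can execute arbitrary code.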

  • @kevin_daang
    @kevin_daang 2 years ago

    If I wanted to include when a whisky bottle was sold out, how would I do it with the item loader?

  • @sheikhakbar2067
    @sheikhakbar2067 3 years ago +1

    Thanks a lot, that was very helpful.

  • @dokanplugincustomization1587
    @dokanplugincustomization1587 3 years ago

    Awesome playlist, but I have one question: products which are sold out don't give us any data in the price field. I tried to supply an alternative value, as you did in the previous video using a try/except block, but I failed. Please guide me.

  • @salimbo4577
    @salimbo4577 3 years ago

    Thank you so much. Is there a way I can scrape audio data, like sound files?

  • @vitalij09
    @vitalij09 3 years ago +1

    Thanks man!

  • @Abdul_Rafay_Pal
    @Abdul_Rafay_Pal 1 year ago +1

    What would you recommend: Splash or Playwright?

  • @leleemagnu6831
    @leleemagnu6831 3 years ago +2

    John,
    Another great video.
    In the title the first word should read Scrapy or the video won't come up in a search.
    Let me wish you a, well deserved, fantastic Christmas !
    e

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      Oh wow I didn’t notice! Thank you for pointing that out, I’ve changed it. Happy Christmas to you too!

  • @abukaium2106
    @abukaium2106 3 years ago +2

    Great video. I'd love a video from you on using proxies with Scrapy.

  • @NatureLover02005
    @NatureLover02005 3 years ago +1

    Excellent!!!

  • @ferilukmansyah3037
    @ferilukmansyah3037 3 years ago +1

    Thanks for the best tutorial!

  • @TheWhoIsTom
    @TheWhoIsTom 3 years ago +1

    Nice tutorial!! Would be nice if you would show how to store the data of THIS code (item loader) into mongo DB. :)

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      Thanks! Sure, I’m going to extend this project to cover more of Scrapy’s features, including pipelines and databases

    • @TheWhoIsTom
      @TheWhoIsTom 3 years ago

      @JohnWatsonRooney Awesome. Thanks a lot :)

  • @maheshsharma-zq2uc
    @maheshsharma-zq2uc 2 years ago

    Can you make a project with Scrapy that extracts stock information along with historical data?

  • @fatihkarakus6189
    @fatihkarakus6189 1 year ago

    When I import items,
    I get an error like this: attempted relative import with no known parent package
    How can I solve this error?

  • @alfakih7247
    @alfakih7247 1 year ago

    More Scrapy blog posts, please

  • @karthikkarthik100
    @karthikkarthik100 1 year ago

    Thanks for the informative video. Can't we just write if next_page: instead of if next_page is not None?
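    A small sketch of where the two checks differ. They agree for None and for a non-empty link; they only disagree for an empty string (for example an href=""), which is falsy but not None. The function names are hypothetical.

```python
def follow_if_truthy(next_page):
    """Mirror `if next_page:`; rejects None and empty strings."""
    return bool(next_page)


def follow_if_not_none(next_page):
    """Mirror `if next_page is not None:`; rejects only None."""
    return next_page is not None


for candidate in (None, "", "/page/2"):
    print(repr(candidate), follow_if_truthy(candidate), follow_if_not_none(candidate))
```

    So the shorter form should work for typical pagination links and is slightly stricter, since it also skips empty hrefs.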

  • @abdulcute
    @abdulcute 3 years ago

    Best video for Scrapy and best explanation, @john Watson Rooney and others.
    I have one question about the item loader: how do we extract data if the element holds more than one piece of information (e.g. if the element has two cell numbers, the item loader picks only the first number, not the second)? As I learned from your previous video, we use getall().

  • @alexportugal3986
    @alexportugal3986 1 year ago

    Hi, I just don't quite get why you use the ItemLoader part and all of that stuff when you can do it within the parse function. It seems to me that it gets more complicated to get the same result. Surely there is something I am missing.

  • @KhalilYasser
    @KhalilYasser 3 years ago +1

    Amazing tutorial. Thank you very much. Can you share the code as usual?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +2

      Yes, sure I've updated my repo here: github.com/jhnwr/whiskyspider

  • @GordonShamway1984
    @GordonShamway1984 3 years ago +1

    Super

  • @ShahidulsPerspective
    @ShahidulsPerspective 2 years ago

    How do I save the URL of the extracted page when using ItemLoader?

  • @isabelsilva-wf8vg
    @isabelsilva-wf8vg 2 years ago

    How do I use this with XPath? I tried, but it didn't work exactly like this: {l.add_xpath( ' title ' , ' .//h1[@class="product__title"]')}