Scrapy Basics - How to Get Started with Python's Web Scraping Framework

  • Published on 5 Sep 2024
  • Scrapy is a Python framework for web scraping. In this video I show you the basics of getting started:
    * Create a Scrapy project
    * Use the Scrapy shell to find elements
    * How CSS selectors work with Scrapy
    * Create a simple spider to crawl a website for product information
    Code: github.com/jhn...
    -------------------------------------
    twitter / jhnwr
    code editor code.visualstu...
    WSL2 (linux on windows) docs.microsoft...

Comments • 88

  • @pythonantole9892 3 years ago +5

    Oh my! This channel deserves more subscribers. I scrape a lot of tables in my job but never knew I could use pandas (I had never heard of it) until I saw one of your videos on pandas. I look forward to more videos on Scrapy now that I have the motivation to move away from BS4 and try Scrapy.

    • @JohnWatsonRooney 3 years ago

      Thanks for your kind words I’m glad it’s helped you!

  • @WildRover1964 1 year ago +1

    A useful start. Followed along and got this working myself (which doesn't often happen when following Python tutorials on YT). Looking forward to finding out how to get the stuff from page two, and then hopefully how to follow links.

  • @user8ZAKC1X6KC 2 years ago +3

    Something you note at the 9:23 mark is that you can close the space with a dot (or period). To add a little more to that: regardless of the number of spaces, you only need one period. So close the gap completely and put one dot. I struggled with this for a while, as I had a custom class with 5 spaces in the name (no idea why the coder would do that) and it just never occurred to me that I could use one dot. None of the Scrapy documentation indicated that. I spent quite a while trying to figure that out.

    • @AmodeusR 1 year ago +1

      It's good to learn about CSS if you're going to use CSS selectors. The space is closed with a dot because in CSS, when you want to select an element that has several classes, you write it like ".class1.class2". If you were to write ".class1 .class2", it would mean you want to select an element that has class2 and is inside an element that has class1.
      To make it clear with real HTML elements: "p a" selects any link (a) inside a paragraph (p).
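
The difference between the two selector forms discussed in this thread can be sketched with the standard library alone. This is an illustration under assumptions, not Scrapy's own matching code: the HTML snippet and the helper names `compound` and `descendant` are made up for the example, and `xml.etree.ElementTree` stands in for a real CSS engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one element with two classes (separated by several
# spaces), and one element with a class nested inside another.
HTML = """
<div>
  <p class="card   featured">both classes on one element</p>
  <div class="card"><span class="featured">nested</span></div>
</div>
"""

def classes(el):
    # The class attribute is a whitespace-separated list, so any number of
    # spaces between names still yields the same set of class names.
    return set(el.get("class", "").split())

def compound(root, *names):
    """Mimics '.a.b': elements that carry every class in names."""
    return [el for el in root.iter() if set(names) <= classes(el)]

def descendant(root, outer, inner):
    """Mimics '.outer .inner': inner-class elements inside outer-class ones."""
    found = []
    for parent in root.iter():
        if outer in classes(parent):
            for el in parent.iter():
                if el is not parent and inner in classes(el) and el not in found:
                    found.append(el)
    return found

root = ET.fromstring(HTML.strip())
compound_texts = [el.text for el in compound(root, "card", "featured")]
descendant_texts = [el.text for el in descendant(root, "card", "featured")]
print(compound_texts)
print(descendant_texts)
```

In the Scrapy shell the corresponding calls would be `response.css(".card.featured")` and `response.css(".card .featured")`.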

  • @sampatankar1977 3 years ago +3

    Really lucid, well-judged in terms of content, and excellent videography. Timely too, given what I happen to be doing this week! Thank you!

  • @celerystalk390 3 years ago +8

    Great job again John! I've never used Scrapy but now I feel it may be something really useful and powerful. It'd be great if you could do a video comparing the different scraping approaches you've introduced and their scenarios. Thx.

  • @nadyamoscow2461 3 years ago +2

    The best Scrapy basics tutorial I've seen. Thanks a lot!!

  • @engineerbaaniya4846 3 years ago +2

    Thanks John, please upload all your videos on Scrapy.

  • @hardwaregenie 2 years ago +1

    Thanks John for your tutorial. Really liked how easy and approachable you made it.

  • @ajayyadav-us8hd 3 years ago +3

    Hey brother,
    Thanks for the tutorials. Can you make a tutorial on the other files,
    e.g. middleware.py, items.py, settings.py?
    And second, how to use a database in Scrapy for reading and writing the data.

    • @JohnWatsonRooney 3 years ago +1

      Yes will be doing videos on those too

    • @ajayyadav-us8hd 3 years ago

      @JohnWatsonRooney Thanks man

    • @greis790 3 years ago

      @JohnWatsonRooney An implementation in the Scrapy framework of all the scenarios we use with requests, like proxies, user agents, etc., would be awesome!! Nice tutorial as always!

  • @irfankalam509 3 years ago +3

    Nice one as always! Hope you would continue this as a series.

  • @mahdi132 1 year ago +1

    Thank you very much, your content is awesome.

  • @theinstigatorr 3 years ago +2

    Yay! It worked!

  • @MohAmuza 3 years ago +1

    I want to scrape the product features but it doesn't work properly. I want to get the 4 or 5 features of each product, but I get either 1 feature or all the features on the page instead. I have no idea why it behaves this way.
    I used this code:
    response.css("div.f-grid.prod-row ul.f-list.j-list li::text").get()
    The code above prints one feature.
    response.css("div.f-grid.prod-row ul.f-list.j-list li::text").getall()
    The code above prints all the features on the page, while I want to print 4 or 5 depending on the product.
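
One common way to get features per product is to select each product container first and then run the inner selector relative to that container, instead of running one selector over the whole page. The sketch below shows the idea with the standard library as a stand-in for Scrapy's selectors; the page markup, product names, and the function name `features_by_product` are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two products, each with its own feature list.
PAGE = """
<body>
  <div class="f-grid prod-row"><h3>Drone A</h3>
    <ul class="f-list j-list"><li>4K camera</li><li>30 min flight</li></ul></div>
  <div class="f-grid prod-row"><h3>Drone B</h3>
    <ul class="f-list j-list"><li>GPS</li><li>Foldable</li><li>Case</li></ul></div>
</body>
"""

def features_by_product(root):
    """Collect the feature list of each product separately."""
    results = {}
    for product in root.iter("div"):
        # Keep only the product containers.
        if "prod-row" not in product.get("class", "").split():
            continue
        name = product.findtext("h3")
        # The search is relative to this product, not to the whole page.
        results[name] = [li.text for li in product.findall("./ul/li")]
    return results

features = features_by_product(ET.fromstring(PAGE.strip()))
print(features)

# Rough Scrapy equivalent (inside a spider's parse method or the shell):
# for product in response.css("div.f-grid.prod-row"):
#     product.css("ul.f-list.j-list li::text").getall()
```

Calling `.css()` on each `product` keeps the match scoped to that product, so every product yields only its own features.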

  • @ALVINMAN452 1 year ago +1

    Thank you, very much.

  • @julz2020 1 year ago +1

    Dude, I am loving your videos!! Opening up the wonderful world of web scraping with these excellent Python tools. Thank you for the content ;]

  • @sinamobasheri3632 3 years ago +2

    Thanks and nice work John 👌🏻 I have been waiting for this for a long time 🙏🏻

  • @JohnMusicbr 3 years ago +1

    I'm a big fan of your work. Thanks, John.

  • @sagar318 3 years ago +1

    Man you're awesome! These videos are so informative and easy to understand, wish you all the success in this world

  • @edbull4891 2 years ago +1

    Thank you for this fantastic training. Now I understand what Scrapy is all about :)

  • @kavehmoradkhani8018 2 years ago +1

    The educational content is explained very well.
    You're Great.
    Thanks John!

  • @stephenwilson0386 2 years ago

    Great intro to Scrapy! Everywhere I've looked people say Scrapy is hard to learn, but frankly this seems more straightforward to me than BS. Maybe that's not the case when things get more complex, but that's just my two cents - maybe you're just better at explaining it?
    I'm trying to scrape products and prices from Newegg and running into a road bump - I can get the item name and such, but the price is nested in a tag inside a list and finally a div. Any tips on selecting that?

  • @martpagente7587 3 years ago +1

    Thank you so much for this John, I hope this will become a series.

    • @JohnWatsonRooney 3 years ago +1

      Thanks Mart it will

    • @martpagente7587 3 years ago +1

      @JohnWatsonRooney I hope you can also make a video on the Scrapy-Splash approach for scraping dynamic websites, with a project or sample as part of this series. Thanks!

  • @susannegelarehamiri4497 3 years ago +1

    Thanks John! Great video.

  • @kimodataworld5092 2 years ago

    Thank you very much, with your help I did my first web scraping.

  • @litodemesa9699 2 years ago +1

    You are one of the best!!

  • @Hugo-pw5ud 1 year ago +1

    Thank you!! Almost there, but the spider doesn't return the right output. What could be wrong? I do see the 200 scraped items via the shell. I'm on Windows.

    • @JohnWatsonRooney 1 year ago

      Did you check through the shell response for the items you are after? A 200 can also be something like a captcha page or a blocking page
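
A quick way to act on this advice is to test the downloaded HTML for the content you expect rather than trusting the status code. The helper below is a minimal sketch: the function name, the marker strings, and the sample pages are all hypothetical, and real blocking pages vary widely.

```python
def looks_like_product_page(html, required_markers,
                            block_markers=("captcha", "access denied")):
    """Return True only if the page has the expected content and no block hints."""
    text = html.lower()
    # A 200 response can still be a captcha or blocking page.
    if any(marker in text for marker in block_markers):
        return False
    # Every marker we expect on a real product page must be present.
    return all(marker.lower() in text for marker in required_markers)

# Hypothetical sample pages for illustration.
good_page = '<div class="product"><h3>Drone A</h3><span class="price">99</span></div>'
blocked_page = "<html><body>Please complete the CAPTCHA to continue</body></html>"

good_result = looks_like_product_page(good_page, ['class="product"', 'class="price"'])
blocked_result = looks_like_product_page(blocked_page, ['class="product"'])
print(good_result, blocked_result)

# In the Scrapy shell, response.text holds the HTML to pass in, and
# view(response) opens the downloaded page in a browser for a visual check.
```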

    • @arpitakar3384 6 days ago

      @JohnWatsonRooney Yes, it's only returning the menu tabs and, further down, the services and contact tabs.

  • @daniel76900 3 years ago +1

    as usual...great content...keep on the good work!

  • @chrissenanayake9891 2 years ago +1

    Nice presentation!

  • @d.developer 2 years ago +1

    Yessssss, I'm the 500th person to like this!

  • @SecurityTalent 2 years ago +1

    Great

  • @RenatoEsquarcit 3 years ago +1

    Appreciated your work!

  • @NXTTutorials 3 years ago +1

    Thanks! Very useful!

  • @Modey3 1 year ago

    What is the reason for the venv? Are you using a different version of Python?

  • @eldarmammadov7872 1 year ago +1

    Could you cover running Scrapy from a Python script rather than from the shell?

    • @JohnWatsonRooney 1 year ago +1

      Yes, you can run Scrapy from a script. I have a video on it, see my channel.
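
For reference, Scrapy's `CrawlerProcess` is the usual entry point for running a spider from a plain Python script. The sketch below is not taken from the video: the function name `run_spider`, its parameters, and the default output file name are hypothetical, and the `FEEDS` setting assumes a reasonably recent Scrapy version.

```python
def run_spider(spider_cls, output_path="products.json"):
    """Run one spider class to completion and export its items as JSON."""
    # Imported here so this file can be imported without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings={
        # Write scraped items to output_path in JSON format.
        "FEEDS": {output_path: {"format": "json"}},
    })
    process.crawl(spider_cls)
    process.start()  # blocks until the crawl finishes
```

It would be called once per run, for example `run_spider(MySpider)` under `if __name__ == "__main__":`, where `MySpider` is your own spider class. The underlying Twisted reactor cannot be restarted, so start only one process per script run.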

  • @victory9654 3 years ago +3

    Useful video, thanks! You're handsome too..

  • @Chryzsean420 3 years ago +1

    Just subscribed, Thank you sir .

  • @pahehepaa4182 3 years ago

    How do I scrape links from level3 or level4 drop down menus and get output in tree format of all child nodes?

  • @stephennardone5437 3 years ago

    I only recently found your channel but all in all great content! I am however coming across problems with POST requests and selenium is sadly not an option for my project.

  • @hardeepbhatti8619 2 years ago

    I really didn't understand the part at 11:33 and how you do it. By the way, I am new to Scrapy. Can you explain it?

  • @SabriCanOkyay 3 years ago

    Thanks a lot for the video. I could scrape a website on my first try.
    I had a problem though. I get this error:
    raise ExpressionError(
    cssselect.xpath.ExpressionError: The pseudo-class :text is unknown ...
    When I changed 'a::text' into 'a::attr(href)' it worked. 'text' was also working in the shell but not in the py file. So, how can I get the texts in the file then?
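
The error message names `:text` with a single colon, which suggests the selector in the .py file was written as `a:text`. Scrapy's text and attribute extensions are pseudo-elements and need two colons (`a::text`, `a::attr(href)`). The small check below is a hypothetical helper, not part of Scrapy, that rewrites the single-colon form.

```python
import re

# Matches a single ':' (not already part of '::') directly before
# 'text' or 'attr(' and leaves real pseudo-classes such as ':first-child' alone.
_SINGLE_COLON = re.compile(r"(?<!:):(?=text\b|attr\()")

def fix_pseudo_elements(selector):
    """Rewrite ':text' / ':attr(...)' to Scrapy's '::text' / '::attr(...)'."""
    return _SINGLE_COLON.sub("::", selector)

fixed = [
    fix_pseudo_elements("a:text"),          # single colon gets doubled
    fix_pseudo_elements("a::text"),         # already correct, left as-is
    fix_pseudo_elements("a:attr(href)"),    # single colon gets doubled
    fix_pseudo_elements("li:first-child"),  # ordinary pseudo-class, untouched
]
print(fixed)
```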

  • @SunDevilThor 3 years ago

    I'm loving these web scraping tutorials. I did get an error though as soon as I tried to use the products variable, such as products.css('h3')
    I get the error: AttributeError: 'str' object has no attribute 'css'

  • @igordc16 2 years ago +1

    Scrapy seems so intimidating.

    • @JohnWatsonRooney 2 years ago

      It is when you first look at it, but once you dive in and break it down into parts it will click

  • @mohamad5005 2 years ago

    Hi John,
    How can I clear the screen while I am in the Scrapy shell? (I use PowerShell.)

    • @JohnWatsonRooney 2 years ago

      Sure, I think typing clear works?

    • @mohamad5005 2 years ago

      @JohnWatsonRooney It works before I run the 'scrapy shell' command, but once I am inside the shell with the response it doesn't work.

  • @samcamus3000 3 years ago +1

    Can I use scrapy to scrape JavaScript generated content?

    • @JohnWatsonRooney 3 years ago +2

      You can but you need to use the splash extension. I will be covering this soon when I release more scrapy content

    • @samcamus3000 3 years ago

      @JohnWatsonRooney 👍👍👍

  • @prantokhan2303 3 years ago +2

    scrapy shell 'URL'
    doesn't work.
    scrapy shell "URL"
    Double quotes work.

    • @MohAmuza 3 years ago

      I never use quotes

    • @athenacoding2384 2 years ago +1

      Same for me. Thanks for this comment

  • @artabra1019 3 years ago

    What is the difference between Scrapy and BeautifulSoup?

  • @-__--__aaaa 3 years ago +1

    try with xpath pls

    • @JohnWatsonRooney 3 years ago

      Sure I’ll use xpath next time

    • @-__--__aaaa 3 years ago

      @JohnWatsonRooney thanks ✅👍

  • @haithemamir223 2 years ago

    But how can I put this data into HTML?

  • @Don_ron666 2 years ago

    Why does he use a virtual environment?

    • @nateTheNomad23 1 year ago

      Python scraping often involves the use of modules and packages. Once you have multiple Python projects, if you don't use a virtual environment, different projects end up sharing some of the same packages and modules. If you update a package for one project, you can break a different project that relies on a previous version of the same package. A virtual environment isolates the packages and modules associated with a single project, so that no matter what other projects use the same packages or modules, they don't interfere with each other. At least that's my understanding.
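
To see this isolation from inside Python, a short check like the one below can help. It is a sketch using only the standard library; the function name `in_virtualenv` is made up for the example.

```python
import sys
import sysconfig

def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

# Directory where 'pip install' puts pure-Python packages for this interpreter.
site_packages = sysconfig.get_paths()["purelib"]

print("virtual environment active:", in_virtualenv())
print("packages install into:", site_packages)

# A typical setup from a terminal (not run by this script):
#   python -m venv venv
#   source venv/bin/activate      (Linux/macOS; venv\Scripts\activate on Windows)
#   pip install scrapy
```

Running it once inside and once outside an activated environment shows two different package directories, which is why upgrading a package in one project leaves the other untouched.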

  • @muhammadhananasghar3102 3 years ago

    Sir, please make a video on how to scrape Google search results.

    • @-__--__aaaa 3 years ago

      You should pass a User-Agent in the headers.
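
As a rough illustration of what passing a User-Agent means, the sketch below builds a request with that header using the standard library and does not send it. The URL, the User-Agent string, and the function name `build_request` are placeholders, and a custom header alone may not be enough for sites that actively block scrapers.

```python
from urllib.request import Request

def build_request(url, user_agent):
    """Create a request object that carries a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})

# Placeholder values for illustration only.
req = build_request(
    "https://example.com/search?q=scrapy",
    "Mozilla/5.0 (compatible; example-bot/0.1)",
)

# urllib stores header names capitalized as 'User-agent'.
print(req.get_header("User-agent"))

# In a Scrapy project the same idea is usually expressed in settings.py:
#   USER_AGENT = "Mozilla/5.0 (compatible; example-bot/0.1)"
```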

  • @angelesc2479 3 years ago +1

    After the command: scrapy shell 'jessops.com/drones'
    I got this as the prompt: In [1]: instead of >>>
    I don't know what I've done wrong...

    • @angelesc2479 3 years ago +1

      Nevermind, it works fine anyway.
      Also found out the hard way that indentation matters !!

    • @MohAmuza 3 years ago

      it works without quotes

  • @mrindia4178 3 years ago +2

    Thank You!

    • @mrindia4178 3 years ago +1

      You are so down to earth, salute to you for providing this type of content for free