ScrapeOps
  • 53
  • 276 846
How to Scrape G2 with Requests and BeautifulSoup
When it comes to online business, reputation is everything. Whether you're buying something simple or making a long commitment, such as choosing a new bank, you need a good understanding of anyone you decide to do business with. There are plenty of review sites online, and G2 is one of the best.
In this article, we're going to scrape tons of important data from G2.
00:00 Intro
00:13 Understanding How To Scrape G2
01:14 Setting Up Our G2 Scraper Project
01:33 Build a G2 Search Crawler
01:34 Step 1: Create Simple Search Data Parser
05:49 Step 2: Add Pagination
06:21 Step 3: Storing the Scraped Data
11:41 Step 4: Adding Concurrency
12:32 Step 5: Bypassing Anti-Bots
13:28 Step 6: Production Run
13:38 Build a G2 Scraper
13:39 Step 1: Create Simple Business Data Parser
16:14 Step 2: Loading URLs To Scrape
17:00 Step 3: Storing the Scraped Data
21:38 Step 4: Adding Concurrency
22:10 Step 5: Bypassing Anti-Bots
22:15 Step 6: Production Run
Article With Code Examples: scrapeops.io/python-web-scraping-playbook/python-scrape-g2/
Python Web Scraping Playbook: scrapeops.io/python-web-scraping-playbook/
ScrapeOps Proxy Aggregator: scrapeops.io/proxy-aggregator/
Views: 320

Videos

How to Scrape Amazon With Python Requests and BeautifulSoup
2.3K views · 3 months ago
Amazon is the largest online retailer in the world and one of the largest overall retailers in the world. Whether you seek to track product prices, analyze customer reviews, or monitor competitors, extracting information from Amazon can provide valuable insights and opportunities. In this guide, we'll take you through how to scrape Amazon using Python Requests and BeautifulSoup. 00:00 Intro 00:1...
Python Requests/BS4 Beginners Series Part 5: Using Fake User-Agents and Browser Headers
260 views · 5 months ago
While scraping a couple hundred pages with your local machine is easy, websites will quickly block your requests when you need to scrape thousands or millions. In this guide, we're still going to look at how to use fake user-agents and browser headers so that you can apply these techniques if you ever need to scrape a more difficult website like Amazon. 00:00 Intro 00:45 Getting Blocked and Ban...
Python Requests/BS4 Beginners Series Part 4: Retries & Concurrency
165 views · 5 months ago
In any web scraping project, the network delay acts as the initial bottleneck. Scraping requires sending numerous requests to a website and processing their responses. In Part 4, we'll explore how to make our scraper more robust and scalable by handling failed requests and using concurrency. 00:00 Intro 00:33 Understanding Scraper Performance Bottlenecks 01:03 Retry Requests and Concurrency Imp...
Python Requests/BS4 Beginners Series Part 3: Storing Data
134 views · 6 months ago
There are many different ways to store the data we scrape, from databases and CSV files to JSON files and S3 buckets. In Part 3, we'll explore various methods for saving the data in formats suitable for common use cases. 00:00 Intro 00:44 Saving Data to a JSON File 03:18 Saving Data to Amazon S3 Storage 06:13 Saving Data to MySQL Database 09:47 Saving Data to Postgres Database Article Wi...
Python Requests/BS4 Beginners Series Part 2: Cleaning Dirty Data & Dealing With Edge Cases
190 views · 6 months ago
Web data can be messy, unstructured, and have many edge cases. So, it's important that your scraper is robust and deals with messy data effectively. So, in Part 2: Cleaning Dirty Data & Dealing With Edge Cases, we're going to show you how to make your scraper more robust and reliable. 00:00 Intro 00:18 Strategies to Deal With Edge Cases 00:27 Structure your scraped data with Data Classes 05:09 ...
Python Requests/BS4 Beginners Series Part 1: How To Build Our First Scraper
527 views · 6 months ago
When it comes to web scraping, Python is the go-to language because of its highly active community, great web scraping libraries and popularity within the data science community. In this 6-Part Python Requests/BeautifulSoup Beginner Series, we're going to build a Python scraping project end-to-end from building the scrapers to deploying on a server ...
Selenium Undetected Chromedriver: Bypass Anti-Bots With Ease
7K views · 7 months ago
In recent years, there has been a surge in the usage of sophisticated anti-bot measures targeting headless browsers, prompting developers to fortify their browsers to hide revealing details and ensure their Selenium scrapers remain undetectable by anti-bot solutions. The Selenium Undetected ChromeDriver is an optimized version of the standard ChromeDriver designed to bypass the detection mechanisms of most anti-b...
Playwright Guide: Submitting Forms
468 views · 7 months ago
Automating form submission is pivotal for web scraping and browser testing scenarios. Playwright provides flexible methods to interact with forms and input elements. We'll cover everything you need to know to master form submission with Playwright, from basic form interactions to handling dynamic inputs and form validation. So in this guide, we will go through: 00:00 Intro 00:28 Understanding H...
Python Selenium Guide: Using Fake User Agents
766 views · 7 months ago
Staying undetected and mimicking real user behavior becomes paramount in web scraping. This is where the strategic use of fake user agents comes into play. So in this guide, we will go through: 00:00 Intro 00:29 What is a User-Agent? 00:43 What Are Fake User-Agents 01:29 How To Use Fake User-Agents In Selenium 03:25 Obtaining User Agent Strings 04:21 Troubleshooting and Best Practices Article W...
Puppeteer Guide: How To Take Screenshots
221 views · 8 months ago
Taking screenshots is a fundamental aspect of web scraping and testing with Puppeteer. Screenshots not only serve as a valuable tool for debugging and analysis but also document the state of a webpage at a specific point in time. So in this guide, we will go through: 00:00 Intro 00:33 How To Take Screenshots With Puppeteer 02:26 How to Take Screenshot of the Full Page 04:35 How to Take Screensh...
The Python Selenium Guide - Web Scraping With Selenium
1.2K views · 11 months ago
Python Selenium is one of the best headless browser options for Python developers who have browser automation and web scraping use cases. Unlike many other scraping tools, Selenium can be used to simulate the human use of a webpage. Selenium makes it a breeze to accomplish some things that would be near impossible to do using another scraping package. In this guide, we will go through: 00:00 In...
The 5 Best NodeJs HTML Parsing Libraries Compared
267 views · 11 months ago
When it comes to parsing HTML documents in NodeJs, there are a variety of libraries and tools available. Choosing the right HTML parser can make a big difference in terms of performance, ease of use, and flexibility. In this guide, we'll take a look at the top 5 HTML parsers for NodeJs and compare their features, strengths, and weaknesses including: 00:00 Intro 03:10 Cheerio 06:21 JSDOM 10:07 P...
Web Scraping Vs Web Crawling Explained
2.9K views · a year ago
People sometimes use the terms Web Scraping and Web Crawling interchangeably; however, they actually refer to two different things. In this guide we will explain the differences between web scraping and web crawling, including giving you examples of both and how they are often used together. In this guide, we will go through: 00:00 Intro 01:07 What is Web Scraping? 02:44 What is Web Crawling? ...
The 5 Best Python HTML Parsing Libraries Compared
534 views · a year ago
When it comes to parsing HTML documents in Python, there are a variety of libraries and tools available. Choosing the right HTML parser can make a big difference in terms of performance, ease of use, and flexibility. In this video, we'll take a look at the top 5 HTML parsers for Python and compare their features, strengths, and weaknesses including: 00:00 Intro 00:49 5 Most Popular Python HTML ...
Axios: Retry Failed Requests
742 views · a year ago
Residential Proxies Explained: How You Can Scrape Without Getting Blocked
1.4K views · a year ago
Axios: Make Concurrent Requests
340 views · a year ago
What Is Web Scraping? A Beginner's Guide On How To Get Started
430 views · a year ago
Axios: Setting Fake User-Agents
482 views · a year ago
Axios: How to Send POST Requests
302 views · a year ago
Python Requests - Web Scraping Guide
752 views · a year ago
NodeJs Request-Promise: Using Fake User Agents
452 views · a year ago
Python Requests: Make Concurrent Requests
1.2K views · a year ago
NodeJs Request-Promise: How to Send POST Requests
284 views · a year ago
Python Requests: How To Retry Failed Requests
1.5K views · a year ago
NodeJs Request-Promise: How to Use and Rotate Proxies
1.9K views · a year ago
Python Requests: How To Send POST Requests
1.9K views · a year ago
Python Requests: Using Fake User-Agents
3.3K views · a year ago
Python Requests: How To Use & Rotate Proxies
3K views · a year ago

Comments

  • @Kattar_HINDU_hu253
    @Kattar_HINDU_hu253 19 days ago

    How can I scrape the contact info? If you can help, please share the code snippet.

  • @jesusleguiza77
    @jesusleguiza77 20 days ago

    Hello, very good. What about products that have variants? How are different prices handled, whether by color or other different features? Regards

  • @Maksilver
    @Maksilver 20 days ago

    Your channel is ridiculously underrated. You are very thorough, much appreciated. God Bless

  • @chiennguyennhu8153
    @chiennguyennhu8153 25 days ago

    Hey bro, can you make a video about collecting data from Shopee? It can be considered one of the most difficult websites

  • @AmonAsmodeus
    @AmonAsmodeus 2 months ago

    For some reason, when I follow this video I have no issues, but when I follow the article tutorial I get the error: scrapy.exceptions.NotSupported: Unsupported URL scheme 'https': No module named 'scrapy_playwright'. Deactivating the venv and reactivating it does not solve the issue for me.

  • @kineticraft6977
    @kineticraft6977 2 months ago

    So let’s say I’m running this in a docker container and there’s only command line. Do I still need to install chrome to get the chrome driver to work? Is there something else that needs to be done? And does it have to be Google chrome or can chromium be used with the chrome driver?

  • @ShivaniAre
    @ShivaniAre 2 months ago

    (venv) (ai) shivani.are@Apples-MacBook-Air basic-scrapy-project % scrapy crawl linkedin_people_profile -o profile.json
    2024-09-04 15:06:25 [scrapy.utils.log] INFO: Scrapy 2.11.2 started (bot: basic_scrapy_spider)
    2024-09-04 15:06:25 [scrapy.utils.log] INFO: Versions: lxml 5.3.0.0, libxml2 2.12.9, cssselect 1.2.0, parsel 1.9.1, w3lib 2.2.1, Twisted 24.7.0, Python 3.12.2 (v3.12.2:6abddd9f6a, Feb 6 2024, 17:02:06) [Clang 13.0.0 (clang-1300.0.29.30)], pyOpenSSL 24.2.1 (OpenSSL 3.3.2 3 Sep 2024), cryptography 43.0.1, Platform macOS-12.7.3-x86_64-i386-64bit
    2024-09-04 15:06:25 [scrapy.addons] INFO: Enabled addons: []
    2024-09-04 15:06:25 [py.warnings] WARNING: /Applications/XAMPP/xamppfiles/htdocs/electron-app/basicScrap/venv/lib/python3.12/site-packages/scrapy/utils/request.py:254: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy. See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
    return cls(crawler)
    2024-09-04 15:06:25 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
    2024-09-04 15:06:25 [scrapy.extensions.telnet] INFO: Telnet Password: bbb176fdb45022f2
    2024-09-04 15:06:25 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats']
    2024-09-04 15:06:25 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'basic_scrapy_spider', 'CONCURRENT_REQUESTS': 1, 'NEWSPIDER_MODULE': 'basic_scrapy_spider.spiders', 'SPIDER_MODULES': ['basic_scrapy_spider.spiders']}
    2024-09-04 15:06:26 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.offsite.OffsiteMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
    2024-09-04 15:06:26 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
    2024-09-04 15:06:26 [scrapy.middleware] INFO: Enabled item pipelines: []
    2024-09-04 15:06:26 [scrapy.core.engine] INFO: Spider opened
    2024-09-04 15:06:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2024-09-04 15:06:26 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
    ['reidhoffman']
    2024-09-04 15:06:26 [scrapy.core.engine] DEBUG: Crawled (999) <GET www.linkedin.com/in/reidhoffman/> (referer: None)
    2024-09-04 15:06:26 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <999 www.linkedin.com/in/reidhoffman/>: HTTP status code is not handled or not allowed
    2024-09-04 15:06:26 [scrapy.core.engine] INFO: Closing spider (finished)
    2024-09-04 15:06:26 [scrapy.extensions.feedexport] INFO: Stored json feed (0 items) in: profile.json
    2024-09-04 15:06:26 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 232, 'downloader/request_count': 1, 'downloader/request_method_count/GET': 1, 'downloader/response_bytes': 2444, 'downloader/response_count': 1, 'downloader/response_status_count/999': 1, 'elapsed_time_seconds': 0.459969, 'feedexport/success_count/FileFeedStorage': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2024, 9, 4, 9, 36, 26, 545549, tzinfo=datetime.timezone.utc), 'httperror/response_ignored_count': 1, 'httperror/response_ignored_status_count/999': 1, 'log_count/DEBUG': 2, 'log_count/INFO': 12, 'log_count/WARNING': 1, 'memusage/max': 61005824, 'memusage/startup': 61005824, 'response_received_count': 1, 'scheduler/dequeued': 1, 'scheduler/dequeued/memory': 1, 'scheduler/enqueued': 1, 'scheduler/enqueued/memory': 1, 'start_time': datetime.datetime(2024, 9, 4, 9, 36, 26, 85580, tzinfo=datetime.timezone.utc)}
    2024-09-04 15:06:26 [scrapy.core.engine] INFO: Spider closed (finished)
    I am stuck at this point. How do I resolve this issue?

  • @TheDriftingStig
    @TheDriftingStig 2 months ago

    The amount of nesting burns my eyes, please separate it into functions at least 😭 Also for the item_data dictionary, you can write two for loops above it to generate a features dictionary and an images dictionary, and then update the item_data dictionary with those key value pairs.
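
The refactor suggested in the comment above could look roughly like the sketch below. The function name `build_item_data` and the `feature_N` / `image_N` key names are hypothetical, chosen only for illustration; the video's actual `item_data` fields are not reproduced here.

```python
def build_item_data(title, features, images):
    """Build a flat product dict from a title plus lists of features and image URLs.

    All names here are illustrative assumptions, not the video's actual fields.
    """
    # First loop: one key per feature, e.g. "feature_1", "feature_2", ...
    feature_fields = {}
    for index, feature in enumerate(features, start=1):
        feature_fields[f"feature_{index}"] = feature

    # Second loop: one key per image URL, e.g. "image_1", "image_2", ...
    image_fields = {}
    for index, url in enumerate(images, start=1):
        image_fields[f"image_{index}"] = url

    # Start from the fixed fields, then merge in the generated key-value pairs.
    item_data = {"title": title}
    item_data.update(feature_fields)
    item_data.update(image_fields)
    return item_data


example = build_item_data("Kettle", ["1.7 L", "Steel"], ["a.jpg"])
```

Keeping the two loops in a helper function also addresses the nesting complaint, since the parsing code only needs a single call.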

  • @muhammadnasir5733
    @muhammadnasir5733 3 months ago

    I am stuck at ScrapeOps step 3 - Installing Authorized Keys & Creating ScrapeOps User

  • @ayush-that
    @ayush-that 3 months ago

    Ran without any errors but returned an empty list in the profile.json. Can anyone tell me what to do?

  • @log8746
    @log8746 3 months ago

    The Spider decides which URL to start with and sends the request to the Engine. The Engine passes the request to the Scheduler, which schedules it and sends it back to the Engine. The Engine then gives the request to the Downloader, which fetches the webpage and sends the response back to the Engine. The Engine sends the response to the Spider. The Spider scrapes the data and sends it, along with any new requests, back to the Engine. The Engine sends the data to the Item Pipelines for processing and the new requests to the Scheduler. This process keeps repeating.
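
The request loop described in the comment above can be modelled with a toy, pure-Python simulation. This is not Scrapy's API: `FAKE_PAGES`, `downloader`, `spider_parse`, `item_pipeline` and `run_engine` are all hypothetical stand-ins that only mirror the roles of the Engine, Scheduler, Downloader, Spider and Item Pipeline.

```python
from collections import deque

# Hypothetical in-memory "website": URL -> (data on the page, links to follow).
FAKE_PAGES = {
    "page1": ("quote A", ["page2"]),
    "page2": ("quote B", ["page3"]),
    "page3": ("quote C", []),
}


def downloader(url):
    """Stand-in for the Downloader: 'fetch' a page and return a response."""
    return FAKE_PAGES[url]


def spider_parse(response):
    """Stand-in for the Spider: yield scraped items and new requests."""
    data, links = response
    yield {"text": data}   # scraped item (a dict)
    for link in links:
        yield link         # new request (a plain URL string)


def item_pipeline(item, store):
    """Stand-in for the Item Pipeline: process and store an item."""
    store.append(item["text"].upper())


def run_engine(start_url):
    """Stand-in for the Engine: move requests, responses and items around."""
    scheduler = deque([start_url])   # the Scheduler's queue of pending requests
    seen = {start_url}               # avoid scheduling the same URL twice
    store = []
    while scheduler:
        request = scheduler.popleft()
        response = downloader(request)
        for result in spider_parse(response):
            if isinstance(result, dict):
                item_pipeline(result, store)   # items go to the pipeline
            elif result not in seen:
                seen.add(result)
                scheduler.append(result)       # new requests go back to the scheduler
    return store


scraped = run_engine("page1")
```

The loop ends when the scheduler has no pending requests, which matches the "this process keeps repeating" step until the crawl runs out of pages.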

  • @disrael2101
    @disrael2101 3 months ago

    perfect, very similar to the amazon scraper.. can you show how to find & save cookies from popular sites which require login e.g. insta, tiktok etc. as well as how to bypass their captcha

  • @disrael2101
    @disrael2101 3 months ago

    amazing, it works! keep making such insightful projects

  • @jamesosullivan751
    @jamesosullivan751 3 months ago

    This is otherworldly coding. I don't understand how humans have the ability to write code like this. I'm gobsmacked.

    • @EricDiaz-u1v
      @EricDiaz-u1v a month ago

      ChatGPT, easy

  • @asdaasd-r4k
    @asdaasd-r4k 3 months ago

    bro what the heck, i thought i knew python, but after seeing this.. I think im going to learn another programming language. You are really good at this, keep going! 🥲🥲

  • @SalahSwailam
    @SalahSwailam 3 months ago

    Please is this course only for Mac users? What about Windows users?

  • @ernestoflores3873
    @ernestoflores3873 4 months ago

    Nice video, but i think it would improve with faster pace, and show the normal screen of the ide

  • @vicky87587
    @vicky87587 4 months ago

    great video, can you do how we can run this on aws lambda with ECR , scrapy-playwright

  • @nguyenhoanglong8805
    @nguyenhoanglong8805 4 months ago

    I'm curious how to start building a project like this from scratch. Please give me some advice or instructions.

  • @mohamedbassiony9322
    @mohamedbassiony9322 4 months ago

    I faced a problem while scraping a website: the class attribute value is very long and the selector does not return any value, like the following: <a aria-label tk label="Next" href="/s/Egypt/homes?refinement_paths%5B%5D=%2Fhomes&adults=2&tab_id class="l1ovpqvx atm_1he2i46_1k8pnbi_10saat9 atm_yxpdqi_1pv6nv4_10saat9 atm_1a0hdzc_w1h1e8_10saat9 atm_2bu6ew_929bqk_10saat9 atm_12oyo1u_73u7pn_10saat9 atm_fiaz40_1etamxe_10saat9 c1ytbx3a atm_mk_h2mmj6 atm_9s_1txwivl atm_h_1h6ojuz atm_fc_1h6ojuz atm_bb_idpfg4 atm_26_1j28jx2 atm_3f_glywfm atm_7l_hkljqm atm_gi_idpfg4 atm_l8_idpfg4 atm_uc_10d7vwn atm_kd_glywfm atm_gz_8tjzot atm_uc_glywfm__1rrf6b5 atm_26_zbnr2t_1rqz0hn_uv4tnr atm_tr_kv3y6q_csw3t1 atm_26_zbnr2t_1ul2smo atm_3f_glywfm_jo46a5 atm_l8_idpfg4_jo46a5 atm_gi_idpfg4_jo46a5 atm_3f_glywfm_1icshfk atm_kd_glywfm_19774hq atm_70_glywfm_1w3cfyq atm_uc_aaiy6o_9xuho3 atm_70_18bflhl_9xuho3 atm_26_zbnr2t_9xuho3 atm_uc_glywfm_9xuho3_1rrf6b5 atm_70_glywfm_pfnrn2_1oszvuo atm_uc_aaiy6o_1buez3b_1oszvuo atm_70_18bflhl_1buez3b_1oszvuo atm_26_zbnr2t_1buez3b_1oszvuo atm_uc_glywfm_1buez3b_1o31aam atm_7l_1wxwdr3_1o5j5ji atm_9j_13gfvf7_1o5j5ji atm_26_1j28jx2_154oz7f atm_92_1yyfdc7_vmtskl atm_9s_1ulexfb_vmtskl atm_mk_stnw88_vmtskl atm_tk_1ssbidh_vmtskl atm_fq_1ssbidh_vmtskl atm_tr_pryxvc_vmtskl atm_vy_1vi7ecw_vmtskl atm_e2_1vi7ecw_vmtskl atm_5j_1ssbidh_vmtskl atm_mk_h2mmj6_1ko0jae dir dir-ltr"> Is there a solution, please?

  • @Alexandru.M.P
    @Alexandru.M.P 4 months ago

    Question if I wanted to scrape all the profiles from linkedin for a specific country. Say Hungary. How would I go about doing that ?

  • @imascientistlol9035
    @imascientistlol9035 5 months ago

    doesnt work

  • @rafkabilly3375
    @rafkabilly3375 5 months ago

    I'm making a web clone and using a user agent

  • @PujanPanoramicPizzazz
    @PujanPanoramicPizzazz 5 months ago

    Seems the solution is outdated, as the jobs-guest filter does not work right now. It's voyager now, but that is more complicated and I cannot get that URL.

  • @harrisonjameslondon
    @harrisonjameslondon 5 months ago

    Has anyone had issues with running the final scrapy list? I am only getting the 'quotes' instead of linked_jobs! Please help, as I have just spent 4 hours on this!

  • @Swatimishra-of9uv
    @Swatimishra-of9uv 5 months ago

    Can you record video of browser just like screenshots but in headless mode or in background?

  • @tabishshah992
    @tabishshah992 5 months ago

    Import "scrapy" could not be resolved (Pylance). Sir, import scrapy gives me this. What's the problem? Please help.

  • @VictorSalendu
    @VictorSalendu 5 months ago

    More videos to come?

  • @sashawon2015
    @sashawon2015 5 months ago

    Thank you

  • @tvcodemate
    @tvcodemate 5 months ago

    Bro, why don't I see ads when I open your videos? I'm going to make videos about scraping. Is this topic restricted? btw, you are great👍👍

  • @Praveshan0710
    @Praveshan0710 6 months ago

    Casually deleted their Sign Up no-scroll garbage.

  • @wilsonusman
    @wilsonusman 6 months ago

    Can you expand on the storage_queue? Is that just basically allowing only 5 products to be saved at a time?

    • @kedamendez3873
      @kedamendez3873 4 months ago

      You can modify it to save more. Think of it as a block: how much info do you want to save at once?
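
A minimal sketch of the batching idea in this thread, assuming the `storage_queue` acts as a buffer that is written to disk whenever it reaches a limit. The class name `BatchedCsvWriter` and its parameters are hypothetical and are not the video's actual pipeline code.

```python
import csv
import os


class BatchedCsvWriter:
    """Buffer rows in memory and append them to a CSV file in batches.

    Hypothetical illustration: the limit controls the batch size,
    not the total number of rows that can be saved.
    """

    def __init__(self, path, fieldnames, queue_limit=5):
        self.path = path
        self.fieldnames = fieldnames
        self.queue_limit = queue_limit
        self.storage_queue = []
        self.flush_count = 0

    def add(self, row):
        """Queue one row; write the batch out once the limit is reached."""
        self.storage_queue.append(row)
        if len(self.storage_queue) >= self.queue_limit:
            self.flush()

    def flush(self):
        """Append all queued rows to the file and empty the queue."""
        if not self.storage_queue:
            return
        # Only write the header if the file is new or empty.
        write_header = not os.path.exists(self.path) or os.path.getsize(self.path) == 0
        with open(self.path, "a", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=self.fieldnames)
            if write_header:
                writer.writeheader()
            writer.writerows(self.storage_queue)
        self.storage_queue.clear()
        self.flush_count += 1
```

With a limit of 5, adding 12 rows triggers two automatic writes and leaves 2 rows queued, so a final `flush()` call is needed when the scraper finishes. A larger limit means fewer disk writes at the cost of more rows held in memory.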

  • @HmongCrypto
    @HmongCrypto 6 months ago

    I love Python. It's pretty much a universal programming language. But the one thing I hate the most is that when it comes to some serious web stuff, you're a bit limited, unlike JavaScript, which is the default go-to for anything web related. Anyone who can master both Python & JavaScript has a good set of skills right there to build from. Thanks for the video. Didn't know there was a Python wrapper for this.

  • @osnium
    @osnium 6 months ago

    If I hear undetected chromedriver one more time im going mentally ill

  • @DigiSigns-ix9sb
    @DigiSigns-ix9sb 6 months ago

    I would like to scrape starbucks site for coffees and their prices

  • @isaacafedzi3368
    @isaacafedzi3368 6 months ago

    Great. I followed along with the code, but in the log output the [scrapy-user-agents.middleware] doesn't show. Also, after adding the function which directs the URL to a ScrapeOps proxy, I end up getting empty output, whereas before I was able to get the full content of the data I scraped. Any help, please? I am using Windows and, for that matter, the shell. Thank you

  • @gracyfg
    @gracyfg 6 months ago

    if you say selenium scrapy is not reliable which one can we use with scrapy for javascript sites on windows. Playwright is also not reliable..

  • @ShellyHernandez-x
    @ShellyHernandez-x 7 months ago

    Impressive guide on scraping Amazon reviews!! Any suggestions for proxies that work well for this? I came across Proxy-Store on Google, they offer proxies for scraping, any feedback?

  • @gico0926
    @gico0926 7 months ago

    which python version is used in this venv?

  • @redsword7192
    @redsword7192 7 months ago

    You couldn't pass detection even after using API service.

  • @MDAbdurRahimcs50
    @MDAbdurRahimcs50 7 months ago

    Selenium Wire extension has been archived by the owner on Jan 3, 2024. It is now read-only. please show another way to connect proxy?

  • @Rodourmex
    @Rodourmex 7 months ago

    Thank you for your tutorial man, it was very helpful for me. Is there a way to retrieve information using the lua_script and store that information to be used later? For example, for a website that displays info in pages, I want to get the info of some elements on page one, but also on page two, and so on. I'm guessing that maybe I can use a loop in the lua_script and then return that information, but I don't know anything about the Lua language. Thanks again for your tutorial, it was straightforward and cleared up a lot of doubts.

  • @slavivna
    @slavivna 7 months ago

    I need to take screenshots of all images on a page. The img elements have the lazy loading attribute, so I can't get the src. How can I take screenshots of all the images? Help me please 🥺

  • @alexanderscott2456
    @alexanderscott2456 7 months ago

    Can anyone explain to me the advantage of using itemloaders over just yielding a dict?

    • @log8746
      @log8746 3 months ago

      ItemLoaders are used to structure the data into the format that you want before passing it into the pipeline.

  • @MHawkinsx
    @MHawkinsx 7 months ago

    Sounds like a cool project! Thinking of trying it out, maybe with Proxy-Store's proxies for smoother scraping. Any Scrapy experts here?

    • @skullziegaming
      @skullziegaming 7 months ago

      nope, this code doesnt work

  • @disrael2101
    @disrael2101 8 months ago

    You didn't show how to avoid detection

  • @tahirasaeed193
    @tahirasaeed193 8 months ago

    Make a video on how to set up ScrapeOps proxies in Selenium scripts.

    • @disrael2101
      @disrael2101 8 months ago

      yes please

    • @Iammrgemini
      @Iammrgemini 7 months ago

      Hey man, I’m looking to connect with someone who’s into web scraping if we could share ideas and build projects together

  • @alexdin1565
    @alexdin1565 8 months ago

    Can we calculate how much each request costs? I'm planning to create a SaaS app with points and don't know how to calculate the price.

  • @mudassirmaqbool-ti9pi
    @mudassirmaqbool-ti9pi 8 months ago

    Hi there, I’m using this sample, in the response I’m not getting variants’ prices in variant_data(I'm getting variant name, ASIN, and variant price exists on amazon product page), how can I get?