Should I have used this Web Scraping Technique?

  • Published on 20 Nov 2024

Comments • 19

  • @IJH-Music
    @IJH-Music 11 days ago +16

    In the age of low-quality YouTube videos, you shine bright!
    I hope you keep up the amazing high quality shenanigans (:

  • @899
    @899 9 days ago +2

    That’s funny John, you and I scrape the same way 😂. I use curl converter all of the time. Just scraped Lowe’s entire product omni & barcode info; that’s what brought me to see what you were up to. Legend!

  • @munchcup
    @munchcup 8 days ago +2

    In the on-request handler you can filter directly by the exact URL that returns the JSON and take its headers (Authorization), e.g. check if data.request.url.endswith(...), or match against the full URL.
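A minimal sketch of the filtering idea in the comment above, written against plain dictionaries rather than any particular browser-automation API. The shape of each captured request (a "url" string and a "headers" mapping), the function name find_auth_header, and the example URLs and token are all assumptions made for illustration.

```python
from typing import Iterable, Mapping, Optional


def find_auth_header(requests: Iterable[Mapping], url_suffix: str) -> Optional[str]:
    """Return the Authorization header of the first request whose URL ends with url_suffix."""
    for req in requests:
        if req["url"].endswith(url_suffix):
            # Header names are case-insensitive, so normalise them before the lookup.
            headers = {name.lower(): value for name, value in req["headers"].items()}
            return headers.get("authorization")
    return None


# Hypothetical captured traffic, standing in for whatever the interceptor records.
captured = [
    {"url": "https://example.com/static/app.js", "headers": {}},
    {
        "url": "https://example.com/api/products.json",
        "headers": {"Authorization": "Bearer abc123"},
    },
]

token = find_auth_header(captured, "/api/products.json")
print(token)
```

Matching on the suffix skips the static assets and analytics calls, so only the request that actually returns the JSON is inspected.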

  • @dominikhuemer2492
    @dominikhuemer2492 9 days ago +2

    So this is for learning purposes only! Keep that in mind, guys :D Damn, I nearly laughed out loud when you made your little remark about the purpose :D You're definitely doing some great work, mate, and have helped me out a few times already :) Much love from Austria

  • @JB-fh1bb
    @JB-fh1bb 7 days ago

    "learning purposes only" isn't the only grey legality in this video:
    selenium-driverless has a non-commercial license and monetized/sponsored videos are commercial purposes

  • @royteicher
    @royteicher 8 days ago

    Hey John, a question:
    How about a use case where you want to scrape a large number of items: how would you handle the auth headers? Pass the same token to 100k requests and use it until it fails, then retrieve another one? Send all the requests asynchronously?
    Is there a video you've done about handling scraping at large volumes and building fault-tolerant pipelines for retrieving and parsing the data? There are many potential errors in both aspects at large volume, and I'd love to hear your input on that.
    Keep up the good work!
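One way the "reuse the token until it fails, then fetch another" idea from this question could look with asyncio. This is a sketch under stated assumptions, not the approach from the video: TokenStore, scrape_all, and the fake_fetch_token / fake_request callables are hypothetical names, and the fake backend simply treats the first token as already expired so the refresh path can be seen without any network access.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List, Optional, Tuple

FetchToken = Callable[[], Awaitable[str]]
Request = Callable[[int, str], Awaitable[Tuple[int, Any]]]


class TokenStore:
    """Shares one token across tasks and refreshes it once per expiry."""

    def __init__(self, fetch_token: FetchToken) -> None:
        self._fetch_token = fetch_token
        self._token: Optional[str] = None
        self._lock = asyncio.Lock()

    async def get(self) -> str:
        async with self._lock:
            if self._token is None:
                self._token = await self._fetch_token()
            return self._token

    async def invalidate(self, stale: str) -> None:
        async with self._lock:
            # Clear only if no other task has already replaced the stale token.
            if self._token == stale:
                self._token = None


async def scrape_all(
    item_ids: Iterable[int],
    request: Request,
    fetch_token: FetchToken,
    concurrency: int = 10,
    max_attempts: int = 3,
) -> List[Any]:
    store = TokenStore(fetch_token)
    sem = asyncio.Semaphore(concurrency)  # caps the number of in-flight requests

    async def one(item_id: int) -> Any:
        async with sem:
            for _ in range(max_attempts):
                token = await store.get()
                status, body = await request(item_id, token)
                if status == 401:  # token rejected: drop it and retry with a fresh one
                    await store.invalidate(token)
                    continue
                return body
            return None  # gave up on this item after max_attempts

    return await asyncio.gather(*(one(i) for i in item_ids))


async def _demo() -> Tuple[List[Any], List[str]]:
    issued: List[str] = []

    async def fake_fetch_token() -> str:
        issued.append(f"token-{len(issued) + 1}")
        return issued[-1]

    async def fake_request(item_id: int, token: str) -> Tuple[int, Any]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        if token == "token-1":  # pretend the first token has expired server-side
            return 401, None
        return 200, {"id": item_id}

    results = await scrape_all(range(20), fake_request, fake_fetch_token, concurrency=5)
    return results, issued


results, issued = asyncio.run(_demo())
print(len(results), issued)
```

The lock in TokenStore is the main design choice: when many concurrent requests fail on the same expired token, only one of them triggers a refresh and the rest pick up the new token.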

  • @naradakandawala4278
    @naradakandawala4278 6 days ago +1

    Awesome as always ❤

  • @lordlegendsss7776
    @lordlegendsss7776 11 days ago +1

    Love from India, you are the best teacher for us

  • @alic690
    @alic690 10 days ago +1

    Good video John

  • @domitorid177
    @domitorid177 11 days ago +2

    Hi, John! Have you ever masked your scraper behind a known spider user agent, like Googlebot?

  • @atulraaazzz2931
    @atulraaazzz2931 11 days ago +2

    Great video, love from India

  • @AllenGodswill-im3op
    @AllenGodswill-im3op 11 days ago +1

    You’re the best.

  • @sonik121
    @sonik121 10 days ago +1

    Any details on what a proxy implementation looks like? What's actually in mobileproxyuk?

    • @JohnWatsonRooney
      @JohnWatsonRooney 10 days ago

      There’s a recent video on my channel that shows how to use proxies with Python
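For readers wondering what a proxy setup can look like in Python, here is a minimal standard-library sketch. It is not taken from the video it refers to: the helper name build_proxy_opener and the proxy URL with its credentials are placeholders, and a real provider will supply its own host, port, and authentication scheme.

```python
import urllib.request


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that routes both HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical proxy endpoint and credentials; substitute your provider's values.
PROXY_URL = "http://user:password@proxy.example.com:8000"
opener = build_proxy_opener(PROXY_URL)

# Uncomment to send a real request through the proxy:
# with opener.open("https://example.com/", timeout=10) as resp:
#     print(resp.status)
```

Third-party HTTP clients generally accept the same kind of proxy URL through their own configuration options.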

  • @0xissam
    @0xissam 9 days ago +1

    awesome 🔥🔥

  • @tomkmb4120
    @tomkmb4120 11 days ago

    Love it!

  • @Aidas_Li
    @Aidas_Li 11 days ago

    Nice video 👍 as always. Learned a lot.
    Since I started watching your videos I have bought 2 Mac Studios and am setting up to run 1M products daily.
    Very, very complex for someone who has never programmed; I could not have done it without your guidance 24:29
    # Standard library
    import asyncio
    import json
    import logging
    import os
    import random
    import signal
    import sys
    import time
    from collections import deque
    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import Dict, List, Optional, Set

    # Third-party
    import aiofiles
    from aiohttp import ClientSession
    from playwright.async_api import Page
    from playwright_extra import async_playwright_extra
    from playwright_extra.plugins.stealth import stealth_sync
    from scrapy.exceptions import DropItem
    from scrapy.item import Field, Item