Raspberry Pi + Squid: Building a Proxy Server with your Raspberry Pi for Web-scraping

  • Published Oct 16, 2024
  • Discover how simple it is to set up a proxy server on your Raspberry Pi using Squid, a user-friendly, open-source proxy. In this tutorial, we provide a step-by-step guide and demonstrate its use for web scraping. The benefits of running a proxy server go beyond scraping, though: it can add security, cache content efficiently, speed up repeated network requests, and centralize connection management. Unlock the potential of your Raspberry Pi with this comprehensive tutorial on setting up a Squid proxy server!
    Link to Blog Post:
    shillehtek.com...
    Link to Post on Medium:
    / building-a-simple-prox...
    You can donate here:
    www.buymeacoff...
    Join this channel to get access to perks:
    / @mmshilleh
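As a rough illustration of the web-scraping use case, here is a minimal Python sketch (standard library only) that routes requests through a Squid proxy running on the Pi. The Pi address 192.168.1.50 is hypothetical; 3128 is Squid's default listening port; the optional username and password only matter if you have enabled proxy authentication in squid.conf.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical LAN address of the Pi; replace with your own.
# 3128 is Squid's default http_port.
PI_PROXY_HOST = "192.168.1.50"
PI_PROXY_PORT = 3128


def build_proxy_url(host, port, user=None, password=None):
    """Return an http:// proxy URL, embedding credentials only if given."""
    if user is not None:
        # Percent-encode credentials so characters like '@' or ':' don't break the URL.
        creds = f"{quote(user, safe='')}:{quote(password or '', safe='')}@"
    else:
        creds = ""
    return f"http://{creds}{host}:{port}"


def make_proxy_opener(proxy_url):
    """Build a urllib opener that sends both HTTP and HTTPS traffic via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = make_proxy_opener(build_proxy_url(PI_PROXY_HOST, PI_PROXY_PORT))
    # HTTPS requests are tunneled through Squid with the CONNECT method.
    with opener.open("https://example.com", timeout=10) as resp:
        print(resp.status, len(resp.read()))
```

Squid must also allow your client's network in its http_access rules, otherwise it will answer with an access-denied error instead of forwarding the request.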

Comments • 14

  • @jorgevillarreal2245 (6 months ago)

    But if the Raspberry Pi "proxy" is on the same local network, isn't it using the same public IP address for web scraping? Shouldn't this Raspberry Pi proxy be on a network with a different ISP-provided public IP so it can actually help avoid throttling?

    • @mmshilleh (6 months ago)

      That's a great question! A proxy on the same local network still sends its traffic out through the same router, so target websites see the same public IP address either way. You can still have some success with web scraping this way, especially if the websites you're scraping aren't actively blocking or throttling requests by IP address. For more reliable and efficient scraping, though, it's generally recommended to use proxies with different public IP addresses. This spreads requests across addresses and reduces the likelihood of being blocked or throttled. If your setup on your local network has worked so far, it may be because the websites you're scraping haven't implemented strict IP-based blocking or throttling. Just keep in mind that as your scraping volume grows or you target different websites, you may run into those limits.

  • @jaxjax7318 (3 months ago)

    I love using "tail -f" on the Squid log, so I can see in real time the client IPs and the sites they connect to. It was very helpful for figuring out which IPs I need to block in the firewall.

    • @mmshilleh (3 months ago)

      Thanks for the info!
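Following up on the "tail -f" tip, here is a minimal Python sketch that follows the log the same way and also pulls the client IP, method, and URL out of each line. It assumes Squid's default native log format (timestamp, elapsed time, client address, result/status, bytes, method, URL, ...) and the log path /var/log/squid/access.log, which is typical on Debian-based systems such as Raspberry Pi OS; check the access_log directive in your squid.conf if yours differs. Log rotation is not handled.

```python
import time
from collections import Counter


def parse_squid_line(line):
    """Parse one line of Squid's native access.log format, or return None."""
    fields = line.split()
    if len(fields) < 7:
        return None
    try:
        timestamp = float(fields[0])
    except ValueError:
        return None
    code, _, status = fields[3].partition("/")  # e.g. "TCP_MISS/200"
    return {
        "time": timestamp,
        "client_ip": fields[2],
        "result": code,
        "status": status,
        "method": fields[5],
        "url": fields[6],
    }


def follow(path, poll_seconds=0.5):
    """Yield complete lines appended to a file, similar to `tail -f`."""
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file, like tail -f
        buf = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll_seconds)
                continue
            buf += chunk
            if buf.endswith("\n"):  # only yield once the line is complete
                yield buf
                buf = ""


def count_clients(lines):
    """Count requests per client IP, e.g. to spot addresses worth blocking."""
    counts = Counter()
    for line in lines:
        entry = parse_squid_line(line)
        if entry:
            counts[entry["client_ip"]] += 1
    return counts


if __name__ == "__main__":
    for line in follow("/var/log/squid/access.log"):
        entry = parse_squid_line(line)
        if entry:
            print(entry["client_ip"], entry["method"], entry["url"], entry["status"])
```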

  • @davidbotham7090 (8 months ago)

    Great tutorial! I would love to connect a few of these to a single RPi to monitor my filament containers. Do you know if that is possible? And if so, where I might find a how-to for something like that?

    • @mmshilleh (8 months ago)

      Hey man, no, I don't know off the top of my head how to do that. You probably need more sensor hardware than just a Pi. If you'd like to discuss it in detail, feel free to book a consulting slot through the Buy Me a Coffee link on my YouTube profile. Sounds involved but interesting!

  • @tsriramaraju (3 months ago)

    Great tutorial! I was wondering if we could use 3-4 4G LTE modems and rotate the IPs whenever there's a block. Do you have any suggestions on how to achieve this? Please provide some guidance.

    • @mmshilleh (3 months ago)

      Wow, you definitely could; it sounds like an interesting project. You would need some error handling in your Python code. I have done something similar, and I don't think it is that complicated.
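A minimal sketch of the error-handling idea above, assuming each modem's connection is reachable through its own proxy URL (how you expose each modem as a proxy is outside this sketch, and the URLs below are hypothetical). It keeps using the current proxy until a response looks blocked, treating HTTP 403 and 429 as the block signal (also an assumption; tune it per site), and only then rotates to the next one.

```python
import itertools
import urllib.error
import urllib.request

# Status codes treated as "blocked or throttled" (an assumption).
BLOCK_STATUSES = {403, 429}


def open_via_proxy(url, proxy_url, timeout=15):
    """Fetch url through proxy_url and return (status, body)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as HTTPError; keep the status code.
        return err.code, b""


class ProxyRotator:
    """Stick with one proxy until it looks blocked, then move to the next."""

    def __init__(self, proxies, fetch=open_via_proxy):
        if not proxies:
            raise ValueError("need at least one proxy")
        self._proxies = list(proxies)
        self._cycle = itertools.cycle(self._proxies)
        self._current = next(self._cycle)
        self._fetch = fetch

    def get(self, url):
        last_error = None
        for _ in range(len(self._proxies)):  # try each proxy at most once
            proxy = self._current
            try:
                status, body = self._fetch(url, proxy)
            except OSError as err:  # covers URLError, timeouts, refused connections
                last_error = err
            else:
                if status not in BLOCK_STATUSES:
                    return proxy, status, body
                last_error = f"{proxy} returned HTTP {status}"
            # Blocked or unreachable: rotate to the next proxy and retry.
            self._current = next(self._cycle)
        raise RuntimeError(f"every proxy failed; last error: {last_error}")


if __name__ == "__main__":
    # Hypothetical endpoints, one per modem.
    rotator = ProxyRotator(["http://192.168.1.50:3128", "http://192.168.1.51:3128"])
    proxy, status, body = rotator.get("https://example.com")
    print(proxy, status, len(body))
```

Injecting the fetch function keeps the rotation logic separate from networking, which makes it easy to swap in a different HTTP client later.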

  • @jaredpullman1173 (4 months ago)

    How can you adjust squid.conf to allow remote use of the proxy? If I set my proxy as the browser proxy on my laptop, it works great when I'm on the same Wi-Fi network, but when I'm at a friend's house on their Wi-Fi and try to use it, it never prompts for user authentication, so the remote connection fails.

    • @mmshilleh (4 months ago)

      You can do these sorts of things easily with Tailscale. I recommend looking into that.

  • @MASKDANTE (2 months ago)

    My internet connection goes through a proxy at 192.168.49.1:8000, and I have to configure it to get online. How do I configure the same on the Raspberry Pi 4? I haven't been able to use the internet over Wi-Fi: the Pi 4 connects to the Wi-Fi and gets an IP address automatically, but it can't browse because I haven't configured the proxy the way I would when it is in client mode.

    • @mmshilleh (2 months ago)

      I am not sure, my friend.

    • @JamminJosh7 (months ago)

      Did you try manually setting a static IP in the router admin page?

    • @mmshilleh (months ago)

      @JamminJosh7 Yes, I have done that before.
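One common approach to the upstream-proxy question, sketched in Python: many Linux tools (curl, wget, and Python's urllib and requests among them) honor the http_proxy and https_proxy environment variables. The sketch sets them for the current process only and confirms that urllib picks them up. The address 192.168.49.1:8000 comes from the comment above; the no_proxy value is an assumption.

```python
import os
import urllib.request

# Upstream proxy address from the question above.
UPSTREAM_PROXY = "http://192.168.49.1:8000"


def use_upstream_proxy(proxy_url, no_proxy="localhost,127.0.0.1"):
    """Set proxy environment variables for this process and any child processes.

    Returns the proxy mapping urllib now sees, e.g. {"http": ..., "https": ...}.
    """
    for name in ("http_proxy", "https_proxy"):
        os.environ[name] = proxy_url
    # Hosts that should bypass the proxy (assumed: local addresses only).
    os.environ["no_proxy"] = no_proxy
    return urllib.request.getproxies()


if __name__ == "__main__":
    print(use_upstream_proxy(UPSTREAM_PROXY))
```

These variables only last for the running process and anything it launches; to make them persist you would typically export them from your shell profile. If the Pi is also running Squid and that Squid must forward through the upstream proxy, the relevant squid.conf directives are cache_peer (declaring 192.168.49.1 port 8000 as a parent) together with never_direct.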