A Raspberry Pi as a decent residential proxy

Category: IT · Published: 5 years ago


One of our projects (cazadescuentos.net) uses web scraping to scan several online stores for discounts. Lately, we started supporting some stores that seem to block requests coming from common cloud providers (AWS, DigitalOcean, etc.). If you are curious, the websites are BestBuy and Costco Mexico.

A popular workaround is to pay for a proxy service to scrape these websites. Sadly, we weren’t able to find a reliable provider within our small budget.

Hence, we ended up building our own residential proxy, currently powered by an old Raspberry Pi Model B. It wasn’t as simple as we expected, especially keeping the SSH tunnel available (more on this below).


Try it

If you’d like to jump directly to the code or play with it, we have open-sourced the simple-http-proxy.

For the time being, I feel brave enough to let you try the proxy without running it yourself, hoping I won’t get DoSed. I’ll very likely remove the access once someone abuses my old Pi.

This command tells my Pi to fetch https://wiringbits.net, sending the custom DNT: 1 header along with the request:

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"url": "https://wiringbits.net", "headers": { "DNT": "1" }}' \
  "https://cazadescuentos.net/proxy"

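For callers that prefer code over curl, the same request can be sketched in Python with only the standard library. The helper names (build_proxy_request, fetch_through_proxy) are made up for this example, and since the post does not describe the response format, the sketch simply returns the raw bytes.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
PROXY_ENDPOINT = "https://cazadescuentos.net/proxy"


def build_proxy_request(target_url, headers=None, endpoint=PROXY_ENDPOINT):
    """Build the POST request that asks the proxy to fetch `target_url`."""
    payload = {"url": target_url, "headers": headers or {}}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_through_proxy(target_url, headers=None, endpoint=PROXY_ENDPOINT, timeout=30):
    """Send the request and return the raw response body as bytes."""
    request = build_proxy_request(target_url, headers, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling fetch_through_proxy("https://wiringbits.net", {"DNT": "1"}) should be equivalent to the curl command, assuming the public endpoint is still reachable.
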
How it works

The approach is actually very simple:

  • The Raspberry Pi runs a simple HTTP proxy.
  • The Pi is connected to the internet through a router used exclusively by it.
  • As the Pi isn’t easily accessible from the internet, it opens a reverse SSH tunnel to our server, and the server reaches the proxy on the Pi through that tunnel.
  • Our scraper invokes the proxy as if it were running on localhost.
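To make the first step concrete, here is a minimal Python sketch of a Pi-side proxy. It is not the code from simple-http-proxy; it only assumes the request shape used in the curl example (a JSON body with url and headers) and port 9000 from the tunnel command. The names fetch, ProxyHandler and make_server are hypothetical.

```python
import json
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def fetch(url, headers, timeout=30):
    """Fetch `url` with the given headers; return (status, body bytes)."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # Pass upstream error statuses (403, 404, ...) through to the caller.
        return error.code, error.read()


class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            url = payload["url"]
            headers = payload.get("headers", {})
        except (ValueError, KeyError, TypeError, AttributeError):
            self.send_error(400, "expected a JSON object with a 'url' field")
            return
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            self.send_error(400, "only http and https URLs are supported")
            return
        try:
            status, body = fetch(url, headers)
        except (OSError, ValueError):
            # Connection failures, timeouts, malformed URLs.
            self.send_error(502, "upstream request failed")
            return
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=9000):
    # Bind to loopback only, so the proxy is reachable solely through the tunnel.
    return ThreadingHTTPServer(("127.0.0.1", port), ProxyHandler)


# On the Pi, this would be started with: make_server().serve_forever()
```

Binding to 127.0.0.1 is a deliberate choice in this sketch: the SSH tunnel connects to localhost on the Pi, so nothing else on the network needs to reach the port.
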


About security

Security considerations:

  • Don’t expose the proxy to the world or attackers will be able to interact with your home devices.
  • Ideally, run the proxy on an isolated network, separate from the one your home devices use.
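Network isolation is the main defense. As an extra guard, the proxy itself could refuse targets that resolve to private or loopback addresses, which is how a request would reach home devices. The post does not say whether the project does this; the function below, is_public_target, is a hypothetical sketch using only the Python standard library.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_target(url):
    """Return True only if every address the host resolves to is globally routable."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return False
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    # Drop any IPv6 scope id ("fe80::1%eth0") before parsing the address.
    addresses = {info[4][0].split("%")[0] for info in infos}
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_global for address in addresses
    )
```

The check and the later fetch resolve the host name separately, so this does not protect against DNS answers that change in between. It narrows the risk rather than removing it.
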

Pitfalls

I ended up investing more time than expected tweaking things to keep the proxy working reliably. The biggest problem was related to the SSH tunnel.

If you look at the actual project, it includes a systemd service that keeps the tunnel open with the necessary tweaks.

The tunnel command is:

/usr/bin/ssh -nNT -R 9999:localhost:9000 -o ConnectTimeout=10 -o ExitOnForwardFailure=yes -o ServerAliveInterval=180 ubuntu@cazadescuentos.net

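What matters the most:

  • ExitOnForwardFailure=yes makes ssh exit when it cannot set up the remote port forward (for example, because a stale session still holds port 9999 on the server), so the service can restart it instead of leaving a connected but useless session.
  • ServerAliveInterval=180 makes ssh send a keepalive probe after 180 seconds without data from the server. After several unanswered probes (3 by default, see ServerAliveCountMax) ssh exits, so a dead tunnel gets detected and restarted.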
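The actual project relies on a systemd service to restart the tunnel. The Python sketch below only mimics that restart loop to show why the two options matter: both make ssh exit when the tunnel is unusable, and the supervisor can only react to an exit. The function names and the 10-second retry delay are assumptions made for this example.

```python
import subprocess
import time


def build_tunnel_command(remote_port=9999, local_port=9000,
                         server="ubuntu@cazadescuentos.net"):
    """Return the ssh command from the post as an argument list."""
    return [
        "/usr/bin/ssh",
        "-nNT",  # no stdin, no remote command, no terminal: forwarding only
        "-R", f"{remote_port}:localhost:{local_port}",  # server port -> Pi port
        "-o", "ConnectTimeout=10",
        "-o", "ExitOnForwardFailure=yes",  # exit if the forward cannot be set up
        "-o", "ServerAliveInterval=180",   # probe the server to detect dead links
        server,
    ]


def keep_tunnel_open(command, retry_delay=10, max_runs=None,
                     run=subprocess.run, sleep=time.sleep):
    """Run ssh again every time it exits; return how many times it was run.

    `run` and `sleep` are injectable so the loop can be exercised without ssh.
    With max_runs=None the loop never ends, like a service with restarts enabled.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        run(command)  # blocks for as long as the tunnel stays up
        runs += 1
        sleep(retry_delay)
    return runs
```

Without ExitOnForwardFailure, ssh could stay connected with no working forward, and a loop like this (or systemd) would never get the chance to retry.
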

Future

It is very likely that if the proxy traffic increases considerably, it will get banned by some websites.

A more scalable approach could be to distribute these proxy devices across different locations, which makes the SSH tunnel trick impractical.

A possible approach is to use a queue service (AWS SQS, Kafka, etc.) to publish scraping requests while the proxy devices compete to consume the next one. If one device doesn’t complete the job, another one can try.
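The idea can be sketched in Python with an in-process queue.Queue standing in for SQS or Kafka, and threads standing in for the proxy devices. Everything here (scrape_with_devices, the retry limit of 3) is hypothetical; a real queue service would also need timeouts for jobs that a device picks up and never finishes.

```python
import queue
import threading


def scrape_with_devices(urls, devices, max_attempts=3):
    """Let every device pull jobs from one shared queue.

    `devices` maps a device name to a fetch(url) callable.
    Returns (results, failed): results maps url -> (device name, body).
    """
    if not devices:
        raise ValueError("at least one device is required")
    jobs = queue.Queue()
    for url in urls:
        jobs.put((url, 1))  # (target, attempt number)

    results, failed = {}, []
    lock = threading.Lock()

    def worker(name, fetch):
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work
                return
            url, attempt = job
            try:
                body = fetch(url)
            except Exception:
                if attempt < max_attempts:
                    # Put the job back; any device may pick it up next.
                    jobs.put((url, attempt + 1))
                else:
                    with lock:
                        failed.append(url)
            else:
                with lock:
                    results[url] = (name, body)
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker, args=item) for item in devices.items()]
    for thread in threads:
        thread.start()
    jobs.join()  # returns once every job has succeeded or run out of attempts
    for _ in threads:
        jobs.put(None)
    for thread in threads:
        thread.join()
    return results, failed
```

Note that a requeued job may land on the same device again. If retries should prefer a different device, the job would have to carry the names of the devices that already failed.
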

If you think about it, you don’t even need a Raspberry Pi. A very simple Android app could serve the same purpose.

In any case, this is how the proxy has been running for a couple of months, and I hope it stays like this for a while.

Found an error? We would appreciate it if you submitted a PR.

