proxy_pool is a proxy IP pool for Python web crawlers.
Project GitHub repository: https://github.com/jhao104/proxy_pool
- Docker

```bash
docker pull jhao104/proxy_pool
docker run --env DB_CONN=redis://:password@ip:port/0 -p 5010:5010 jhao104/proxy_pool:latest
```
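To confirm the container is serving proxies before wiring it into a spider, you can hit the `/get/` endpoint used in the usage example below. This is a minimal sketch assuming the default `-p 5010:5010` port mapping from the command above:

```python
import requests

# Quick sanity check: ask the pool for a single proxy.
# Assumes the container is running locally with the default 5010 port mapping.
resp = requests.get("http://127.0.0.1:5010/get/")
print(resp.json())  # JSON containing a "proxy" field such as "ip:port"
```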
- Docker Compose
```bash
docker-compose up -d
```
- Usage
```python
import requests


def get_proxy():
    return requests.get("http://127.0.0.1:5010/get/").json()


def delete_proxy(proxy):
    requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))


# your spider code
def getHtml():
    # ....
    retry_count = 5
    proxy = get_proxy().get("proxy")
    while retry_count > 0:
        try:
            # fetch the page through the proxy
            html = requests.get('http://www.example.com',
                                proxies={"http": "http://{}".format(proxy)})
            return html
        except Exception:
            retry_count -= 1
    # all retries failed: remove this proxy from the pool
    delete_proxy(proxy)
    return None
```
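The `proxies` mapping above only covers plain HTTP URLs. For HTTPS targets you would normally also pass an `https` key; the sketch below builds on the functions defined above and assumes the proxies returned by the pool can tunnel HTTPS (the target URL is just a placeholder):

```python
# Sketch: route an HTTPS request through a proxy from the pool.
# Assumption: the returned proxy supports CONNECT tunneling for HTTPS targets.
proxy = get_proxy().get("proxy")
resp = requests.get(
    "https://www.example.com",
    proxies={
        "http": "http://{}".format(proxy),
        "https": "http://{}".format(proxy),
    },
    timeout=10,
)
print(resp.status_code)
```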