Python crawler for Wikipedia fails with [Errno 65] No route to host

Here is the code for this question:

# -*- coding: utf-8 -*-
import bs4
import re
import requests

from bs4 import BeautifulSoup

def work(html):
    soup = BeautifulSoup(html,'html.parser')
    print(soup.prettify())

use_data = {}
# in a raw string the backslashes stay in the URL and are sent as %5C
use_data['url'] = r'https://zh.wikipedia.org/zh/\%E9\%A2\%9C\%E8\%89\%B2\%E5\%88\%97\%E8\%A1\%A8'
proxy = {"http":"http://72.46.135.119:21071","https":"https://72.46.135.119:21071"} # shadowsocks server address
# response = requests.get(use_data['url'])
response = requests.get(use_data['url'],proxies = proxy,verify=False)
# meant to check the encoding; type() only shows the object type, and this
# line makes a second request without the proxy
print(type(requests.get(use_data['url']).text))
# Wikipedia pages are UTF-8, so forcing 'gbk' garbles the Chinese text
response.encoding = 'gbk'
work(response.text)
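The path in both tracebacks contains %5C, which is a percent-encoded backslash: in a raw string such as r'...\%E9...' the backslashes are kept literally and become part of the URL. A minimal Python 3 sketch, assuming the target article is 颜色列表 (the title those %XX escapes decode to) and keeping the original /zh/ path prefix, that builds the URL with the standard library's urllib.parse.quote instead of hand-escaping it. The names WIKI_BASE and build_wiki_url are hypothetical.

```python
from urllib.parse import quote, unquote

# Hypothetical constant: base URL and /zh/ variant path taken from the question's code
WIKI_BASE = "https://zh.wikipedia.org/zh/"


def build_wiki_url(title, base=WIKI_BASE):
    """Percent-encode a page title as UTF-8 and append it to the base URL."""
    # quote() turns each non-ASCII character into UTF-8 %XX escapes; no backslashes involved
    return base + quote(title)


url = build_wiki_url("颜色列表")
print(url)  # https://zh.wikipedia.org/zh/%E9%A2%9C%E8%89%B2%E5%88%97%E8%A1%A8
# The escapes decode back to the original title
print(unquote(url[len(WIKI_BASE):]))  # 颜色列表
```

Using this url in place of use_data['url'] requests the page without the stray %5C characters; it does not, on its own, fix the connection error below.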

With the proxy disabled, i.e. with proxy commented out and response = requests.get(use_data['url']) run instead, I get this error:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='zh.wikipedia.org', port=443): Max retries exceeded with url: /zh/%5C%E9%5C%A2%5C%9C%5C%E8%5C%89%5C%B2%5C%E5%5C%88%5C%97%5C%E8%5C%A1%5C%A8 (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x109c3e810>: Failed to establish a new connection: [Errno 65] No route to host',))
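On macOS, errno 65 is EHOSTUNREACH: the operating system found no network route to the server's address, so the request fails before any HTTP is exchanged, possibly because access to the site is blocked on that network. A minimal standard-library sketch for separating a network problem from a code problem by testing raw TCP reachability; the helper name check_tcp_reachable is hypothetical.

```python
import errno
import socket


def check_tcp_reachable(host, port=443, timeout=5.0):
    """Open and immediately close a plain TCP connection to host:port.

    Returns (True, None) on success, or (False, error_code) on failure.
    Only the network path is tested; no TLS or HTTP is spoken.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except socket.timeout:
        # A connect timeout carries no errno of its own; report it as ETIMEDOUT
        return False, errno.ETIMEDOUT
    except OSError as exc:
        # e.g. EHOSTUNREACH ("No route to host") or ECONNREFUSED;
        # DNS failures (socket.gaierror) carry a getaddrinfo error code instead
        return False, exc.errno
```

If check_tcp_reachable("zh.wikipedia.org") returns (False, errno.EHOSTUNREACH) on the affected machine, the problem lies in the network path rather than in the crawler code, and some kind of proxy is needed.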

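I then tried enabling the proxy, using a shadowsocks server I bought myself, but it failed with an error saying it could not connect to the proxy. Can a Python crawler use a shadowsocks server as its proxy? If not, what is a convenient way to scrape Wikipedia through a proxy? Thanks.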

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='zh.wikipedia.org', port=443): Max retries exceeded with url: /zh/%5C%E9%5C%A2%5C%9C%5C%E8%5C%89%5C%B2%5C%E5%5C%88%5C%97%5C%E8%5C%A1%5C%A8 (Caused by ProxyError('Cannot connect to proxy.', error(54, 'Connection reset by peer')))
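A shadowsocks server speaks its own encrypted protocol, not the plain HTTP proxy protocol that requests uses for http:// proxy URLs, which is likely why the remote server resets the connection. The usual setup is to run a shadowsocks client locally and let it expose a SOCKS5 proxy, which requests can use once the SOCKS extra is installed (pip install requests[socks], which pulls in PySocks). A minimal sketch, assuming the local client listens on 127.0.0.1:1080 (a common default; check the client's own settings); socks_proxies and fetch are hypothetical helper names.

```python
def socks_proxies(host="127.0.0.1", port=1080):
    """Build a requests-style proxies mapping for a local SOCKS5 proxy.

    Assumes a shadowsocks *client* is running and listening for SOCKS5 on
    host:port. socks5h:// lets the proxy resolve hostnames, so the DNS
    lookup for zh.wikipedia.org also goes through the tunnel.
    """
    url = "socks5h://{}:{}".format(host, port)
    # One SOCKS endpoint serves both http:// and https:// target URLs
    return {"http": url, "https": url}


def fetch(url, proxies, timeout=15):
    """Fetch url through the given proxies and return the decoded text."""
    # Imported here so socks_proxies() works even where requests is not installed;
    # socks5h:// proxy URLs additionally require PySocks (requests[socks])
    import requests

    response = requests.get(url, proxies=proxies, timeout=timeout)
    response.raise_for_status()
    # Wikipedia declares charset=UTF-8 in Content-Type, so requests decodes
    # the text correctly without overriding response.encoding
    return response.text
```

With the local client running, html = fetch("https://zh.wikipedia.org/zh/%E9%A2%9C%E8%89%B2%E5%88%97%E8%A1%A8", socks_proxies()) followed by work(html) would replace the proxied requests.get call in the question. verify=False should not be needed, since TLS still runs end to end between the script and Wikipedia through the tunnel.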

Article from 玩蛇网 (iplaypy.com). When reposting, please credit the source and URL: https://www.iplaypy.com/wenda/wd18026.html

Revised: 2020-08-03 14:26:42, published by 玩蛇网
