0
我做了一个程序,取消了简单的维基百科页面。我把这本书'用python网页刮'。为什么这个简单的urlopen()函数会产生ssl错误?
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://en.wikipedia.org/wiki/Kevin_Bacon")
bsObj = BeautifulSoup(html, 'html.parser')
for link in bsObj.findAll("a"):
if 'href' in link.attrs:
print(link.attrs['href'])
但它产生SSL错误:
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1240, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1083, in request
self._send_request(method, url, body, headers)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1128, in _send_request
self.endheaders(body)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1079, in endheaders
self._send_output(message_body)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 911, in _send_output
self.send(msg)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 854, in send
self.connect()
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1237, in connect
server_hostname=server_hostname)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 376, in wrap_socket
_context=self)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 747, in __init__
self.do_handshake()
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 983, in do_handshake
self._sslobj.do_handshake()
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 628, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:646)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 6, in <module>
html = urlopen("http://en.wikipedia.org/wiki/Kevin_Bacon")
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 162, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 471, in open
response = meth(req, response)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 581, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 503, in error
result = self._call_chain(*args)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 443, in _call_chain
result = func(*args)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 686, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 465, in open
response = self._open(req, data)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 483, in _open
'_open', req)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 443, in _call_chain
result = func(*args)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1283, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1242, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:646)>
Python版本是3.5.0。
而书不处理这种错误。我必须使用pip安装与ssl相关的软件包吗?
请不要试图抓取维基百科页面。学习使用[API](https://www.mediawiki.org/wiki/API:Main_page)。当然,这里有一些学习曲线,但是您可以通过JSON获取所需的数据,而不必使用BeautifulSoup解析数据。 –
@ PM2Ring感谢您的建议,但我想知道为什么会发生此错误? – user3595632
我并不完全确定,因为我没有使用'urllib.request'(我现在使用第三方[Requests](http://docs.python-requests.org/en/latest/)模块进行HTTP东东)。但是'http:// en.wikipedia.org/wiki/Kevin_Bacon'重定向到'https:// en.wikipedia.org/wiki/Kevin_Bacon',并且需要SSL。谈判。 –