2012-10-19
0

I am using this code to fetch a website's HTML content in Python 3:

import urllib.request 
import lxml.html as lh 
req= urllib.request.Request("http://www.ip-adress.com/ip_tracer/157.123.22.11", 
headers={'User-Agent' : "Magic Browser"}) 
html = urllib.request.urlopen(req).read() 
doc = lh.fromstring(html) 
print (''.join(doc.xpath('.//*[@class="odd"]')[-1].text_content().split())) 

I want to extract the organization, "Zenith Data Systems", but the script fails with this error:

Traceback (most recent call last): 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 1135, in do_open 
h.request(req.get_method(), req.selector, req.data, headers) 
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 967, in request 
self._send_request(method, url, body, headers) 
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 1005, in _send_request 
self.endheaders(body) 
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 963, in endheaders 
self._send_output(message_body) 
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 808, in _send_output 
self.send(msg) 
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 746, in send 
self.connect() 
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 724, in connect 
self.timeout, self.source_address) 
File "/usr/local/python3.2.3/lib/python3.2/socket.py", line 404, in create_connection 
raise err 
File "/usr/local/python3.2.3/lib/python3.2/socket.py", line 395, in create_connection 
sock.connect(sa) 
socket.error: [Errno 111] Connection refused 

During handling of the above exception, another exception occurred:

Traceback (most recent call last): 
File "ext.py", line 4, in <module> 
html = urllib.request.urlopen(req).read() 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 138, in urlopen 
return opener.open(url, data, timeout) 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 369, in open 
response = self._open(req, data) 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 387, in _open 
'_open', req) 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 347, in _call_chain 
result = func(*args) 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 1155, in http_open 
return self.do_open(http.client.HTTPConnection, req) 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 1138, in do_open 
raise URLError(err) 
urllib.error.URLError: <urlopen error [Errno 111] Connection refused>

How can I fix this? Thanks.

+1

'KeyboardInterrupt' means you pressed 'CTRL-C' and stopped the process. – Blender

+0

@Blender: Thanks, I have updated the error in the question. – AntiGMO

+1

Can you access the site from a browser, or through a proxy? Do you have a firewall? Your IP might be banned. – jfs

Answers

0

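Basically, "Connection refused" (Errno 111) means the TCP connection itself was rejected before any HTTP request was sent. Nothing accepted the connection at that address and port, for example because the server is down or under maintenance, or because a firewall or proxy between you and the site is blocking it. It is not a login problem: a page restricted to registered users would still accept the connection and answer with an HTTP status such as 401 or 403.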
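To see what a refused connection looks like independent of urllib, here is a minimal sketch that probes a TCP port directly. The helper names probe and unused_local_port are my own, not part of any library, and the demo only assumes that nothing is listening on a port the OS has just handed out and released.

```python
import errno
import socket


def probe(host, port, timeout=5.0):
    """Return None if a TCP connection succeeds, else the errno of the failure.

    The errno can itself be None for failures that carry no error number,
    such as a timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return exc.errno


def unused_local_port():
    """Ask the OS for a free local port, then release it again."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS choose
        return s.getsockname()[1]


if __name__ == "__main__":
    port = unused_local_port()
    # Nothing listens on this port, so the connection should be refused.
    print(probe("127.0.0.1", port) == errno.ECONNREFUSED)
```

Running probe("www.ip-adress.com", 80) from the affected machine would tell you whether the refusal happens at the TCP level, before urllib or lxml are involved at all.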

Starting from your code above, if you want to handle the error, you can wrap the request in try and except, like this:

import urllib.error
import urllib.request
import lxml.html as lh

try:
    req = urllib.request.Request(
        "http://www.ip-adress.com/ip_tracer/157.123.22.11",
        headers={'User-Agent': "Magic Browser"})
    html = urllib.request.urlopen(req).read()
    doc = lh.fromstring(html)
    print(''.join(doc.xpath('.//*[@class="odd"]')[-1].text_content().split()))
except urllib.error.URLError as e:
    # e.reason holds the underlying cause, here "[Errno 111] Connection refused"
    print(e.reason)
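If it helps with narrowing things down, here is a rough sketch that separates "the server answered with an error status" from "no HTTP response at all", and optionally sends the request through a proxy, as suggested in the comments. The function name fetch, its parameters, and the proxy address in the comment are placeholders of mine. Note that with proxy=None this sketch deliberately ignores any proxy configured in the environment, so you can compare a direct connection against a proxied one.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10, proxy=None):
    """Return (body, None) on success or (None, message) on failure."""
    # An empty mapping disables proxies entirely; otherwise route plain-HTTP
    # requests through the given proxy, e.g. "http://127.0.0.1:3128".
    proxies = {"http": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    req = urllib.request.Request(url, headers={"User-Agent": "Magic Browser"})
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.read(), None
    except urllib.error.HTTPError as e:
        # The server was reached but replied with an error status (403, 404, ...).
        # HTTPError is a subclass of URLError, so it must be caught first.
        return None, "HTTP error {}: {}".format(e.code, e.reason)
    except urllib.error.URLError as e:
        # No HTTP response: refused connection, DNS failure, and similar.
        return None, "connection problem: {}".format(e.reason)
```

A call such as body, err = fetch("http://www.ip-adress.com/ip_tracer/157.123.22.11") leaves body as None and puts the reason in err when the connection is refused. If the same call works with a proxy argument, the block is most likely specific to your IP address or network.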