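Beautiful Soup 4 not working consistently
Although the script I've written works, not every website gets its title returned (that is what I'm after: fetching a site's title and printing it). Sites like Google work, but others, such as StackOverflow, produce an error.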
Here is my code:
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen("http://lxml.de"))
print soup.title.string
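For comparison, here is a minimal sketch of the same title extraction using only Python 3's standard-library html.parser module, which is the parser the name "html.parser" refers to when it is passed to BeautifulSoup. This is a swapped-in technique, not the approach used above, and TitleParser/extract_title are hypothetical names. One difference worth noting: soup.title.string raises AttributeError on a page without a <title> (because soup.title is None), whereas this sketch returns None.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self.seen = False        # True once a <title> start tag is encountered
        self.parts = []          # text fragments found inside that <title>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TITLE> matches as well.
        # Only the first <title> is used; later ones are ignored.
        if tag == "title" and not self.seen:
            self.seen = True
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the stripped page title, or None if there is no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    if not parser.seen:
        return None  # comparable to soup.title being None in BeautifulSoup
    return "".join(parser.parts).strip()


print(extract_title("<html><head><title> Example page </title></head></html>"))
# prints: Example page
```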
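It would be great if you could help me with the following :)
- Any improvements that could be made to the code (and to how it handles variables)
- How to fix the cases where it returns nothing (and how to handle errors in general)
- When the code does work, it also emits a UserWarning saying I should add "html.parser" as an extra argument, but it didn't work after I put it in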
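On the error-handling point, one option is to separate fetching from parsing and catch urllib's exceptions explicitly, so that one failing site is reported instead of crashing the script. The sketch below assumes Python 3, where urllib2's urlopen, HTTPError and URLError live in urllib.request and urllib.error. fetch_html is a hypothetical helper, and its opener parameter exists only so the error paths can be tried without network access. On the UserWarning point: the warning suggests passing the parser name as the second argument, i.e. BeautifulSoup(markup, "html.parser"). That pins the parser and silences the warning, but it cannot affect an HTTP error, which is raised by urlopen before BeautifulSoup ever runs.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url, opener=urlopen, timeout=10):
    """Return the decoded page body, or None if the request fails."""
    try:
        with opener(url, timeout=timeout) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except HTTPError as err:
        # The server answered with an error status, e.g. 403 Forbidden.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print("HTTP error for %s: %d %s" % (url, err.code, err.reason))
    except URLError as err:
        # No HTTP response at all, e.g. DNS failure or refused connection.
        print("Could not reach %s: %s" % (url, err.reason))
    return None
```

A caller can then check the result for None before handing it to a parser, and skip or report that site.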
BTW, here is the error text (exactly as it was printed):
Traceback (most recent call last):
  File "C:\Users\NAME\Desktop\NETWORK\personal work\PROGRAMMING\Python\bibliography PYTHON\TEMP.py", line 5, in <module>
    soup = BeautifulSoup(urllib2.urlopen("http://stackoverflow.com/questions/36496222/beautiful-soup-4-not-working-consistent"))
  File "C:\Program Files (x86)\PYTHON 27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files (x86)\PYTHON 27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Program Files (x86)\PYTHON 27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Program Files (x86)\PYTHON 27\lib\urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "C:\Program Files (x86)\PYTHON 27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Program Files (x86)\PYTHON 27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
The error seems to be related to the urllib you're using. – jithin
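The traceback shows the HTTPError being raised inside urllib2.urlopen, before BeautifulSoup is involved at all. A common cause of a 403 in this situation is a server refusing requests that carry urllib's default "Python-urllib/x.y" User-Agent. Assuming that is what happens here, a minimal Python 3 sketch that sends an explicit User-Agent header could look like this (in Python 2.7 the equivalent constructor is urllib2.Request(url, headers=...)). The USER_AGENT string and the helpers build_request/fetch_bytes are hypothetical, and whether a particular site accepts the request is not guaranteed.

```python
from urllib.request import Request, urlopen

# Hypothetical identification string; whether a given site accepts it
# is an assumption to verify, not a guarantee.
USER_AGENT = "Mozilla/5.0 (compatible; title-fetcher/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Wrap `url` in a Request that sends an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_bytes(url, timeout=10):
    """Download `url` using the custom header (requires network access)."""
    # urlopen accepts a Request object as well as a plain URL string.
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Note that Request stores header names via str.capitalize(), so the header is read back as req.get_header("User-agent").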