
BeautifulSoup not working, getting a NoneType error

I am using the following code (taken from retrieve links from web page using python and BeautifulSoup):

import httplib2 
from BeautifulSoup import BeautifulSoup, SoupStrainer 

http = httplib2.Http() 
status, response = http.request('http://www.nytimes.com') 

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')): 
    if link.has_attr('href'):
        print link['href']

However, I do not understand why I am getting the following error message:

Traceback (most recent call last): 
    File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module> 
    if link.has_attr('href'): 
TypeError: 'NoneType' object is not callable 

BeautifulSoup 3.2.0 with Python 2.7

Edit:

I tried the solution provided for a similar question (Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable), but it gives me the following error:

Traceback (most recent call last): 
    File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 12, in <module> 
    for link in BeautifulSoup(response).find_all('a', href=True): 
TypeError: 'NoneType' object is not callable 

Possible duplicate of [Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable](http://stackoverflow.com/questions/19424009/type-error-if-link-has-attrhref-typeerror-nonetype-object-is-not-callabl) –


@DavidZemens The duplicate question has not been resolved. See the comments on that question. –


The duplicate question has an accepted answer which identifies *why* you are getting the error. Consider some additional debugging, and use 'try/except' as needed... –
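
A minimal debugging sketch along the lines this comment suggests, using the question's own BeautifulSoup 3 / Python 2 setup (the try/except wrapper is illustrative and not part of the original code):

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    try:
        if link.has_attr('href'):
            print link['href']
    except TypeError as exc:
        # report the element that triggered the failure instead of aborting the loop
        print 'skipping %r: %s' % (link, exc)

Running this shows that every element hits the TypeError, which points at link.has_attr itself rather than at any particular piece of markup.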

Answer


First:

from BeautifulSoup import BeautifulSoup, SoupStrainer

You are using BeautifulSoup version 3, which has not been maintained for a long time. Switch to BeautifulSoup version 4. Install it via:

pip install beautifulsoup4 

and change your import:

from bs4 import BeautifulSoup 
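
To double-check which library is actually being picked up after the switch, a quick sanity check (just an illustration) is:

import bs4
print(bs4.__version__)  # should report a 4.x release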

Additionally:

Traceback (most recent call last): 
    File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module> 
    if link.has_attr('href'): 
TypeError: 'NoneType' object is not callable 

Here link is a Tag instance that does not have a has_attr method. That means, keeping in mind what dot notation means in BeautifulSoup, it tries to search for an element named has_attr inside the link element, which finds nothing. In other words, link.has_attr is None, and obviously None('href') results in the error.
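
Here is a minimal sketch of that dot-notation behaviour (the markup is made up purely for illustration, shown with BeautifulSoup 4):

from bs4 import BeautifulSoup

soup = BeautifulSoup('<div><a href="http://example.com">link</a></div>', 'html.parser')
div = soup.div

print(div.a)     # dot notation finds the child <a> tag
print(div.span)  # there is no <span> child, so this is None

In BeautifulSoup 3, Tag has no has_attr method, so link.has_attr falls back to exactly this kind of child-tag lookup, yields None, and calling None('href') raises the reported TypeError.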

Instead, do this:

soup = BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)) 
for link in soup.find_all("a", href=True): 
    print(link['href']) 

For reference, here is the complete working code I used to debug your problem (using requests):

import requests 
from bs4 import BeautifulSoup, SoupStrainer 


response = requests.get('http://www.nytimes.com').content 
for link in BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)).find_all("a", href=True): 
    print(link['href']) 