
BeautifulSoup not working, getting a NoneType error

I am using the following code (taken from retrieve links from web page using python and BeautifulSoup):

import httplib2 
from BeautifulSoup import BeautifulSoup, SoupStrainer 

http = httplib2.Http() 
status, response = http.request('http://www.nytimes.com') 

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')): 
    if link.has_attr('href'):
        print link['href']

However, I do not understand why I am getting the following error message:

Traceback (most recent call last): 
    File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module> 
    if link.has_attr('href'): 
TypeError: 'NoneType' object is not callable 

BeautifulSoup 3.2.0 with Python 2.7

Edit:

I tried the solution provided for a similar question (Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable), but it gives me the following error:

Traceback (most recent call last): 
    File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 12, in <module> 
    for link in BeautifulSoup(response).find_all('a', href=True): 
TypeError: 'NoneType' object is not callable 

Possible duplicate of [Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable](http://stackoverflow.com/questions/19424009/type-error-if-link-has-attrhref-typeerror-nonetype-object-is-not-callabl) –


@DavidZemens The duplicate question has not been resolved. See the comments on that question. –


The duplicate question has an accepted answer which identifies *why* you are getting the error. Consider some additional debugging, and use 'try/except' as needed... –
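
A minimal debugging sketch along the lines this comment suggests, using the question's own BeautifulSoup 3 / Python 2 setup (the try/except wrapper is illustrative and not part of the original code):

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    try:
        if link.has_attr('href'):
            print link['href']
    except TypeError as exc:
        # report the element that triggered the failure instead of aborting the loop
        print 'skipping %r: %s' % (link, exc)

Running this shows that every element hits the TypeError, which points at link.has_attr itself rather than at any particular piece of markup.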

Answer


First:

from BeautifulSoup import BeautifulSoup, SoupStrainer

You are using BeautifulSoup version 3, which has not been maintained for a long time. Switch to BeautifulSoup version 4. Install it via:

pip install beautifulsoup4 

and change your import:

from bs4 import BeautifulSoup 
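
To double-check which library is actually being picked up after the switch, a quick sanity check (just an illustration) is:

import bs4
print(bs4.__version__)  # should report a 4.x release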

Additionally:

Traceback (most recent call last): 
    File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module> 
    if link.has_attr('href'): 
TypeError: 'NoneType' object is not callable 

Here link is a Tag instance that does not have a has_attr method. That means, keeping in mind what dot notation means in BeautifulSoup, it tries to search for an element named has_attr inside the link element, which finds nothing. In other words, link.has_attr is None, and obviously None('href') results in the error.
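
Here is a minimal sketch of that dot-notation behaviour (the markup is made up purely for illustration, shown with BeautifulSoup 4):

from bs4 import BeautifulSoup

soup = BeautifulSoup('<div><a href="http://example.com">link</a></div>', 'html.parser')
div = soup.div

print(div.a)     # dot notation finds the child <a> tag
print(div.span)  # there is no <span> child, so this is None

In BeautifulSoup 3, Tag has no has_attr method, so link.has_attr falls back to exactly this kind of child-tag lookup, yields None, and calling None('href') raises the reported TypeError.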

Instead, do this:

soup = BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)) 
for link in soup.find_all("a", href=True): 
    print(link['href']) 

For reference, here is the complete working code I used to debug your problem (using requests):

import requests 
from bs4 import BeautifulSoup, SoupStrainer 


response = requests.get('http://www.nytimes.com').content 
for link in BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)).find_all("a", href=True): 
    print(link['href']) 