2014-02-21 122 views
4

"'NoneType' object is not callable" error from BeautifulSoup when using get_text

I wrote this code to extract all the text from a web page:

from BeautifulSoup import BeautifulSoup 
import urllib2 

soup = BeautifulSoup(urllib2.urlopen('http://www.pythonforbeginners.com').read()) 
print(soup.get_text()) 

The problem is that I get this error:

print(soup.get_text()) 
TypeError: 'NoneType' object is not callable 

Any ideas on how to fix this?

+1

Take it one step at a time... starting with `urllib2.urlopen('http://www.pythonforbeginners.com')`. –

Answers

6

The method is called soup.getText(), i.e. camelCased.

Why you get a TypeError here rather than an AttributeError is a mystery to me!
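One plausible explanation, as far as I can tell: in BeautifulSoup 3, looking up an unknown attribute on a tag falls back to searching for a child tag with that name, and a search that matches nothing returns None instead of raising. Calling that None then gives the TypeError. The sketch below imitates the mechanism with a toy class; `ToyTag` and its `children` dict are made up for illustration and are not the library's real code.

```python
class ToyTag(object):
    """Toy stand-in for a BeautifulSoup 3 tag; NOT the library's real code."""

    def __init__(self, children):
        self.children = children  # maps child tag name -> child object

    def find(self, name):
        # A search that matches nothing returns None; it does not raise.
        return self.children.get(name)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. Dunder names still
        # raise AttributeError; any other name becomes a child-tag search.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)

    def getText(self):
        return "some text"


soup = ToyTag({"title": "<title>"})

print(soup.get_text)  # None: there is no child tag named "get_text"

try:
    soup.get_text()   # same as None(), hence TypeError, not AttributeError
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

print(soup.getText())  # a real method, so normal lookup finds it
```

If that is what happens in the real library, any misspelled method name on a BeautifulSoup 3 object would fail the same way.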

+2

`getText()` turns out to be an alias for `get_text()`, so that's not it. –

+1

This may be version specific: I get the same error as the OP with v3.2.1, and changing to getText() fixes it. – wim

+0

Hmm, good to know. I have 4.x, but since he uses `from BeautifulSoup import`, we can be sure he doesn't! Maybe you are right! –
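If the code has to run against either version, one option is a small helper that tries both spellings. This is only a sketch: `extract_text`, `OldStyleSoup` and `NewStyleSoup` are hypothetical names, and the two classes are stand-ins that assume the behaviour described in the comments above (the old version has only getText() and returns None for unknown attributes).

```python
def extract_text(soup):
    """Return the document text, whichever method name the object offers."""
    for name in ("get_text", "getText"):
        method = getattr(soup, name, None)
        # Check callability rather than catching AttributeError: the old
        # version may hand back None for a name it does not know.
        if callable(method):
            return method()
    raise TypeError("object has neither a callable get_text nor getText")


class OldStyleSoup(object):
    """Hypothetical stand-in: only getText(); unknown names give None."""

    def getText(self):
        return "old"

    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)
        return None


class NewStyleSoup(object):
    """Hypothetical stand-in: exposes get_text()."""

    def get_text(self):
        return "new"


old_text = extract_text(OldStyleSoup())
new_text = extract_text(NewStyleSoup())
print(old_text)
print(new_text)
```

With a real soup object the call would simply be `extract_text(soup)` in place of `soup.get_text()`.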

0

As Markku suggested in the comments, I would recommend breaking your code up into steps.

from BeautifulSoup import BeautifulSoup
import urllib2

URL = "http://www.pythonforbeginners.com"
page = urllib2.urlopen(URL)   # fetch the page
html = page.read()            # raw HTML as a string
soup = BeautifulSoup(html)    # parse it
print(soup.get_text())        # the line that raised the TypeError

If it still doesn't work, put in some print statements to see what is happening.

from BeautifulSoup import BeautifulSoup
import urllib2

URL = "http://www.pythonforbeginners.com"
print("URL is {} and its type is {}".format(URL, type(URL)))
page = urllib2.urlopen(URL)
print("Page is {} and its type is {}".format(page, type(page)))
html = page.read()
print("html is {} and its type is {}".format(html, type(html)))
soup = BeautifulSoup(html)
print("soup is {} and its type is {}".format(soup, type(soup)))
print(soup.get_text())