2009-05-20 204 views
4

Beautiful Soup and uTidy

I want to pass the output of uTidy to Beautiful Soup, like so:

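import urllib2
import tidy  # uTidylib
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

page = urllib2.urlopen(url)  # url is defined earlier in the script
options = dict(output_xhtml=1, add_xml_decl=0, indent=1, tidy_mark=0)
cleaned_html = tidy.parseString(page.read(), **options)
soup = BeautifulSoup(cleaned_html)  # this is the line that raises the TypeError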

Running it produces the following error:

Traceback (most recent call last):
  File "soup.py", line 34, in <module>
    soup = BeautifulSoup(cleaned_html)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1245, in _feed
    smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1751, in __init__
    self._detectEncoding(markup, isHTML)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1899, in _detectEncoding
    xml_encoding_match = re.compile(xml_encoding_re).match(xml_data)
TypeError: expected string or buffer

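I gather that uTidy returns an XML document object, while BeautifulSoup expects a string. Is there a way to convert cleaned_html? Or am I going about this the wrong way, and should I take a different approach?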

Answers

11

Just wrap cleaned_html in str() before passing it to BeautifulSoup.

2

Convert the value you pass to BeautifulSoup into a string. In your case, make the following edit to the last line:

soup = BeautifulSoup(str(cleaned_html))
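
To illustrate why the str() call helps, here is a minimal, self-contained sketch that does not need uTidy or BeautifulSoup installed. FakeTidyDocument is a hypothetical stand-in for the object returned by tidy.parseString, and XML_DECL is a simplified pattern, not the actual xml_encoding_re from BeautifulSoup.py. It only assumes what the traceback shows: the markup ends up in a regular-expression match, which accepts strings but not arbitrary objects.

```python
import re


class FakeTidyDocument(object):
    """Hypothetical stand-in for the document object from tidy.parseString.

    It is not a string itself, but str() on it yields the cleaned markup.
    """

    def __init__(self, markup):
        self._markup = markup

    def __str__(self):
        return self._markup


# Simplified pattern (an assumption for this sketch); the real parser
# uses its own expression to look for an XML declaration.
XML_DECL = re.compile(r'\s*<\?xml\b')


def starts_with_xml_decl(markup):
    """Return True if the markup begins with an XML declaration."""
    return XML_DECL.match(markup) is not None


doc = FakeTidyDocument('<?xml version="1.0"?><html><body><p>hi</p></body></html>')

# Passing the object directly fails, much like the traceback in the question.
try:
    starts_with_xml_decl(doc)
    failed = False
except TypeError:
    failed = True

# Converting to a string first gives the regex something it can match against.
converted = starts_with_xml_decl(str(doc))
```

The same reasoning applies to the real objects: BeautifulSoup(str(cleaned_html)) hands the parser plain markup instead of the document wrapper.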