Parsing <tr> tags with BeautifulSoup: trouble extracting values
I have some HTML that looks like this:
<tr>
<td>some text</td>
<td>some other text</td>
<td>some <b>problematic</b> other <br /> text</td>
</tr>
and some Python that tries to grab the values of the tags and print each inner value:
soup = BeautifulSoup(data, convertEntities=BeautifulSoup.HTML_ENTITIES)
for row in soup.findAll('tr'):
    print repr(row) # this prints the whole 'tr' element text just fine.
    for col in row.contents:
        print col.string
So the full HTML of the row prints correctly, but the "col" loop prints None for the last element:
some text
some other text
None
I'm not very familiar with BeautifulSoup or Python, but it seems that the inner tags in the last element are causing the parsing problem?
Thanks
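
For what it's worth, here is a minimal sketch of what seems to be going on. It assumes the newer `bs4` package (BeautifulSoup 4) rather than the BeautifulSoup 3 API used above, and the variable names are only illustrative. As far as I can tell, `.string` only returns a value when a tag has a single child, so the last `<td>`, which mixes text with `<b>` and `<br />`, gives `None`. `get_text()` gathers the text of all descendants instead.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4, not the BS3 API above

html = """<tr>
<td>some text</td>
<td>some other text</td>
<td>some <b>problematic</b> other <br /> text</td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

strings = []  # what .string returns for each cell
texts = []    # what get_text() returns for each cell

for row in soup.find_all("tr"):
    # Iterate over the <td> cells only, skipping whitespace between tags.
    for cell in row.find_all("td"):
        # .string is None when the tag has more than one child node.
        strings.append(cell.string)
        # get_text() joins every nested string; strip=True trims each piece.
        texts.append(cell.get_text(" ", strip=True))

print(strings)
print(texts)
```

Looping over `row.find_all("td")` instead of `row.contents` also avoids picking up the newline text nodes that sit between the cells.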