Getting the text that is inside one tag but outside another

I am parsing a web page with BeautifulSoup, and it contains elements like the following:

<td><font size="2" color="#00009c"><b>Consultant Registration Number :</b></font> 16043646</td>

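The structure always seems to be a <td> whose first part is wrapped in <font><b>, and the text after the </font> tag may be empty. How can I get the text that follows the font tag?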

In this example I want to get "16043646". If the HTML were instead

<td><font size="2" color="#00009c"><b>Consultant Registration Number :</b></font></td>

I would want "".


2011-08-25 murgatroid99

>>> from BeautifulSoup import BeautifulSoup 
>>> text1 = '<td><font size="2" color="#00009c"><b>Consultant Registration Number :</b></font> 16043646</td>' 
>>> text2 = '<td><font size="2" color="#00009c"><b>Consultant Registration Number :</b></font></td>' 
>>> BeautifulSoup(text1).td.font.nextSibling 
u' 16043646' 
>>> BeautifulSoup(text2).td.font.nextSibling 
>>>
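The session above uses the old BeautifulSoup 3 import under Python 2. As a rough sketch of the same idea on Python 3, assuming the bs4 package is installed and using the built-in "html.parser" backend, the two cases from the question (text present, text missing) could be wrapped in one helper. The name text_after_font is hypothetical and chosen only for this illustration.

```python
from bs4 import BeautifulSoup, NavigableString


def text_after_font(html):
    """Return the text that directly follows the first <font> in the first <td>.

    Hypothetical helper for illustration; returns "" when nothing follows.
    """
    soup = BeautifulSoup(html, "html.parser")
    # next_sibling is the bs4 spelling of nextSibling; it is None when
    # </font> is the last node inside the <td>.
    sibling = soup.td.font.next_sibling
    if isinstance(sibling, NavigableString):
        # Drop the leading space that separates the label from the value.
        return str(sibling).strip()
    return ""


text1 = '<td><font size="2" color="#00009c"><b>Consultant Registration Number :</b></font> 16043646</td>'
text2 = '<td><font size="2" color="#00009c"><b>Consultant Registration Number :</b></font></td>'

print(repr(text_after_font(text1)))
print(repr(text_after_font(text2)))
```

Checking for NavigableString means a following tag (rather than plain text) also yields "", which matches the empty result the question asks for.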


2011-08-25 16:16:51

Thanks. I was looking at that part of the documentation, but I didn't realize that nextSibling picks up text outside of tags. – murgatroid99
