So, for testing purposes, let's assume this chunk of HTML sits inside a span tag:
x = """<span><br />
Important Text 1
<br />
<br />
Not Important Text
<br />
Important Text 2
<br />
Important Text 3
<br />
<br />
Non Important Text
<br />
Important Text 4
<br /></span>"""
Now I'll parse it and find my span tag:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(x)  # parse the HTML string defined above
y = soup.find('span')
If you iterate over the generator returned by y.childGenerator(), you get both the br tags and the text:
In [4]: for a in y.childGenerator(): print type(a), str(a)
....:
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
Important Text 1
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
Not Important Text
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
Important Text 2
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
Important Text 3
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
Non Important Text
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
Important Text 4
<type 'instance'> <br />
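For readers on Python 3, here is a minimal sketch of the same idea. It assumes the bs4 package (Beautiful Soup 4) with the built-in "html.parser" backend instead of the BeautifulSoup 3 import used above; the shortened HTML sample and the variable names are illustrative only. It also contrasts next_sibling, which returns the very next node (here a text node), with find_next_sibling("br"), which skips over text and returns the next br tag.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Shortened sample in the same shape as the HTML above (illustrative only).
html = """<span><br />
Important Text 1
<br />
<br />
Not Important Text
<br /></span>"""

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span")

# .children yields both Tag objects (the <br/> elements) and
# NavigableString objects (the text between them).
for child in span.children:
    print(type(child).__name__, repr(str(child)))

# Keep only the text nodes that are not just whitespace.
texts = [str(c).strip() for c in span.children
         if isinstance(c, NavigableString) and c.strip()]

first_br = span.find("br")
after_first = first_br.next_sibling         # the text node right after the first <br/>
next_br = first_br.find_next_sibling("br")  # skips the text, lands on the next <br/>
```

Deciding which text nodes count as "important" is still up to you; this only shows how to reach them.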
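Ah, the problem was that I was using findNextSibling(), which just skipped the text and went straight to the next br tag. Using nextSibling works. Thanks for your help! – maltman 2011-03-14 15:22:29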
Great answer, this was giving me a real headache! – Nick 2013-07-24 01:58:41
Isn't 'next' a reserved word in Python? Maybe a different variable name would be better? (It's a minor point, but things like this add up!) – duhaime 2013-10-18 02:20:50