Use id=True to match only elements that have the attribute set:
soup.find_all('div', id=True)
The inverse works too; you can exclude tags that have an id attribute:
soup.find_all('div', id=False)
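The id=True / id=False keywords are a shorthand for attribute filtering. The following is a minimal sketch, assuming bs4 is installed, for attribute names that are not valid Python identifiers and so can't be passed as keyword arguments. The data-role attribute and the sample markup are hypothetical. Such names can go through the attrs dictionary, which accepts True and False in the same way:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; "data-role" is only an illustrative attribute name.
sample = """
<div data-role="nav">Navigation</div>
<div>Plain</div>
<div data-role="main">Content</div>
"""

soup = BeautifulSoup(sample, "html.parser")

# "data-role" contains a hyphen, so it cannot be written as a keyword argument;
# the attrs dict takes True (attribute present) or False (attribute absent).
with_role = soup.find_all("div", attrs={"data-role": True})
without_role = soup.find_all("div", attrs={"data-role": False})

print([d.get_text() for d in with_role])     # ['Navigation', 'Content']
print([d.get_text() for d in without_role])  # ['Plain']
```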
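To find tags that have a given attribute, you can also use CSS selectors:
soup.select('div[id]')
Unfortunately, the operators needed to search the other way around are not supported.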
Demo:
>>> from bs4 import BeautifulSoup
>>> sample = '''\
... <div id="id1">This has an id</div>
... <div>This has none</div>
... <div id="id2">This one has an id too</div>
... <div>But this one has no clue (or id)</div>
... '''
>>> soup = BeautifulSoup(sample, 'html.parser')
>>> soup.find_all('div', id=True)
[<div id="id1">This has an id</div>, <div id="id2">This one has an id too</div>]
>>> soup.find_all('div', id=False)
[<div>This has none</div>, <div>But this one has no clue (or id)</div>]
>>> soup.select('div[id]')
[<div id="id1">This has an id</div>, <div id="id2">This one has an id too</div>]
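The missing inverse CSS search applies to older releases. Starting with BeautifulSoup 4.7.0, select() is implemented by the soupsieve package, which supports the :not() pseudo-class. The following sketch assumes a 4.7.0 or later install and reuses the demo markup above. It should return only the divs without an id:

```python
from bs4 import BeautifulSoup

sample = """
<div id="id1">This has an id</div>
<div>This has none</div>
<div id="id2">This one has an id too</div>
<div>But this one has no clue (or id)</div>
"""

soup = BeautifulSoup(sample, "html.parser")

# :not([id]) matches div elements that lack an id attribute; this needs a
# BeautifulSoup release whose select() is backed by soupsieve (4.7.0+).
no_id = soup.select("div:not([id])")
print([d.get_text() for d in no_id])
# ['This has none', 'But this one has no clue (or id)']
```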
The official documentation looks pretty good to me: http://www.crummy.com/software/BeautifulSoup/bs4/doc/ – CoDEmanX