2014-05-07 46 views
2

BeautifulSoup: get elements that have a specific attribute, regardless of its value

Imagine I have the following HTML:

<div id='0'> 
    stuff here 
</div> 

<div id='1'> 
    stuff here 
</div> 

<div id='2'> 
    stuff here 
</div> 

<div id='3'> 
    stuff here 
</div> 

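Is there a simple way to extract all divs that have an id attribute, regardless of its value, using BeautifulSoup? I realize this is trivial with XPath, but there seems to be no way to do an XPath search in BeautifulSoup.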

The official documentation looks pretty good to me: http://www.crummy.com/software/BeautifulSoup/bs4/doc/ – CoDEmanX

Answers

5

Use id=True to match only elements that have that attribute set:

soup.find_all('div', id=True) 
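
As a minimal sketch of how this applies to the markup in the question, the snippet below runs the same call on a shortened copy of that HTML. The extra div without an id and the variable names (html, divs_with_id, ids) are illustrative additions, and the 'html.parser' backend is an assumption chosen because it ships with Python.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup, plus one div without an id
# (added here only to show that it is skipped).
html = """
<div id='0'>stuff here</div>
<div id='1'>stuff here</div>
<div>no id on this one</div>
<div id='3'>stuff here</div>
"""

# 'html.parser' is the parser bundled with Python, so no extra package is needed.
soup = BeautifulSoup(html, "html.parser")

# id=True matches every <div> that carries an id attribute, whatever its value.
divs_with_id = soup.find_all("div", id=True)

# Collect the attribute values; the values played no part in the match.
ids = [div["id"] for div in divs_with_id]
print(ids)  # ['0', '1', '3']
```

The same keyword form should work for other attributes whose names are valid Python identifiers, for example href=True.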

The inverse works too; you can exclude tags that have an id attribute:

soup.find_all('div', id=False)

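To find tags with a given attribute, you can also use CSS selectors:

soup.select('div[id]')

Unfortunately, this does not support the operators needed to search for the inverse.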

演示:

>>> from bs4 import BeautifulSoup 
>>> sample = '''\ 
... <div id="id1">This has an id</div> 
... <div>This has none</div> 
... <div id="id2">This one has an id too</div> 
... <div>But this one has no clue (or id)</div> 
... ''' 
>>> soup = BeautifulSoup(sample, 'html.parser')
>>> soup.find_all('div', id=True) 
[<div id="id1">This has an id</div>, <div id="id2">This one has an id too</div>] 
>>> soup.find_all('div', id=False) 
[<div>This has none</div>, <div>But this one has no clue (or id)</div>] 
>>> soup.select('div[id]') 
[<div id="id1">This has an id</div>, <div id="id2">This one has an id too</div>] 
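
The limitation on inverse CSS searches described in this answer reflects the Beautiful Soup releases of its time (2014). As a hedged sketch, assuming Beautiful Soup 4.7.0 or later (where select() is backed by the Soup Sieve package), the :not() pseudo-class can express the inverse. The variable names without_id and texts are illustrative.

```python
from bs4 import BeautifulSoup

sample = """
<div id="id1">This has an id</div>
<div>This has none</div>
<div id="id2">This one has an id too</div>
"""

soup = BeautifulSoup(sample, "html.parser")

# Assumption: bs4 >= 4.7.0, where select() delegates to Soup Sieve.
# div:not([id]) selects every <div> that has no id attribute.
without_id = soup.select("div:not([id])")

texts = [div.get_text() for div in without_id]
print(texts)  # ['This has none']
```

On older releases, find_all('div', id=False) remains the way to get the same result.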

Nice. It works. As soon as I am allowed to, I will mark your answer as accepted. –

1

BeautifulSoup4 supports commonly-used CSS selectors:

>>> import bs4 
>>> 
>>> soup = bs4.BeautifulSoup(''' 
... <div id="0"> this </div> 
... <div> not this </div> 
... <div id="2"> this too </div> 
... ''', 'html.parser')
>>> soup.select('div[id]') 
[<div id="0"> this </div>, <div id="2"> this too </div>] 
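
For cases the two answers do not spell out, here is a rough sketch of three related forms: a function filter built on Tag.has_attr, the attrs dictionary for attribute names that are not valid Python keywords (such as data-role), and a bare [id] selector that matches any tag. The markup, the helper name div_with_id, and the data-role attribute are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<div id="0">this</div>
<div>not this</div>
<span id="s">a span with an id</span>
<div data-role="note">hyphenated attribute name</div>
"""

soup = BeautifulSoup(html, "html.parser")


def div_with_id(tag):
    """Hypothetical filter: True for <div> tags that carry an id attribute."""
    return tag.name == "div" and tag.has_attr("id")


# A function passed to find_all is called once per tag.
by_function = soup.find_all(div_with_id)

# data-role cannot be written as a keyword argument, so use the attrs dict.
by_attrs = soup.find_all("div", attrs={"data-role": True})

# Dropping the tag name from the selector matches any element with an id.
any_tag_with_id = soup.select("[id]")

print([tag["id"] for tag in by_function])       # ['0']
print([tag["data-role"] for tag in by_attrs])   # ['note']
print([tag.name for tag in any_tag_with_id])    # ['div', 'span']
```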