BeautifulSoup: searching in attrs

After searching the page source for a keyword, I have:

[<div amy="sister" tommy="brother" julie="link1">E11</div>] 
[<div amy="sister" tommy="brother" julie="link2_cat">E12</div>] 
[<div amy="sister" tommy="brother" julie="link3_cat">E13</div>]

I want to extract the ones whose julie attribute contains "_cat". How can I do this with find_all(attrs)?

I tried

soup.find_all('div',{"julie":re.compile("_cat")})

but it does not work.


2017-02-03 Jimmy Lee

import re
import bs4

html = '''<div amy="sister" tommy="brother" julie="link1">E11</div>
<div amy="sister" tommy="brother" julie="link2_cat">E12</div>
<div amy="sister" tommy="brother" julie="link3_cat">E13</div>'''
soup = bs4.BeautifulSoup(html, 'lxml')

# Match <div> tags whose julie attribute contains "_cat"
soup.find_all('div', {"julie": re.compile("_cat")})

Output:

[<div amy="sister" julie="link2_cat" tommy="brother">E12</div>, 
<div amy="sister" julie="link3_cat" tommy="brother">E13</div>]

You should call find_all() on the soup object, not on a list of tags.
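A minimal sketch of the difference, assuming the same three divs as above and the built-in "html.parser" backend (chosen here only to avoid the lxml dependency). The names divs, failed and matches are illustrative. Calling find_all() on the list returned by an earlier find_all() raises AttributeError; if you already hold such a list, you can filter it with ordinary Python instead.

```python
import bs4

html = '''<div amy="sister" tommy="brother" julie="link1">E11</div>
<div amy="sister" tommy="brother" julie="link2_cat">E12</div>
<div amy="sister" tommy="brother" julie="link3_cat">E13</div>'''
soup = bs4.BeautifulSoup(html, "html.parser")

divs = soup.find_all("div")  # a ResultSet, i.e. a list of Tag objects

try:
    # A list of tags has no find_all() method of its own
    divs.find_all("div", {"julie": "link1"})
    failed = False
except AttributeError:
    failed = True

# Filter the tags already collected, using plain Python
matches = [d.get_text() for d in divs if "_cat" in d.get("julie", "")]
print(failed, matches)
```

Going back to the soup object (or to a single Tag) and passing the filter to find_all() is usually the simpler route.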


2017-02-03 16:21:46

If you mean to get the values of the julie attribute, treat each matched tag as a dictionary:

In [5]: [tag["julie"] for tag in soup.find_all('div',{"julie":re.compile("_cat")})] 
Out[5]: ['link2_cat', 'link3_cat']
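As a small illustration of the dictionary-like access, the sketch below uses a shortened, made-up document in which one div lacks the julie attribute. tag["julie"] raises KeyError when the attribute is missing, while tag.get("julie") returns None, and tag.attrs holds all attributes of the tag.

```python
import bs4

# Hypothetical two-div sample; the second div has no julie attribute
html = '''<div amy="sister" julie="link2_cat">E12</div>
<div amy="sister">E14</div>'''
soup = bs4.BeautifulSoup(html, "html.parser")

first, second = soup.find_all("div")

value = first["julie"]          # direct lookup, like a dict
missing = second.get("julie")   # None instead of raising KeyError
all_attrs = dict(first.attrs)   # every attribute of the tag

try:
    second["julie"]
    raised = False
except KeyError:
    raised = True

print(value, missing, all_attrs, raised)
```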

There is also a more concise way to match the desired elements, CSS selectors:

In [6]: [tag["julie"] for tag in soup.select('div[julie$=_cat]')] 
Out[6]: ['link2_cat', 'link3_cat']

The $= selector means "ends with".
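For comparison, a short sketch of the related attribute operators: ^= ("starts with") and *= ("contains"), which is closer to the "contains _cat" wording of the question. It assumes a BeautifulSoup version whose select() supports these operators (recent releases delegate to the soupsieve package) and quotes the attribute values for safety. The variable names are illustrative.

```python
import bs4

html = '''<div amy="sister" tommy="brother" julie="link1">E11</div>
<div amy="sister" tommy="brother" julie="link2_cat">E12</div>
<div amy="sister" tommy="brother" julie="link3_cat">E13</div>'''
soup = bs4.BeautifulSoup(html, "html.parser")

# $=  attribute value ends with the given string
ends = [t["julie"] for t in soup.select('div[julie$="_cat"]')]
# *=  attribute value contains the given string anywhere
contains = [t["julie"] for t in soup.select('div[julie*="_cat"]')]
# ^=  attribute value starts with the given string
starts = [t["julie"] for t in soup.select('div[julie^="link"]')]

print(ends, contains, starts)
```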


2017-02-03 16:22:41 alecxe
