BeautifulSoup: text inside a tag

I want to scrape a page that has the following HTML structure:

<li class="bookie-offer first" data-bookie-code="BB" data-customer-type="existing" data-sport-type="2">

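Is there a way to extract data from the li tag itself? Specifically, I want to extract the values of data-customer-type and data-sport-type.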

Source

2015-04-18 Louis Ryan

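From the docs:

A tag may have any number of attributes. The tag <b class="boldest"> has an attribute "class" whose value is "boldest". You can access a tag's attributes by treating the tag like a dictionary:

tag['class']

u'boldest'

You can access that dictionary directly as .attrs:

tag.attrs

{u'class': u'boldest'}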
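As a minimal sketch of the dictionary-style access described in the docs excerpt, assuming the bs4 package and Python's built-in html.parser: the id attribute on the sample tag is a hypothetical addition for illustration. Note that current Beautiful Soup releases treat class as a multi-valued attribute and return a list, not the single string shown in the older docs excerpt.

```python
from bs4 import BeautifulSoup

# The id attribute is hypothetical, added only to contrast with class.
soup = BeautifulSoup('<b class="boldest" id="intro">Extremely bold</b>', "html.parser")
tag = soup.b

class_value = tag["class"]     # multi-valued attribute: a list in bs4
id_value = tag["id"]           # single-valued attribute: a plain string
missing = tag.get("data-foo")  # .get() returns None instead of raising KeyError
attrs = dict(tag.attrs)        # plain-dict copy of all attributes

print(class_value, id_value, missing, attrs)
```

Using tag.get(name) instead of tag[name] is usually the safer choice when an attribute may be absent on some tags.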

In your case...

>>> soup.find(class_='bookie-offer').attrs 

{'class': ['bookie-offer', 'first'], 
'data-bookie-code': 'BB', 
'data-customer-type': 'existing', 
'data-sport-type': '2'} 

>>> soup.find(class_='bookie-offer').attrs['data-customer-type'] 
'existing'
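If the page contains several such li elements, the same idea can be applied in a loop. The following is only a sketch, assuming bs4 with the built-in html.parser. The surrounding ul, the second li, and the element text are made up for illustration; only the first li's attributes come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the first <li>'s attributes are from the question.
html = """
<ul>
  <li class="bookie-offer first" data-bookie-code="BB" data-customer-type="existing" data-sport-type="2">Offer</li>
  <li class="bookie-offer" data-bookie-code="XX" data-customer-type="new">Offer without a sport type</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

offers = []
# class_ matches any element whose class list contains "bookie-offer".
for li in soup.find_all("li", class_="bookie-offer"):
    offers.append({
        "customer_type": li.get("data-customer-type"),
        "sport_type": li.get("data-sport-type"),  # None when the attribute is missing
    })

print(offers)
```

Here .get() is used rather than square-bracket access so that an li without one of the data attributes yields None instead of raising KeyError.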

Source

2015-04-18 18:49:44 ComputerFellow

Thank you very much!
