2014-11-08

Getting text from HTML using lxml

I'm trying to get the list of celebrities from this website with lxml and XPath, but I'm running into trouble.

Here is the HTML:

<div class="lists"> 
      <dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a> </dd> 

and I want to get the text Adam Levine.

My Python code:

celebs = tree.xpath('//dd[a]/following-sibling::node()') 

But what I get back is <Element dd at 0x1084ad4c8> ...
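The `<Element dd at 0x…>` output is how lxml prints an element object: an XPath expression that selects elements returns element objects, not their text. A minimal sketch, assuming the question's snippet with the missing closing tags added, and using `//dd[a]` (an illustrative change from the question's expression) to select the `dd` elements themselves and then read their text:

```python
from lxml import etree

# The question's snippet, with the missing </dl> and </div> tags added (assumption).
html = ('<div class="lists"><dl><dt>A</dt>'
        '<dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a> </dd>'
        '</dl></div>')
tree = etree.fromstring(html)

# Selecting elements returns lxml element objects, which print as <Element dd at 0x...>.
dds = tree.xpath('//dd[a]')

# itertext() yields every text node inside each <dd> (the <a> text plus the
# trailing space), so join them and strip the surrounding whitespace.
names = [''.join(dd.itertext()).strip() for dd in dds]
print(names)  # ['Adam Levine']
```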

If anyone can help, that would be great. Thanks.


Try adding print(celebs.text) after celebs = tree.xpath(). – knittledan 2014-11-09 19:43:30
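One caveat on reading `.text`: when an XPath expression selects elements, `tree.xpath()` returns a Python list, so `.text` has to be read from each element rather than from the list itself. A small sketch, again assuming the question's snippet with closing tags added, and selecting the `<a>` elements (an illustrative choice):

```python
from lxml import etree

# The question's snippet, with the missing </dl> and </div> tags added (assumption).
html = ('<div class="lists"><dl><dt>A</dt>'
        '<dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a> </dd>'
        '</dl></div>')
tree = etree.fromstring(html)

celebs = tree.xpath('//dd/a')  # a plain list of <a> elements

# A list has no .text attribute, so celebs.text would raise AttributeError.
# Read .text from each element instead.
names = [a.text for a in celebs]
print(names)  # ['Adam Levine']
```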

Answer


Extract the text with text() instead of following-sibling::node(), like this:

from lxml import etree 

# Your HTML is not well-formed, so I have added the closing </dl> and </div> tags on purpose
s = '''<div class="lists"> 
      <dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a> </dd></dl></div>''' 

tree = etree.fromstring(s) 

print(tree.xpath('.//dd/a/text()'))
# ['Adam Levine']
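If adding the closing tags by hand is not practical (for example, when parsing the full page), lxml also ships a forgiving HTML parser in `lxml.html` that closes open tags itself. A minimal sketch, assuming the snippet exactly as posted and that you also want each link's `href` (an illustrative extra, not part of the question):

```python
from lxml import html

# The snippet exactly as posted in the question, with no closing </dl> or </div>.
snippet = '''<div class="lists">
      <dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a> </dd>'''

# lxml.html uses libxml2's HTML parser, which tolerates the unclosed tags.
doc = html.fromstring(snippet)

# Collect (name, link) pairs from every <a> inside a <dd> in the "lists" div.
celebs = [(a.text_content().strip(), a.get('href'))
          for a in doc.xpath('//div[@class="lists"]//dd/a')]
print(celebs)  # [('Adam Levine', '/people/adam_levine/')]
```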