Parsing a date string out of HTML with lxml

s = """ 
     <tbody> 
     <tr> 
     <td style="border-bottom: none"> 
     <span class="graytext" style="font-weight: bold;"> Reply #3 - </span> 
     <span class="graytext" style="font-size: 11px"> 
     05/13/09 2:02am 
     <br> 
     </span> 
     </td> 
    </tr> 
    </tbody> 
"""

I need to extract the date string from the HTML string above.

This is what I tried:

import lxml.html  # a bare "import lxml" does not load the html submodule

doc = lxml.html.fromstring(s)
doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]')

But this doesn't work. I need only the date string.

2012-06-14 Nava

Your query selects the span element; you still need to grab the text from it:

>>> doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]') 
[<Element span at 1c9d4c8>]

Most queries return a sequence, so I usually use a helper function that returns the first item.

Then:
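The helper itself is not shown in this answer. A minimal sketch of what it might look like, assuming the name `first` and a `(sequence, default)` signature inferred from how it is called further down:

```python
def first(items, default=None):
    """Return the first element of items, or default if items is empty.

    Hypothetical helper: the name and signature are assumptions based on
    the call first(doc.xpath(...), '') used in this answer.
    """
    # next() accepts a fallback value, so an empty XPath result
    # yields the default instead of raising IndexError.
    return next(iter(items), default)
```

Passing `''` as the default means a query that matches nothing still returns a string, so chaining `.strip()` onto the result is safe.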

>>> doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]') 
[<Element span at 1c9d4c8>] 
>>> doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]/text()') 
['\n 05/13/09 2:02am\n '] 
>>> first(doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]/text()'),'').strip() 
'05/13/09 2:02am'

2012-06-14 13:45:08 MattH

Try the following instead of your last line:

print(doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]/text()')[0])

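The first part of your XPath expression is correct: //span[@class="graytext" and @style="font-size: 11px"] selects every matching span node. You then need to say what to select from those nodes; here, text() selects the text nodes they contain.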
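Put together, the whole thing might look like the sketch below. It reuses the HTML from the question; the variable names `texts` and `date_text` are illustrative, and the emptiness check is an added precaution rather than part of the original one-liner.

```python
import lxml.html

# HTML fragment from the question.
s = """
<tbody>
<tr>
<td style="border-bottom: none">
<span class="graytext" style="font-weight: bold;"> Reply #3 - </span>
<span class="graytext" style="font-size: 11px">
05/13/09 2:02am
<br>
</span>
</td>
</tr>
</tbody>
"""

doc = lxml.html.fromstring(s)

# The predicate picks the span; the trailing text() step returns its
# text nodes as a list of strings instead of the element itself.
texts = doc.xpath('//span[@class="graytext" and @style="font-size: 11px"]/text()')

# xpath() returns a list, so guard against an empty match before indexing.
# The first text node is the one before <br>; strip() removes the
# surrounding whitespace and newlines.
date_text = texts[0].strip() if texts else ""
print(date_text)
```

Note that the attribute comparisons are exact string matches, so a span whose style is written as `font-size:11px` (no space) would not be selected by this expression.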

2012-06-14 13:44:23
