Python lxml HTML XPath query code not working as expected

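I am trying to scrape a page with the code below. When I run it, the first assignment to the titles variable raises an error: AttributeError: 'NoneType' object has no attribute 'split'.

If I replace that assignment with print(tag.text), it works as expected. The second assignment, to the commands variable, also works as expected. Why does the first assignment raise an error?

Code: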

import requests
import lxml.html as LH

s = requests.Session()
r = s.get('http://www.rebootuser.com/?page_id=1721')

root = LH.fromstring(r.text)

def getTags():
    commands = []
    titles = []

    for tag in root.xpath('//*/tr/td[@width="54%"]/span'):
        titles += tag.text.split(',')

    for tag in root.xpath('//*/td/span/code'):
        commands += tag.text.split(',')

    zipped = zip(titles, commands)

    for item in zipped:
        print(item)

getTags()

Source

2014-01-07 h33th3n

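In that document, some of the tags matched by the XPath //*/tr/td[@width="54%"]/span contain a b tag as a child instead of text.

Accessing the text attribute of such a tag returns None, because text only holds the text that comes before the first child element.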

>>> None.split(',') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
AttributeError: 'NoneType' object has no attribute 'split'
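
To see where the None comes from, here is a minimal sketch on a made-up HTML fragment (the markup and cell contents are assumptions for illustration, not copied from the page in the question). It compares a span that holds plain text with one whose text is wrapped in a b tag:

```python
import lxml.html as LH

# Hypothetical fragment: the first span holds plain text,
# the second one starts with a <b> child element.
html = (
    '<table>'
    '<tr><td width="54%"><span>Kernel version</span></td></tr>'
    '<tr><td width="54%"><span><b>Distribution</b> details</span></td></tr>'
    '</table>'
)
root = LH.fromstring(html)
spans = root.xpath('//tr/td[@width="54%"]/span')

plain_text = spans[0].text       # text directly inside the span
nested_text = spans[1].text      # None: nothing precedes the <b> child
nested_tail = spans[1][0].tail   # text after </b> is stored on the child's tail

print(repr(plain_text), repr(nested_text), repr(nested_tail))
```

In lxml, text that follows a child element is stored on that child's tail, so reading text alone can miss part of the content even when it is not None.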

Use the text_content method instead of the text attribute to get the full text content of such a tag (including its children):

for tag in root.xpath('//*/tr/td[@width="54%"]/span'):
    #titles += tag.text.split(',')
    titles += tag.text_content().split(',')
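
As a rough end-to-end sketch of this fix, the snippet below runs both loops against an inline, made-up fragment instead of the live page (the HTML, its cell contents, and the helper name collect_text are hypothetical). It uses text_content() for both XPath queries, so spans with and without a nested b tag are handled the same way:

```python
import lxml.html as LH

# Hypothetical stand-in for the scraped page.
html = (
    '<table>'
    '<tr><td width="54%"><span>Kernel version</span></td>'
    '<td><span><code>uname -a</code></span></td></tr>'
    '<tr><td width="54%"><span><b>Distribution</b> details</span></td>'
    '<td><span><code>cat /etc/issue</code></span></td></tr>'
    '</table>'
)
root = LH.fromstring(html)

def collect_text(elements):
    """Split the full text of each element (children included) on commas."""
    items = []
    for el in elements:
        items += el.text_content().split(',')
    return items

titles = collect_text(root.xpath('//tr/td[@width="54%"]/span'))
commands = collect_text(root.xpath('//td/span/code'))
pairs = list(zip(titles, commands))

for item in pairs:
    print(item)
```

Wrapping zip in list keeps the pairs reusable on Python 3, where zip returns a one-shot iterator.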

Source

2014-01-07 15:34:05 falsetru

Thanks a lot, that solved the problem immediately! – h33th3n
