2016-05-13

Getting the nth element with lxml/XPath fails

This is probably something very simple, but I keep failing at it.

When root contains one or more <link/> elements, root.xpath('(//link)') returns all of them. But root.xpath('(//link)[0]') returns an empty list. What is wrong?

from unittest import TestCase, TestProgram

class T(TestCase):
    base_path = r'(//_:link)'
    def test0ok(self):
        # Passes: the path matches both <link> elements.
        self._test(2, self.base_path)
    def test1ng(self):
        # Fails: the [0] predicate matches nothing, so 0 elements come back.
        self._test(1, self.base_path + r'[0]')
    def _test(self, expected, path):
        try:
            from lxml.etree import fromstring as parse_xml_string
        except ImportError:
            raise
        root = parse_xml_string(_xhtml)
        # XPath has no default namespace, so bind the document's default
        # namespace to the prefix "_".
        nsmap = dict(_=root.nsmap[None])
        gotten = root.xpath(path, namespaces=nsmap)
        gotten = len(gotten)
        self.assertEqual(expected, gotten)

# The [1:] slice drops the leading newline so the XML declaration starts the data.
_xhtml = br'''
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"
>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<link rev="made" href="./" />
<link rel="contents" href="./" />
<title>te</title>
</head>
<body>
<h1>st</h1>
</body>
</html>
'''[1:]

if __name__ == r'__main__':
    TestProgram()

Answer


That is because indexing in XPath starts at 1, not 0:

root.xpath('(//link)[1]') 
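
For a namespaced document like the XHTML in the question, the same 1-based rule might be sketched as follows. The sample document and the prefix name x are assumptions made for illustration; the [1] and [last()] position predicates are standard XPath 1.0.

```python
from lxml import etree

# Hypothetical sample document, modelled on the XHTML in the question.
XHTML = b"""<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rev="made" href="./" />
<link rel="contents" href="./" />
<title>te</title>
</head>
<body><h1>st</h1></body>
</html>"""

root = etree.fromstring(XHTML)

# XPath has no default namespace, so bind the XHTML namespace to a prefix.
# The prefix "x" is an arbitrary choice for this sketch.
ns = {"x": "http://www.w3.org/1999/xhtml"}

zeroth = root.xpath("(//x:link)[0]", namespaces=ns)     # position 0 never matches
first = root.xpath("(//x:link)[1]", namespaces=ns)      # first link in document order
last = root.xpath("(//x:link)[last()]", namespaces=ns)  # last link in document order

print(len(zeroth))          # 0
print(first[0].get("rev"))  # made
print(last[0].get("rel"))   # contents
```

Even with a position predicate, xpath() still returns a list here, so the single matching element is read with [0] on the Python side.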

Alternatively, you can get the element by indexing the result in Python (0-based):

root.xpath('//link')[0]
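
One caveat with the Python-side approach is that indexing an empty result raises IndexError. A small sketch of a guard follows; the helper name nth_match and the sample document are hypothetical and not part of lxml, and the helper assumes the path selects nodes (so that xpath() returns a list).

```python
from lxml import etree


def nth_match(root, path, index, namespaces=None):
    """Return the match at a 0-based Python index, or None if out of range.

    Hypothetical helper for illustration; it is not an lxml API.
    """
    matches = root.xpath(path, namespaces=namespaces)
    try:
        return matches[index]
    except IndexError:
        return None


# Hypothetical sample document without namespaces.
doc = etree.fromstring(b"<root><link id='a'/><link id='b'/></root>")

print(nth_match(doc, "//link", 0).get("id"))   # a
print(nth_match(doc, "//link", -1).get("id"))  # b (negative indices work on lists)
print(nth_match(doc, "//link", 5))             # None
```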