Japanese characters mess up lxml parsing

How can I do the following in lxml?

runtime_text = node.xpath("//dl/dt[text()=u'Runtime:' or text()=u'Laufzeit:' or text()=u'再生時間：']/following-sibling::dd")[0].text.strip()

It works fine without the kanji, but as soon as that part is added, it fails:

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "lxml.etree.pyx", line 1498, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:52102) 
    File "xpath.pxi", line 295, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:151816) 
    File "apihelpers.pxi", line 1393, in lxml.etree._utf8 (src/lxml/lxml.etree.c:27087) 
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
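The ValueError comes from lxml's input check ('lxml.etree._utf8' in the traceback), which runs before the XPath is evaluated at all. Below is a rough way to reproduce it. It assumes Python 3 with lxml installed, and it uses UTF-8 bytes as a stand-in for a Python 2 byte-string literal. The sample markup is made up:

```python
from lxml import etree

# Hypothetical sample markup standing in for the page being scraped.
root = etree.fromstring("<dl><dt>再生時間：</dt><dd> 120 min </dd></dl>")

expr = "//dl/dt[text()='再生時間：']/following-sibling::dd"

# Passing the expression as UTF-8 bytes mimics a Python 2 "..." byte string:
# lxml rejects byte strings that are not pure ASCII with a ValueError.
try:
    root.xpath(expr.encode("utf-8"))
    error = None
except ValueError as exc:
    error = str(exc)

print(error)

# The same expression passed as a (Unicode) str is accepted and evaluated.
print(root.xpath(expr)[0].text.strip())
```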


2015-05-25 Hockey127

'runtime_text = node.xpath(u"//dl/dt[text()='Runtime:' or text()='Laufzeit:' or text()='再生時間：']/following-sibling::dd")[0].text.strip()' maybe? lxml probably doesn't understand *Python* unicode literals – Anthony Sottile

@AnthonySottile: Given that 'lxml' is written in C... yes, probably :D – Amadan

@AnthonySottile Thanks for the tip, that works – Hockey127

I think you want:

runtime_text = node.xpath(u"//dl/dt[text()='Runtime:' or text()='Laufzeit:' or text()='再生時間：']/following-sibling::dd")[0].text.strip()

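lxml probably doesn't understand Python's unicode literals: the u prefix belongs on the Python string passed to xpath(), not on the string literals inside the XPath expression. In Python 2 a plain "..." string containing 再生時間 is a byte string, and lxml only accepts Unicode or pure-ASCII strings, which is exactly what the ValueError says.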
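Here is a minimal sketch of the answer's expression in action. It assumes lxml is installed. The HTML snippet is a made-up stand-in for the real page, which the question does not show:

```python
from lxml import etree

# Hypothetical sample page; the real markup from the question is not shown.
html = """
<html><body>
  <dl>
    <dt>Director:</dt><dd>Someone</dd>
    <dt>再生時間：</dt><dd> 98 min </dd>
  </dl>
</body></html>
"""

node = etree.HTML(html)

# The whole expression is one Unicode string (the u prefix only matters in
# Python 2); the literals inside it are plain XPath strings with no prefix.
runtime_text = node.xpath(
    u"//dl/dt[text()='Runtime:' or text()='Laufzeit:' or text()='再生時間：']"
    u"/following-sibling::dd"
)[0].text.strip()

print(runtime_text)
```

In Python 3 every str is already Unicode, so the u prefix is harmless there but not needed.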


2015-05-25 01:57:53

The 'xpath()' method takes an expression in XPath syntax, and XPath syntax is separate from Python's. http://en.wikipedia.org/wiki/XPath – monkut
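The sketch below illustrates monkut's point. It assumes lxml is installed and uses a made-up <dl> snippet. First, u'...' inside the expression is rejected as invalid XPath. Second, XPath variables offer one way to pass the labels in without writing them into the expression at all. These variables are keyword arguments to xpath() bound to $name placeholders. 'dd_after' is a hypothetical helper name:

```python
from lxml import etree

# Hypothetical sample markup.
root = etree.fromstring(
    "<dl><dt>Runtime:</dt><dd>90 min</dd><dt>Laufzeit:</dt><dd>95 min</dd></dl>"
)

# u'...' is Python syntax; inside an XPath expression it is not a valid
# string literal, so lxml reports an XPath error instead of matching.
try:
    root.xpath("//dl/dt[text()=u'Runtime:']/following-sibling::dd")
    invalid_rejected = False
except etree.XPathError:
    invalid_rejected = True


def dd_after(label):
    """Return the stripped text of the first <dd> after the <dt> equal to label."""
    # $label is bound from the keyword argument of the same name, so the
    # label never has to be quoted inside the XPath text itself.
    hits = root.xpath("//dl/dt[text()=$label]/following-sibling::dd[1]", label=label)
    return hits[0].text.strip() if hits else None


print(invalid_rejected)
print(dd_after("Laufzeit:"))
print(dd_after("再生時間："))
```

Passing the label as a variable also sidesteps quoting trouble when a label itself contains an apostrophe, since XPath 1.0 string literals have no escape syntax.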
