How can I do the following in lxml? Japanese characters break lxml's XPath evaluation
runtime_text = node.xpath("//dl/dt[text()=u'Runtime:' or text()=u'Laufzeit:' or text()=u'再生時間:']/following-sibling::dd")[0].text.strip()
It works fine without the kanji, but as soon as that condition is added, it fails:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "lxml.etree.pyx", line 1498, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:52102)
File "xpath.pxi", line 295, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:151816)
File "apihelpers.pxi", line 1393, in lxml.etree._utf8 (src/lxml/lxml.etree.c:27087)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
`runtime_text = node.xpath(u"//dl/dt[text()='Runtime:' or text()='Laufzeit:' or text()='再生時間:']/following-sibling::dd")[0].text.strip()` maybe? lxml probably doesn't understand *Python*'s unicode literals –
@AnthonySottile: Given that `lxml` is written in C... yes, probably :D – Amadan
@AnthonySottile Thanks for the tip, that works – Hockey127
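
For reference, the fix suggested in the comments can be sketched as follows. This is a minimal example, not the asker's actual code: the sample markup and the names `doc`, `expr`, `expr_vars`, `en`, `de` and `ja` are assumptions made up for illustration. The idea is that the whole XPath expression is passed as one unicode string, while the string literals inside it are plain XPath literals without a `u` prefix. The second variant passes the labels as XPath variables through keyword arguments to `xpath()`, which keeps the non-ASCII text out of the expression itself.

```python
# -*- coding: utf-8 -*-
from lxml import etree

# Hypothetical sample markup standing in for the real page.
doc = etree.fromstring(u"<dl><dt>再生時間:</dt><dd> 120 min </dd></dl>")

# Variant 1: the whole expression is a single unicode string.
# The literals inside it are XPath strings, so they carry no u prefix.
expr = (u"//dl/dt[text()='Runtime:' or text()='Laufzeit:' "
        u"or text()='再生時間:']/following-sibling::dd")
runtime_text = doc.xpath(expr)[0].text.strip()

# Variant 2: pass the labels as XPath variables ($en, $de, $ja),
# supplied to xpath() as keyword arguments.
expr_vars = u"//dl/dt[text()=$en or text()=$de or text()=$ja]/following-sibling::dd"
runtime_via_vars = doc.xpath(
    expr_vars, en=u"Runtime:", de=u"Laufzeit:", ja=u"再生時間:"
)[0].text.strip()

print(runtime_text)
print(runtime_via_vars)
```

On Python 2 the source file also needs an encoding declaration such as the first line above for the non-ASCII literals to be accepted; on Python 3 every `str` is already unicode, so the `u` prefixes are harmless but unnecessary.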