2013-04-05 38 views
17

XML: Unicode strings with encoding declaration are not supported

I am trying to do the following:

from lxml import etree 
from lxml.etree import fromstring 

if request.POST: 
    parser = etree.XMLParser(ns_clean=True, recover=True) 
    h = fromstring(request.POST['xml'], parser=parser) 
    return HttpResponse(h.cssselect('itagg_delivery_receipt status').text_content()) 

but it gives this error:

[Fri Apr 05 10:27:54 2013] [error] Internal Server Error: /sms/status_postback/ 
[Fri Apr 05 10:27:54 2013] [error] Traceback (most recent call last): 
[Fri Apr 05 10:27:54 2013] [error] File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 115, in get_response 
[Fri Apr 05 10:27:54 2013] [error]  response = callback(request, *callback_args, **callback_kwargs) 
[Fri Apr 05 10:27:54 2013] [error] File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py", line 77, in wrapped_view 
[Fri Apr 05 10:27:54 2013] [error]  return view_func(*args, **kwargs) 
[Fri Apr 05 10:27:54 2013] [error] File "/srv/project/livewireSMS/sms/views.py", line 42, in update_delivery_status 
[Fri Apr 05 10:27:54 2013] [error]  h = fromstring(request.POST['xml'], parser=parser) 
[Fri Apr 05 10:27:54 2013] [error] File "lxml.etree.pyx", line 2754, in lxml.etree.fromstring (src/lxml/lxml.etree.c:54631) 
[Fri Apr 05 10:27:54 2013] [error] File "parser.pxi", line 1569, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82659) 
[Fri Apr 05 10:27:54 2013] [error] ValueError: Unicode strings with encoding declaration are not supported. 

This is the XML:

<?xml version="1.1" encoding="ISO-8859-1"?> 
<itagg_delivery_receipt> 
<version>1.0</version> 
<msisdn>447889000000</msisdn> 
<submission_ref> 
845tgrgsehg394g3hdfhhh56445y7ts6</ 
submission_ref> 
<status>Delivered</status> 
<reason>4</reason> 
<timestamp>20050709120945</timestamp> 
<retry>0</retry> 
</itagg_delivery_receipt> 

I have no control over the XML document, which comes from the SMS company.

From the lxml FAQ: [*Why can't lxml parse my XML from unicode strings?*](http://lxml.de/FAQ.html#why-can-t-lxml-parse-my-xml-from-unicode-strings) – 2018-02-13 17:06:07

Possible duplicate of [Parsing XML file gets UnicodeEncodeError (ElementTree) / ValueError (lxml)](https://stackoverflow.com/questions/15622027/parsing-xml-file-gets-unicodeencodeerror-elementtree-valueerror-lxml) – 2018-02-13 17:14:54

Answers

28

You have to encode it to bytes and then force the same encoding in the parser:

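from django.http import HttpResponse
from lxml import etree
from lxml.etree import fromstring

if request.POST:
    # request.POST values are already-decoded text, so encode them back to bytes
    xml = request.POST['xml'].encode('utf-8')
    # and tell the parser which encoding those bytes use
    parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
    h = fromstring(xml, parser=parser)

    # cssselect() returns a list of elements; take the text of the first match
    return HttpResponse(h.cssselect('itagg_delivery_receipt status')[0].text)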

Could you provide a self-contained example in which the XML data is defined in the code? – 2018-02-13 16:14:39
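
A self-contained sketch of the approach in this answer, with the XML defined inline, might look like the following. It assumes Python 3 and an installed lxml. The names `posted_xml` and `delivery_status` are hypothetical and only stand in for `request.POST['xml']` and the Django view, and the receipt is a shortened copy of the one in the question with the version set to 1.0.

```python
from lxml import etree

# Hypothetical stand-in for request.POST['xml']: the view receives text that
# is already decoded, even though the declaration still names ISO-8859-1.
posted_xml = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<itagg_delivery_receipt>'
    '<version>1.0</version>'
    '<msisdn>447889000000</msisdn>'
    '<status>Delivered</status>'
    '<reason>4</reason>'
    '</itagg_delivery_receipt>'
)


def delivery_status(xml_text):
    """Return the text of the <status> child of a delivery receipt."""
    # Encode the text to bytes, then tell the parser which encoding those
    # bytes use so that it overrides the declaration inside the document.
    xml_bytes = xml_text.encode('utf-8')
    parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
    root = etree.fromstring(xml_bytes, parser=parser)
    # <status> is a direct child of the root element.
    return root.findtext('status')


print(delivery_status(posted_xml))
```

Using `findtext` here avoids the extra `cssselect` dependency; that is a choice made for this sketch, not part of the original answer.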

7

The following solution from kernc worked for me:

>>> from lxml import etree 
>>> xml = u'<?xml version="1.0" encoding="utf-8" ?><foo><bar/></foo>' 
>>> xml = bytes(bytearray(xml, encoding='utf-8')) # this is the added line: convert the unicode string to UTF-8 bytes
>>> etree.XML(xml) 
<Element foo at 0x5b44c90> 

Interesting. Even after reading the [page](https://gist.github.com/karlcow/3258330#gistcomment-769151) that suggests the workaround, I still don't understand why decoding fails... :S – Pintun 2016-11-04 16:47:00
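
As far as the lxml FAQ linked in the comments explains it, the likely reason is that a unicode string has already been decoded, so a declaration naming a byte encoding contradicts the input and lxml refuses it rather than guess. The sketch below, assuming Python 3 and an installed lxml, reproduces the error and shows two ways around it. The variable names are illustrative only.

```python
import re

from lxml import etree

text = '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar/></foo>'

# 1. A text string that carries an encoding declaration is rejected.
try:
    etree.fromstring(text)
    outcome = 'parsed'
except ValueError as exc:
    outcome = 'ValueError'
    print(exc)

# 2. Bytes in the declared encoding are accepted, because the declaration
#    now truthfully describes the input.
root_from_bytes = etree.fromstring(text.encode('iso-8859-1'))

# 3. A text string is also accepted once the XML declaration is stripped.
without_declaration = re.sub(r'^\s*<\?xml[^>]*\?>', '', text, count=1)
root_from_text = etree.fromstring(without_declaration)

print(outcome, root_from_bytes.tag, root_from_text.tag)
```

Option 2 only works when every character in the string exists in the declared encoding; otherwise encoding to UTF-8 and overriding the parser encoding, as in the first answer, is the safer route.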