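lxml encoding error in production
I'm trying to process some data with lxml. It works fine on my development server, but in production the following code: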
parser = etree.XMLParser(encoding='cp1251')
throws:
File "parser.pxi", line 1288, in lxml.etree.XMLParser.__init__ (third_party/apphosting/python/lxml/src/lxml/lxml.etree.c:77726)
File "parser.pxi", line 738, in lxml.etree._BaseParser.__init__ (third_party/apphosting/python/lxml/src/lxml/lxml.etree.c:73404)
LookupError: unknown encoding: 'cp1251'
I'm using lxml 2.3, and GAE seems to support the same version. So why does this error occur?
EDIT:
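I've tried passing different encodings to XMLParser, such as cp1252, ISO-8859-5 and ISO-8859-2. On GAE it always throws the same error, but on my local machine it works. These are all popular encodings, and lxml on GAE should support them. I believe this is a bug in the lxml build on GAE.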
I've filed an issue: http://code.google.com/p/googleappengine/issues/detail?id=7315
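As far as I can tell from lxml's parser source, the encoding name is checked when the parser is constructed: lxml asks libxml2 for a matching encoding handler and raises `LookupError: unknown encoding` from `_BaseParser.__init__` if there is none, which matches the traceback. So which names work depends on the libxml2 build that lxml is linked against, not on Python's own `codecs`. The sketch below compares the two for a list of names. The function `probe_encodings` and its `make_parser` parameter are hypothetical, added only so the check can be exercised without lxml installed.

```python
import codecs


def probe_encodings(encodings, make_parser=None):
    """Return {name: (python_ok, lxml_ok)} for each encoding name.

    python_ok: Python's codec registry knows the name.
    lxml_ok:   constructing an lxml XMLParser with that encoding succeeds.
    """
    if make_parser is None:
        # Imported lazily so a stand-in factory can be passed instead.
        from lxml import etree

        def make_parser(enc):
            return etree.XMLParser(encoding=enc)

    results = {}
    for enc in encodings:
        try:
            codecs.lookup(enc)
            python_ok = True
        except LookupError:
            python_ok = False
        try:
            # lxml raises LookupError here when libxml2 has no handler for the name.
            make_parser(enc)
            lxml_ok = True
        except LookupError:
            lxml_ok = False
        results[enc] = (python_ok, lxml_ok)
    return results
```

On the production instance, something like `probe_encodings(['cp1251', 'cp1252', 'ISO-8859-5', 'ISO-8859-2'])` would show which names that lxml build rejects. An entry of `(True, False)` means Python can still decode the data even though lxml cannot, so decoding in Python first remains a fallback.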
EDIT2:
Full traceback:
unknown encoding: 'cp1251'
Traceback (most recent call last):
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
rv = self.handle_exception(request, response, e)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~my_cool_app_id/1.358126884781269352/main.py", line 29, in get
parser = etree.XMLParser(encoding='cp1251')
File "parser.pxi", line 1288, in lxml.etree.XMLParser.__init__ (third_party/apphosting/python/lxml/src/lxml/lxml.etree.c:77726)
File "parser.pxi", line 738, in lxml.etree._BaseParser.__init__ (third_party/apphosting/python/lxml/src/lxml/lxml.etree.c:73404)
LookupError: unknown encoding: 'cp1251'
This works for me on shell-27.appspot.com (once I made sure the parser wasn't being pickled). Are you using Python 2.7? Can you include the full stack trace? – 2012-04-11 04:19:07
Also, have you tried decoding the text before passing it to the XML parser? – 2012-04-11 04:19:39
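@NickJohnson Thanks for the comments! Yes, I'm using Python 2.7. On shell-27.appspot.com I can't even import lxml ("No module named lxml"). I was able to decode the text with `my_xml_str.decode('cp1251')`, but that's not ideal: before passing the resulting unicode string to `etree.XML()` for parsing, I have to strip the encoding declaration out of the XML myself. Also, the XML document is 5 MB, so decoding it manually is slower than letting lxml decode it. I've edited my question to include the full traceback. – Maxim 2012-04-11 06:47:55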
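The manual workaround described in the last comment (decode with Python's codec, then remove the XML declaration so lxml will accept the unicode string) can be wrapped in a small helper. This is a sketch under stated assumptions: `decode_xml` is a hypothetical name, the regex only removes a declaration at the very start of the document, and it does nothing about the speed concern for a 5 MB file. It uses only the standard library and avoids Python-3-only syntax.

```python
import re

# Matches an XML declaration such as <?xml version="1.0" encoding="windows-1251"?>
# only at the very start of the text (optionally preceded by whitespace).
_XML_DECL = re.compile(r'\A\s*<\?xml\b[^>]*\?>')


def decode_xml(raw, encoding):
    """Hypothetical helper: decode XML bytes in Python and drop the leading declaration.

    Python's codec registry handles names like cp1251 itself, so this does not
    depend on which encodings the libxml2 build behind lxml supports.
    """
    text = raw.decode(encoding)
    # Remove only the first, leading declaration; the rest of the document is untouched.
    return _XML_DECL.sub(u'', text, count=1)
```

With lxml available, the result would then be parsed with `etree.fromstring(decode_xml(raw, 'cp1251'))`. The declaration has to go because lxml rejects unicode input that still declares an encoding.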