UnicodeDecodeError: 'utf8' codec can't decode byte

I am parsing an XML file encoded as "iso-8859-15" and get UnicodeDecodeError: 'utf8' codec can't decode byte

Words like 'Zürich' and 'Aktienrückk' get converted to "&#228;" etc.

I tried these suggestions:

>>> p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
>>> p.text
u'found "\u62c9\u67cf \u591a\u516c \u56ed"'
>>> print p.text

but I get the error UnicodeDecodeError: 'ascii' codec can't decode byte

Even this does not help:

content = unicode(mystring.strip(codecs.BOM_UTF8), 'utf-8')

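I have tried many suggestions from Stack Overflow, but I can't figure out a way.

I need to write the parsed content back in the same character set, e.g. 'U'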

Source

2013-08-27 rocx

Can you provide an XML sample before any Python code? – badc0re

For an HTML file, try this:

from xml.etree import ElementTree 
p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8')) 
print p.text.encode('utf8') 

found "拉柏 多公 园"

For your example:

# -*- coding: utf-8 -*- 
from xml.etree import ElementTree 
text = 'Aktienrückk'.decode('utf8') 
print text.encode('utf8') 

Aktienrückk

Don't forget to put # -*- coding: utf-8 -*- at the top of the file.
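
For the round trip the question asks about (parse ISO-8859-15, write it back in the same charset), a minimal sketch could look like the following. It assumes Python 3, whereas the snippets above are Python 2, and the sample document and variable names are made up for illustration. The idea is to hand the parser the raw bytes so it can honour the XML declaration, and to name the charset explicitly when serializing.

```python
# Python 3 sketch; the sample document and all variable names are hypothetical.
from xml.etree import ElementTree

# An ISO-8859-15 document as raw bytes, as it would come from open(path, 'rb').
# The byte 0xFC is 'ü' in ISO-8859-15.
raw = (b'<?xml version="1.0" encoding="iso-8859-15"?>'
       b'<city>Z\xfcrich</city>')

# Decoding these bytes as UTF-8 by hand fails: 0xFC is not valid UTF-8.
try:
    raw.decode('utf-8')
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Passing the bytes to the parser lets it use the declared encoding.
root = ElementTree.fromstring(raw)
city = root.text

# With the default output encoding (us-ascii), non-ASCII characters are
# written as numeric character references.
as_ascii = ElementTree.tostring(root)

# Naming the original charset writes the characters back as ISO-8859-15 bytes.
as_latin9 = ElementTree.tostring(root, encoding='iso-8859-15')

print(utf8_failed)
print(city)
print(as_ascii)
print(as_latin9)
```

The character references mentioned in the question are likely what the default us-ascii serialization produces; passing the encoding explicitly should avoid them.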

Source

2013-08-27 14:40:47 badc0re
