UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2

I am creating an XML file in Python, and one field in my XML holds the contents of a text file. I do it like this:

import xml.etree.ElementTree as ET

f = open('myText.txt', "r")
data = f.read()
f.close()

root = ET.Element("add") 
doc = ET.SubElement(root, "doc") 

field = ET.SubElement(doc, "field") 
field.set("name", "text") 
field.text = data 

tree = ET.ElementTree(root) 
tree.write("output.xml")

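Then I get the UnicodeDecodeError. I already tried putting the special comment # -*- coding: utf-8 -*- at the top of my script, but I still get the error. I also tried encoding my variable with data.encode('utf-8'), but the error persists. I know this question is very common, but none of the solutions I found in other questions worked for me.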

UPDATE

Traceback: using only the special comment on the first line of the script

Traceback (most recent call last): 
    File "D:\Python\lse\createxml.py", line 151, in <module> 
    tree.write("D:\\python\\lse\\xmls\\" + items[ctr][0] + ".xml") 
    File "C:\Python27\lib\xml\etree\ElementTree.py", line 820, in write 
    serialize(write, self._root, encoding, qnames, namespaces) 
    File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml 
    _serialize_xml(write, e, encoding, qnames, None) 
    File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml 
    _serialize_xml(write, e, encoding, qnames, None) 
    File "C:\Python27\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml 
    write(_escape_cdata(text, encoding)) 
    File "C:\Python27\lib\xml\etree\ElementTree.py", line 1073, in _escape_cdata 
    return text.encode(encoding, "xmlcharrefreplace") 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 243: ordinal not in range(128)

Traceback: using .encode('utf-8')

Traceback (most recent call last): 
    File "D:\Python\lse\createxml.py", line 148, in <module> 
    field.text = data.encode('utf-8') 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 227: ordinal not in range(128)

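When I use .decode('utf-8'), the error message no longer appears and my XML file is created successfully. The problem is that the XML is not viewable in my browser.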

2013-05-12 kagat-kagat

It would be useful to see the entire error message to find out where it originates. Also, try using 'decode' instead of 'encode'. – 2013-05-12 14:49:30

Updated. When I use 'decode', it creates my XML successfully, but the file is not viewable in my browser. – 2013-05-12 15:00:36

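Note that '# -*- coding: utf-8 -*-' only lets you put non-ASCII characters in the Python source code itself. It does not affect how strings are encoded or decoded in any way. Also, if the file 'myText.txt' is not ASCII, you should use 'codecs.open' and give it the correct encoding: codecs.open('myText.txt', 'r', 'utf-8'). – Bakuriu 2013-05-12 15:17:55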

You need to decode the data from the input string into unicode before using it, to avoid encoding problems.

field.text = data.decode("utf8")
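
To make the decode step concrete, here is a minimal sketch that does not need the text file. The byte string raw is a made-up stand-in for what f.read() returns, and it assumes the input really is UTF-8. It is written so that it also runs on Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the UTF-8 bytes for "data" followed by a euro sign,
# standing in for the raw contents of the text file.
raw = b"data\xe2\x82\xac"

# Decode once, right after reading, so ElementTree only sees Unicode text.
text = raw.decode("utf-8")

root = ET.Element("add")
doc = ET.SubElement(root, "doc")
field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = text

# Serialising with an explicit encoding turns the text back into UTF-8 bytes.
xml_bytes = ET.tostring(root, encoding="utf-8")
```

If the file is in some other encoding, the codec name passed to decode has to match it.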

2013-05-12 15:33:17 uhbif19

I ran into a similar error with pywikipediabot. The .decode method is a step in the right direction, but for me it did not work without adding 'ignore':

fix_encoding = lambda s: s.decode('utf8', 'ignore')

2013-12-25 03:32:48 guaka

Note that ignoring encoding errors can lead to data loss or produce incorrect output. – tripleee 2015-02-01 06:55:15
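
A small sketch of what that data loss looks like. The input is invented for illustration: it assumes the text was saved as Latin-1 rather than UTF-8, which is one common way to end up with bytes that the UTF-8 decoder rejects.

```python
# Hypothetical input: "cafe" with an accented final letter, saved as Latin-1,
# so the last byte (0xe9) is not valid UTF-8 on its own.
raw = b"caf\xe9"

ignored = raw.decode("utf-8", "ignore")    # the invalid byte silently disappears
replaced = raw.decode("utf-8", "replace")  # the invalid byte becomes U+FFFD
correct = raw.decode("latin-1")            # the right codec loses nothing

try:
    raw.decode("utf-8")                    # strict mode (the default) reports it
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

With 'ignore' the accented letter is gone and nothing tells you so. The strict default at least shows that the assumed encoding is wrong.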

Python 2

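The error occurs because ElementTree did not expect to find non-ASCII byte strings in the XML when trying to write it out. You should use Unicode strings for non-ASCII text instead. Unicode strings can be created either with the u prefix on string literals, i.e. u'€', or by decoding a byte string with the appropriate encoding, e.g. mystr.decode('utf-8').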
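
The message in the tracebacks can be reproduced in isolation. In Python 2, calling .encode() on a byte string first decodes it with the ASCII codec behind the scenes. The sketch below spells that decode out explicitly so that it behaves the same on Python 3. The sample bytes are an assumption, chosen only because they start with 0xc2.

```python
# Hypothetical sample: the UTF-8 bytes for a no-break space (U+00A0).
raw = b"\xc2\xa0"

try:
    # Python 2 performs this decode implicitly when a byte string is
    # encoded or mixed with Unicode text; here it is written out by hand.
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode("utf-8")
```

That is also why data.encode('utf-8') on the raw file contents fails with a decode error rather than an encode error.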

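The best practice is to decode all text data as it is read, rather than decoding it midway through the program. The io module provides an open() function that decodes text data into Unicode strings as it is read.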

ElementTree will be much happier with Unicode strings and will encode them properly when you call the ElementTree.write() method.

Also, for best compatibility and readability, make sure ElementTree encodes to UTF-8 during write() and adds the XML declaration header.

Presuming your input file is UTF-8 encoded (0xC2 is a common UTF-8 lead byte), putting it all together and using the with statement, your code should look like this:

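import io
import xml.etree.ElementTree as ET

# io.open decodes the file into Unicode text as it is read.
with io.open('myText.txt', "r", encoding='utf-8') as f:
    data = f.read()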

root = ET.Element("add") 
doc = ET.SubElement(root, "doc") 

field = ET.SubElement(doc, "field") 
field.set("name", "text") 
field.text = data 

tree = ET.ElementTree(root) 
tree.write("output.xml", encoding='utf-8', xml_declaration=True)

Output:

<?xml version='1.0' encoding='utf-8'?> 
<add><doc><field name="text">data€</field></doc></add>
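
One way to check the result is to parse the written file back and compare it with the input. This is only a sketch: it creates its own sample file in a temporary directory instead of using the real myText.txt, and the sample text is made up.

```python
import io
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET

# Work in a throwaway directory; both paths are placeholders for this sketch.
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, "myText.txt")
xml_path = os.path.join(workdir, "output.xml")

# Hypothetical file contents: "data" followed by a euro sign.
original = u"data\u20ac"
with io.open(txt_path, "w", encoding="utf-8") as f:
    f.write(original)

# Read the file back as Unicode text, as in the answer above.
with io.open(txt_path, "r", encoding="utf-8") as f:
    data = f.read()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")
field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data
ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)

# Parse the written XML and pull the field text out again.
parsed_root = ET.parse(xml_path).getroot()
round_tripped = parsed_root.find("doc/field").text

shutil.rmtree(workdir)
```

If round_tripped equals the original text, the non-ASCII characters went through reading, serialising and parsing without being damaged.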

2016-05-07 12:03:29

#!/usr/bin/python

# encoding=utf8

Try putting this at the top of your Python file.

2016-11-21 09:24:37
