bytestr.decode('UTF-8') raises UnicodeDecodeError on a byte string read from a file

I read a byte string from a file, and bytestr.decode('UTF-8') raises UnicodeDecodeError:

>>> s = b'------WebKitFormBoundary02jEyE1fNXSRCL7D\r\nContent-Disposition: form-data; name="fileobj"; filename="3d15ef5126d4fa6631a863c29c5a741d.jpg"\r\nContent-Type: image/jpeg\r\n\r\n\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xe1\x006Exif\x00\x00II*' 
>>> s 
b'------WebKitFormBoundary02jEyE1fNXSRCL7D\r\nContent-Disposition: form-data; name="fileobj"; filename="3d15ef5126d4fa6631a863c29c5a741d.jpg"\r\nContent-Type: image/jpeg\r\n\r\n\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xe1\x006Exif\x00\x00II*' 
>>> print(s.decode('utf8')) 
Traceback (most recent call last): 
    File "<input>", line 1, in <module> 
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 167: invalid start byte

Why do I get a UnicodeDecodeError? Shouldn't s.decode('utf8') return a str object?

Source

2015-11-04 user1356067

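Your byte string contains, among other things, a binary image. 'utf-8' is a character encoding: it is used to encode text, not binary data such as images.

In general, to parse MIME data you can use the email package from the stdlib.

In your case, it is enough to find where the headers end (the empty line) and save the rest as the image: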

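import cgi  # note: deprecated since Python 3.11 and removed in 3.13 (PEP 594)

# The headers end at the first empty line; everything after it is the image.
headers, _, image = s.partition(b'\r\n\r\n')
# Parse each header line and collect its filename parameter, if it has one.
L = [cgi.parse_header(h)[1].get('filename')
     for h in headers.decode('ascii', 'strict').splitlines()]
filename = next(filter(None, L))  # first non-empty filename
with open(filename, 'wb') as file:
    file.write(image)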
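
As an alternative sketch, the same split can be done with the stdlib email package instead of cgi, which is deprecated since Python 3.11 and removed in 3.13. The helper name split_part and the shortened sample bytes are made up for illustration. It assumes a single part with CRLF line endings and does not strip a closing boundary from the payload.

```python
from email import message_from_bytes


def split_part(part: bytes):
    """Return (filename, content_type, payload) for one multipart/form-data part.

    Hypothetical helper: assumes the part starts with a boundary line and
    that the headers are separated from the payload by an empty line.
    """
    header_block, _, payload = part.partition(b'\r\n\r\n')
    # Drop the leading boundary line; the email parser only understands headers.
    _, _, header_lines = header_block.partition(b'\r\n')
    msg = message_from_bytes(header_lines)
    # get_filename() reads the filename parameter of Content-Disposition.
    return msg.get_filename(), msg.get_content_type(), payload


# Shortened, made-up sample in the same shape as the upload in the question.
sample = (b'------WebKitFormBoundary02jEyE1fNXSRCL7D\r\n'
          b'Content-Disposition: form-data; name="fileobj"; filename="photo.jpg"\r\n'
          b'Content-Type: image/jpeg\r\n\r\n'
          b'\xff\xd8\xff\xe0\x00\x10JFIF')

filename, content_type, payload = split_part(sample)
```

Only the header block is ever parsed as text here; the payload stays as bytes and can be written out with open(filename, 'wb') as before.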

Source

2015-11-04 20:50:18 jfs

OK, thanks! s.partition(b'\r\n\r\n') is exactly what I needed :) – user1356067

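Because it is not a valid UTF-8 string. A UTF-8 encoded character cannot start with the byte 0xff. You can use the errors argument to control the decoding process. See the docs.

And yes, bytes.decode and bytearray.decode return str objects.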
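
To illustrate the errors argument, here is a small sketch on a made-up byte string (not the upload from the question). It shows how a few of the standard error handlers treat a byte that is invalid in UTF-8.

```python
data = b'abc\xff'  # made-up sample: three ASCII bytes, then a byte invalid in UTF-8

# The default, errors='strict', raises on the first undecodable byte.
try:
    data.decode('utf-8')
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start  # index of the offending byte

replaced = data.decode('utf-8', errors='replace')          # bad byte becomes U+FFFD
ignored = data.decode('utf-8', errors='ignore')            # bad byte is dropped
escaped = data.decode('utf-8', errors='backslashreplace')  # bad byte shown as \xff

# 'surrogateescape' keeps the bad byte recoverable when encoding back to bytes.
roundtrip = (data.decode('utf-8', errors='surrogateescape')
                 .encode('utf-8', errors='surrogateescape'))
```

None of these handlers turns image data into meaningful text; they only change how undecodable bytes are reported, so binary payloads are still best kept as bytes.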

Source

2015-11-04 20:17:00
