'utf-8' codec can't decode byte 0x80

I'm trying to download the BVLC trained model, and I'm stuck on this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 110: invalid start byte

I think it's caused by the following function (complete code):

# Closure-d function for checking SHA1.
def model_checks_out(filename=model_filename, sha1=frontmatter['sha1']):
    with open(filename, 'r') as f:
        return hashlib.sha1(f.read()).hexdigest() == sha1

Any idea how to fix this?

2016-04-24 Ehab AlBadawy

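The error message is quite clear. Your file is simply not UTF-8, or it is corrupted. – usr2564301

This is what I get when I try to print 'f': <_io.TextIOWrapper name='models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel' mode='r' encoding='utf8'> –

Interesting. So what happens when you specify the file encoding explicitly? Something like open(filename, 'r', encoding='utf8')? –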

You are opening a file that is not UTF-8 encoded, while the default encoding for your system is set to UTF-8.
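As a minimal sketch of what goes wrong: in text mode, Python decodes the file's bytes with the locale's preferred encoding, and a 0x80 byte can only be a continuation byte in UTF-8, never the start of a character. The byte string below is made up for illustration and is not taken from the caffemodel file.

```python
import locale

# The encoding open() falls back to in text mode when none is given.
print(locale.getpreferredencoding(False))

# Made-up binary payload for illustration; 0x80 cannot start a UTF-8 sequence.
data = b"caffe\x80model"

error = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The exception carries the offending offset in its start attribute, which corresponds to the "position" reported in the traceback.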

Since you are calculating a SHA1 hash, you should read the data as binary instead. The hashlib functions require you to pass in bytes:

with open(filename, 'rb') as f: 
    return hashlib.sha1(f.read()).hexdigest() == sha1

Note the addition of b in the file mode.
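For a version that can be run on its own, here is a small sketch of the same check. The function name sha1_matches and the temporary file are assumptions made for the demo; they stand in for model_checks_out and the downloaded model file.

```python
import hashlib
import os
import tempfile


def sha1_matches(filename, expected_sha1):
    """Return True if the file's SHA1 hex digest equals expected_sha1."""
    # 'rb' yields bytes, so no text decoding is attempted.
    with open(filename, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest() == expected_sha1


# Demo on a throwaway file holding bytes that are not valid UTF-8.
payload = b"\x80\x81\xfe\xff binary payload"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

try:
    ok = sha1_matches(tmp.name, hashlib.sha1(payload).hexdigest())
    bad = sha1_matches(tmp.name, "0" * 40)
finally:
    os.remove(tmp.name)

print(ok, bad)
```

The file's content never passes through a codec, so the check works for any file, text or not.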

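See the open() documentation:

mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r', which means open for reading in text mode. In text mode, if encoding is not specified, the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes, use binary mode and leave encoding unspecified.)

and from the hashlib module documentation:

You can now feed this object with bytes-like objects (normally bytes) using the update() method.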
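Because update() can be called repeatedly, a large model file does not have to be read into memory in one go. The following is a sketch under assumptions: the helper name sha1_of_stream and the 64 KiB chunk size are illustrative choices, and an in-memory stream stands in for the real file.

```python
import hashlib
import io


def sha1_of_stream(stream, chunk_size=65536):
    """Hash a binary stream chunk by chunk with update()."""
    digest = hashlib.sha1()
    # iter() with a sentinel stops when read() returns b"" at end of stream.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Illustrative data: bytes that would fail UTF-8 decoding in text mode.
data = b"\x80" * 200000
chunked = sha1_of_stream(io.BytesIO(data))
print(chunked == hashlib.sha1(data).hexdigest())
```

With a real file, the stream would come from open(filename, 'rb'); the digest is the same as hashing the whole content in a single call.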

2016-04-24 17:02:08

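You didn't specify binary mode when opening the file, so f.read() tries to read it as a UTF-8-encoded text file, which doesn't work here. But since we take the hash of bytes, not of strings, it doesn't matter what the encoding is, or even whether the file is text at all: just open it and read it as a binary file.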

>>> with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest()) 
Traceback (most recent call last): 
    File "<ipython-input-3-fdba09d5390b>", line 1, in <module> 
    with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest()) 
    File "/home/dsm/sys/pys/Python-3.5.1-bin/lib/python3.5/codecs.py", line 321, in decode 
    (result, consumed) = self._buffer_decode(data, self.errors, final) 
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 10: invalid start byte

But

>>> with open("test.h5.bz2","rb") as f: print(hashlib.sha1(f.read()).hexdigest()) 
21bd89480061c80f347e34594e71c6943ca11325
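A related point, shown as a small sketch: in Python 3, hashlib rejects str input outright, so even a file that happens to decode cleanly would still fail the hash if it were read in text mode. The sample string is made up for the demo.

```python
import hashlib

text = "plain ASCII text"

try:
    hashlib.sha1(text)  # str, not bytes
    raised = False
except TypeError:
    raised = True

# Encoding the string first (or reading the file with 'rb') gives bytes.
digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
print(raised, len(digest))
```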

2016-04-24 17:01:24 DSM

Thanks DSM, 'b' solved it. –

After so many attempts, it was just the 'b'. – Deepank

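There isn't a single hint about this in the documentation or the source code, so I don't know why, but using the b character (for binary, I guess) works completely (TF version: 1.1.0):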

image_data = tf.gfile.FastGFile(filename, 'rb').read()

For more information, check out: gfile

2017-05-13 10:14:31 4F2E4A2E
