2014-09-29 148 views
2

IOError when using datasets.fetch_mldata() in sklearn

I import fetch_mldata with "from sklearn.datasets import fetch_mldata" and call:

dataset = fetch_mldata('MNIST original') 

But I get the following:

> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
>     execfile(filename, namespace)
>   File "C:/Users/Jacob/Documents/Dropbox/Technion/Semester 8/Machine learning/Demo3/Demo3.py", line 75, in <module>
>     dataset = fetch_mldata('MNIST original')
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\sklearn\datasets\mldata.py", line 158, in fetch_mldata
>     matlab_dict = io.loadmat(matlab_file, struct_as_record=True)
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 126, in loadmat
>     matfile_dict = MR.get_variables(variable_names)
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\scipy\io\matlab\mio5.py", line 288, in get_variables
>     res = self.read_var_array(hdr, process)
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\scipy\io\matlab\mio5.py", line 248, in read_var_array
>     return self._matrix_reader.array_from_header(header, process)
>   File "mio5_utils.pyx", line 616, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy\io\matlab\mio5_utils.c:5903)
>   File "mio5_utils.pyx", line 645, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy\io\matlab\mio5_utils.c:5332)
>   File "mio5_utils.pyx", line 713, in scipy.io.matlab.mio5_utils.VarReader5.read_real_complex (scipy\io\matlab\mio5_utils.c:6323)
>   File "mio5_utils.pyx", line 417, in scipy.io.matlab.mio5_utils.VarReader5.read_numeric (scipy\io\matlab\mio5_utils.c:3873)
>   File "mio5_utils.pyx", line 353, in scipy.io.matlab.mio5_utils.VarReader5.read_element (scipy\io\matlab\mio5_utils.c:3595)
>   File "streams.pyx", line 324, in scipy.io.matlab.streams.FileStream.read_string (scipy\io\matlab\streams.c:4343)
> IOError: could not read bytes

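I tried installing a newer version of sklearn, but it did not help. I found another thread about this problem, but the solution offered there did not work for me: How to use datasets.fetch_mldata() in sklearn?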

Any ideas?

Answers

3

For your reference and that of others: I got almost the same error (on Ubuntu), including the "IOError: could not read bytes" message.

I just posted a solution at

How to use datasets.fetch_mldata() in sklearn?

The short answer is to use the following:

from sklearn.datasets.mldata import fetch_mldata
data = fetch_mldata('mnist-original')

dataset = fetch_mldata('mnist-original', data_home='***') 

Replace *** (keeping the quotes) with your preferred location (a data directory).
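As a minimal sketch of the data_home approach, the snippet below creates a brand-new empty directory and passes it to fetch_mldata, so no previously cached file can be reused. The helper names make_fresh_data_home and fetch_mnist_fresh are hypothetical, and the import inside the function assumes an older scikit-learn release that still ships fetch_mldata, as in this thread.

```python
import os
import tempfile


def make_fresh_data_home(base_dir=None):
    """Create and return a new, empty directory to use as data_home.

    Hypothetical helper: a fresh directory cannot contain a stale or
    partially downloaded copy of the dataset.
    """
    return os.path.abspath(tempfile.mkdtemp(prefix="sklearn_data_", dir=base_dir))


def fetch_mnist_fresh(data_home):
    """Download MNIST into data_home (requires network access)."""
    # Imported lazily: this assumes a scikit-learn version that still
    # provides fetch_mldata, like the one used in this thread.
    from sklearn.datasets.mldata import fetch_mldata
    return fetch_mldata('mnist-original', data_home=data_home)
```

Calling fetch_mnist_fresh(make_fresh_data_home()) should then trigger a full download rather than reading an existing file.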

-1

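In my case, the root cause was a corrupted mnist-original.mat file. The file was corrupted because I terminated Python before the download had finished. This left a partially downloaded mnist-original.mat in C:\user\Taimi\scikit_learn_data\mldata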

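The solution above worked for me because it simply fetches a fresh copy into a new location. A more direct fix is to find the corrupted mnist-original.mat file, delete it, and run the code again. Running the code downloads mnist-original.mat again. The complete mnist-original.mat is 54,142 KB, so on a slow connection fetch_mldata() takes a few minutes to finish.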
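The delete-and-retry fix can be sketched as a small helper. This is only an illustration under a few assumptions: the cache lives in a scikit_learn_data directory under the user's home folder unless another data_home is given, the file sits in its mldata subfolder as in the path quoted above, and the 54,142 KB figure from this answer is used as a rough size check. All function names and the tolerance value are hypothetical.

```python
import os

# Size of a complete download as reported in this answer (in KB).
EXPECTED_KB = 54142


def mldata_path(data_home=None, name='mnist-original'):
    """Return the path where the cached .mat file is expected to live."""
    if data_home is None:
        # Assumed default cache location; adjust if yours differs.
        data_home = os.path.join(os.path.expanduser('~'), 'scikit_learn_data')
    return os.path.join(data_home, 'mldata', name + '.mat')


def looks_truncated(path, expected_kb=EXPECTED_KB, tolerance_kb=100):
    """Heuristic: True if the file is clearly smaller than a full download."""
    size_kb = os.path.getsize(path) / 1024.0
    return size_kb < expected_kb - tolerance_kb


def remove_if_truncated(data_home=None):
    """Delete the cached file if it looks incomplete; return True if deleted."""
    path = mldata_path(data_home)
    if os.path.exists(path) and looks_truncated(path):
        os.remove(path)
        return True
    return False
```

After remove_if_truncated() returns True, running fetch_mldata('mnist-original') again should download the file from scratch.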