我试图运行中的链接如何在MATLAB中加载MNIST数字和标签数据?
https://github.com/bd622/DiscretHashing
离散哈希给出的代码是降维的方法是在近似最近邻搜索中使用。我想加载在http://yann.lecun.com/exdb/mnist/中可用的MNIST数据库上的实现。我已经从压缩的gz格式中提取文件。
问题1:
利用该解决方案来读取MNIST数据库Reading MNIST Image Database binary file in MATLAB
提供我收到以下错误:
Error using fread
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in Reading (line 7)
A = fread(fid, 1, 'uint32');
下面是代码:
clear all;
close all;
%//Open file
fid = fopen('t10k-images-idx3-ubyte', 'r');
A = fread(fid, 1, 'uint32');
magicNumber = swapbytes(uint32(A));
%//For each image, store into an individual cell
imageCellArray = cell(1, totalImages);
for k = 1 : totalImages
%//Read in numRows*numCols pixels at a time
A = fread(fid, numRows*numCols, 'uint8');
%//Reshape so that it becomes a matrix
%//We are actually reading this in column major format
%//so we need to transpose this at the end
imageCellArray{k} = reshape(uint8(A), numCols, numRows)';
end
%//Close the file
fclose(fid);
UPDATE:问题1解决,并且修改后的代码是
clear all;
close all;
%//Open file
fid = fopen('t10k-images.idx3-ubyte', 'r');
A = fread(fid, 1, 'uint32');
magicNumber = swapbytes(uint32(A));
%//Read in total number of images
%//A = fread(fid, 4, 'uint8');
%//totalImages = sum(bitshift(A', [24 16 8 0]));
%//OR
A = fread(fid, 1, 'uint32');
totalImages = swapbytes(uint32(A));
%//Read in number of rows
%//A = fread(fid, 4, 'uint8');
%//numRows = sum(bitshift(A', [24 16 8 0]));
%//OR
A = fread(fid, 1, 'uint32');
numRows = swapbytes(uint32(A));
%//Read in number of columns
%//A = fread(fid, 4, 'uint8');
%//numCols = sum(bitshift(A', [24 16 8 0]));
%// OR
A = fread(fid, 1, 'uint32');
numCols = swapbytes(uint32(A));
for k = 1 : totalImages
%//Read in numRows*numCols pixels at a time
A = fread(fid, numRows*numCols, 'uint8');
%//Reshape so that it becomes a matrix
%//We are actually reading this in column major format
%//so we need to transpose this at the end
imageCellArray{k} = reshape(uint8(A), numCols, numRows)';
end
%//Close the file
fclose(fid);
问题2:
我无法理解如何将4个文件的MNIST应用中的代码。代码包含变量
traindata = double(traindata);
testdata = double(testdata);
如何准备MNIST数据库以便我可以应用于实施?
UPDATE:我实现了解决方案,但我不断收到此错误
Error using fread
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in mnist_parse (line 11)
A = fread(fid1, 1, 'uint32');
这些文件
demo.m
%,这是调用该函数在MNIST数据读取主文件
clear all
clc
[Trainimages, Trainlabels] = mnist_parse('C:\Users\Desktop\MNIST\train-images-idx3-ubyte', 'C:\Users\Desktop\MNIST\train-labels-idx1-ubyte');
[Testimages, Testlabels] = mnist_parse('t10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte');
k=5;
digit = images(:,:,k);
lbl = label(k);
function [images, labels] = mnist_parse(path_to_digits, path_to_labels)
% Open files
fid1 = fopen(path_to_digits, 'r');
% The labels file
fid2 = fopen(path_to_labels, 'r');
% Read in magic numbers for both files
A = fread(fid1, 1, 'uint32');
magicNumber1 = swapbytes(uint32(A)); % Should be 2051
fprintf('Magic Number - Images: %d\n', magicNumber1);
A = fread(fid2, 1, 'uint32');
magicNumber2 = swapbytes(uint32(A)); % Should be 2049
fprintf('Magic Number - Labels: %d\n', magicNumber2);
% Read in total number of images
% Ensure that this number matches with the labels file
A = fread(fid1, 1, 'uint32');
totalImages = swapbytes(uint32(A));
A = fread(fid2, 1, 'uint32');
if totalImages ~= swapbytes(uint32(A))
error('Total number of images read from images and labels files are not the same');
end
fprintf('Total number of images: %d\n', totalImages);
% Read in number of rows
A = fread(fid1, 1, 'uint32');
numRows = swapbytes(uint32(A));
% Read in number of columns
A = fread(fid1, 1, 'uint32');
numCols = swapbytes(uint32(A));
fprintf('Dimensions of each digit: %d x %d\n', numRows, numCols);
% For each image, store into an individual slice
images = zeros(numRows, numCols, totalImages, 'uint8');
for k = 1 : totalImages
% Read in numRows*numCols pixels at a time
A = fread(fid1, numRows*numCols, 'uint8');
% Reshape so that it becomes a matrix
% We are actually reading this in column major format
% so we need to transpose this at the end
images(:,:,k) = reshape(uint8(A), numCols, numRows).';
end
% Read in the labels
labels = fread(fid2, totalImages, 'uint8');
% Close the files
fclose(fid1);
fclose(fid2);
end
错误很明显,您对无效的文件名使用了'fopen'。确保't10k-images-idx3-ubyte'是文件的* full *名称,它位于你当前的MATLAB路径中。否则,请确保它是要打开的文件的* full *绝对路径。 – excaza
@excaza:解决了第一个问题和文件读取操作引起的错误。文件名确实存在问题。但现在我不知道如何使用数据库,我无法理解如何使用这4个文件。我相信traindata变量将包含文件train-images.idx3-ubyte。然后,哪一个是testdata,然后我应该如何使用2个标签数据库文件?请帮忙 – SKM
@rayryeng:你能告诉我为什么当我实现你的答案时,由于文件读取操作而出现错误吗?我已经在问题中提出了新的更新。感谢您的时间和精力。 – SKM