2017-05-24 37 views

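Comparing multiple histograms from a file with OpenCV

I have an image dataset and I compute a histogram for each image. I then want to store (write) those histograms to a file so that, for each new input image, I can compare its histogram against the ones already in the file and find out whether they match. My code so far is: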

import numpy as np 
import cv2 
import os.path 
import glob 
import matplotlib.pyplot as plt 
import pickle 

index = {} 

#output dic 
out = { 
    1: {}, 
    2: {}, 
    3: {}, 
} 

for t in [1]: 

    #load_files 
    files = glob.glob(os.path.join("..", "data", "train", "Type_{}".format(t), "*.jpg")) 
    no_files = len(files) 

    #iterate and read 
    for n, file in enumerate(files):
        try:
            image = cv2.imread(file)
            img = cv2.resize(image, None, fx=0.1, fy=0.1, interpolation=cv2.INTER_AREA)

            # features : histograms
            plt.hist(img.flatten(), 256, [0, 256], color='r')
            plt.xlim([0,256])
            plt.legend('histogram', loc='upper left')
            plt.show()
            # index[file] = hist

            # write histograms into file
            #compare them and find similarity score
            # result_dist = compareHist(index[0], index[1], cv2.cv.CV_COMP_CORREL)

            print(file, t, "-files left", no_files - n)

        except Exception as e:
            print(e)
            print(file)

Can someone guide me through this? Thanks!

Answers

You can compute the red-channel histogram of every image like this:

import os 
import glob 
import numpy as np 
from skimage import io 

root = r'C:\Users\you\imgs' # Change this appropriately (raw string, so '\U' is not parsed as an escape)
folders = ['Type_1', 'Type_2', 'Type_3'] 
extension = '*.bmp' # Change if necessary 

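def compute_red_histograms(root, folders, extension):
    X = []  # one 256-bin histogram per image
    y = []  # numeric class label per image (index of its folder)
    for n, imtype in enumerate(folders):
        filenames = glob.glob(os.path.join(root, imtype, extension))
        for fn in filenames:
            img = io.imread(fn)
            red = img[:, :, 0]
            # density=True replaces the `normed` argument, which recent NumPy versions no longer accept
            h, _ = np.histogram(red, bins=np.arange(257), density=True)
            X.append(h)
            y.append(n)
    return np.vstack(X), np.array(y)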

X, y = compute_red_histograms(root, folders, extension) 

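Each image is represented by a 256-dimensional feature vector (the bins of its red-channel histogram), so X is a 2D NumPy array with as many rows as there are images in the dataset and 256 columns. y is a 1D NumPy array of numeric class labels: 0 for Type_1, 1 for Type_2, and 2 for Type_3.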
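
Since the question also asks about storing the histograms and comparing a new image against them, here is a minimal sketch of one way that could look, using only NumPy. Everything in it beyond `np.histogram` is an assumption made for illustration: the images are random synthetic arrays rather than real files, the file name `histograms.npz` and the helper `red_histogram` are hypothetical, and the similarity score is a plain Pearson correlation, which is in the same spirit as the correlation method mentioned in the question but is not a call into OpenCV.

```python
import os
import tempfile

import numpy as np


def red_histogram(img):
    """Return the normalized 256-bin histogram of the red channel."""
    h, _ = np.histogram(img[:, :, 0], bins=np.arange(257), density=True)
    return h


# Synthetic stand-ins for real images (assumption: 8-bit RGB arrays).
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8) for _ in range(4)]

X = np.vstack([red_histogram(im) for im in images])  # shape (4, 256)
y = np.array([0, 0, 1, 2])                           # hypothetical class labels

# Store histograms and labels in one file, then load them back.
path = os.path.join(tempfile.mkdtemp(), "histograms.npz")
np.savez(path, X=X, y=y)
with np.load(path) as data:
    X_stored = data["X"]
    y_stored = data["y"]

# Compare a "new" image (here a copy of image 2) with every stored histogram.
query = red_histogram(images[2])
scores = np.array([np.corrcoef(query, row)[0, 1] for row in X_stored])
best = int(np.argmax(scores))  # index of the most similar stored histogram
print(best, y_stored[best])
```

A score close to 1 means the two histograms have nearly the same shape. Two different images can still share a histogram, so a high score suggests similarity rather than proving the images are identical.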

Next, you can split your dataset into training and test sets, like this:

from sklearn.model_selection import train_test_split 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0) 

Finally, you can train an SVM classifier:

from sklearn.svm import SVC 

clf = SVC() 
clf.fit(X_train, y_train) 

Once that is done, making predictions or evaluating classification accuracy is straightforward:

In [197]: y_test 
Out[197]: array([0, 2, 0, ..., 0, 0, 1]) 

In [198]: clf.predict(X_test) 
Out[198]: array([2, 2, 2, ..., 2, 2, 2]) 

In [199]: y_test == clf.predict(X_test) 
Out[199]: array([False, True, False, ..., False, False, False], dtype=bool) 

In [200]: clf.score(X_test, y_test) 
Out[200]: 0.3125 
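
For a classifier, `clf.score` reports the mean accuracy, that is, the fraction of test samples whose predicted label equals the true label. The sketch below reproduces that calculation by hand with NumPy. The label arrays are made up for illustration and are not the values from the session above. It also counts the predictions per class, which is a quick way to spot a classifier that predicts almost the same class every time, as the output above appears to.

```python
import numpy as np

# Hypothetical true labels and predictions (not taken from the session above).
y_true = np.array([0, 2, 0, 1, 2, 1, 0, 0])
y_pred = np.array([2, 2, 2, 2, 2, 2, 2, 2])

# Mean accuracy: fraction of positions where prediction and truth agree.
accuracy = np.mean(y_true == y_pred)

# How often each class (0, 1, 2) occurs in the truth and in the predictions.
true_counts = np.bincount(y_true, minlength=3)
predicted_counts = np.bincount(y_pred, minlength=3)

print(accuracy)
print(true_counts, predicted_counts)
```

If the predicted counts are concentrated in a single class while the true counts are not, the accuracy mostly reflects how common that one class is in the test set.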

Thanks for your help. It gives me an error at 'filenames = glob.glob(os.path.join(root, obj.folder, extension))', saying that the name 'obj' is not defined – joasa


I have edited my answer to fix that error – Tonechas


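Now I am getting this warning: _UserWarning: Possibly corrupt EXIF data. Expecting to read 524288 bytes but only got 0. Skipping tag 3 "Skipping tag %s" % (size, len(data), tag))_ and this error message: **OSError: image file is truncated (54 bytes not processed)**. Do you know what this means? – joasa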