Comparing multiple histograms stored in a file (OpenCV)

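I have a dataset of images. I compute a histogram for each image and then want to store (write) the histograms in a file, so that for every new input image I can compare its histogram with the ones already in the file and find out whether they are the same. My code so far is: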

import numpy as np
import cv2
import os.path
import glob
import matplotlib.pyplot as plt
import pickle

index = {}

# output dict
out = {
    1: {},
    2: {},
    3: {},
}

for t in [1]:

    # load files
    files = glob.glob(os.path.join("..", "data", "train", "Type_{}".format(t), "*.jpg"))
    no_files = len(files)

    # iterate and read
    for n, file in enumerate(files):
        try:
            image = cv2.imread(file)
            img = cv2.resize(image, None, fx=0.1, fy=0.1, interpolation=cv2.INTER_AREA)

            # features: histograms
            plt.hist(img.flatten(), 256, [0, 256], color='r')
            plt.xlim([0, 256])
            plt.legend(['histogram'], loc='upper left')  # legend expects a list of labels
            plt.show()
            # index[file] = hist

            # write histograms into file
            # compare them and find similarity score
            # (cv2.HISTCMP_CORREL is the OpenCV 3+ name of cv2.cv.CV_COMP_CORREL)
            # result_dist = cv2.compareHist(index[0], index[1], cv2.HISTCMP_CORREL)

            print(file, t, "-files left", no_files - n - 1)

        except Exception as e:
            print(e)
            print(file)

Can someone guide me through this? Thanks!


2017-05-24 joasa
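The placeholders in the question's loop (`# index[file] = hist`, `# write histograms into file`, `# compare them ...`) could be filled in roughly as follows. This is a minimal sketch, assuming one 256-bin histogram of all pixel values per image, stored in a dict keyed by filename and pickled to a file (the name `histograms.pkl` is an assumption). To stay self-contained it uses NumPy only: the hypothetical helper `correlation()` implements the formula OpenCV documents for `cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)`, so you can swap in the OpenCV call. `grey_histogram`, `save_index`, `load_index` and `best_match` are likewise illustrative names.

```python
import os
import pickle
import tempfile

import numpy as np


def grey_histogram(img):
    """256-bin histogram of all pixel values (channels pooled), like plt.hist(img.flatten(), 256, [0, 256])."""
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    return hist.astype(np.float32)  # float32, like cv2.calcHist output


def correlation(h1, h2):
    """Pearson correlation of two histograms (same formula as OpenCV's HISTCMP_CORREL)."""
    a = np.asarray(h1, dtype=np.float64)
    b = np.asarray(h2, dtype=np.float64)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # Our choice: return 0.0 for a constant histogram instead of dividing by zero
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def save_index(index, path="histograms.pkl"):
    """Write the {filename: histogram} dict to disk."""
    with open(path, "wb") as f:
        pickle.dump(index, f)


def load_index(path="histograms.pkl"):
    """Read the {filename: histogram} dict back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def best_match(query_hist, index):
    """Return (filename, score) of the stored histogram most correlated with query_hist."""
    scores = {name: correlation(query_hist, h) for name, h in index.items()}
    name = max(scores, key=scores.get)  # raises ValueError if index is empty
    return name, scores[name]


# --- demo on synthetic data (random arrays stand in for real images) ---
rng = np.random.default_rng(0)
images = {f"img{i}.jpg": rng.integers(0, 256, size=(20, 20, 3), dtype=np.uint8) for i in range(3)}
index = {name: grey_histogram(img) for name, img in images.items()}
path = os.path.join(tempfile.mkdtemp(), "histograms.pkl")
save_index(index, path)
name, score = best_match(grey_histogram(images["img1.jpg"]), load_index(path))
print(name, round(score, 3))  # img1.jpg 1.0
```

In the question's loop this would mean `index[file] = grey_histogram(img)` per image and `save_index(index)` after the loop; a new image is then checked with `best_match(grey_histogram(new_img), load_index())`. A correlation of 1.0 means the histograms have the same shape, which does not by itself prove the images are identical.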

You can compute the red-channel histograms of all the images like this:

import os
import glob
import numpy as np
from skimage import io

root = r'C:\Users\you\imgs'  # Change this appropriately (raw string keeps the backslashes)
folders = ['Type_1', 'Type_2', 'Type_3']
extension = '*.bmp'  # Change if necessary

def compute_red_histograms(root, folders, extension):
    X = []
    y = []
    for n, imtype in enumerate(folders):
        filenames = glob.glob(os.path.join(root, imtype, extension))
        for fn in filenames:
            img = io.imread(fn)
            red = img[:, :, 0]  # skimage returns RGB, so channel 0 is red
            # 256 unit-width bins for values 0..255; density=True normalizes each histogram
            h, _ = np.histogram(red, bins=np.arange(257), density=True)
            X.append(h)
            y.append(n)  # class label = index of the folder
    return np.vstack(X), np.array(y)

X, y = compute_red_histograms(root, folders, extension)

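Each image is represented by a 256-dimensional feature vector (the components of its red-channel histogram), so X is a 2D NumPy array with as many rows as there are images in the dataset and 256 columns. y is a 1D NumPy array of numeric class labels, i.e. 0 for Type_1, 1 for Type_2 and 2 for Type_3.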
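To make those shapes concrete, here is a sketch of the same feature extraction applied to synthetic in-memory arrays instead of files (the three random "images" and their labels are made up for illustration). Because `density=True` is used with unit-width bins, each row of X sums to 1, so images of different sizes stay comparable.

```python
import numpy as np


def red_histogram(img):
    # img is an RGB array (as returned by skimage.io.imread), so channel 0 is red
    h, _ = np.histogram(img[:, :, 0], bins=np.arange(257), density=True)
    return h


rng = np.random.default_rng(0)
# Three hypothetical RGB images of different sizes, with labels 0, 1, 2
images = [rng.integers(0, 256, size=s + (3,), dtype=np.uint8) for s in [(10, 10), (20, 15), (8, 30)]]
labels = [0, 1, 2]

X = np.vstack([red_histogram(im) for im in images])
y = np.array(labels)
print(X.shape, y.shape)  # (3, 256) (3,)
print(X.sum(axis=1))     # each row sums to 1 (up to rounding)
```

If you load the images with `cv2.imread` instead (as in the question), keep in mind that OpenCV returns channels in BGR order, so the red channel is `img[:, :, 2]`.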

Next, you can split your dataset into a training set and a test set like this:

from sklearn.model_selection import train_test_split 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

Finally, you can train an SVM classifier:

from sklearn.svm import SVC 

clf = SVC() 
clf.fit(X_train, y_train)

Once the classifier is trained, you can easily make predictions or evaluate the classification accuracy:

In [197]: y_test 
Out[197]: array([0, 2, 0, ..., 0, 0, 1]) 

In [198]: clf.predict(X_test) 
Out[198]: array([2, 2, 2, ..., 2, 2, 2]) 

In [199]: y_test == clf.predict(X_test) 
Out[199]: array([False, True, False, ..., False, False, False], dtype=bool) 

In [200]: clf.score(X_test, y_test) 
Out[200]: 0.3125
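For a classifier, `clf.score` returns the mean accuracy, i.e. the fraction of test samples whose predicted label equals the true one. The sketch below reproduces that arithmetic with NumPy on made-up labels (not the ones from the session above). It also shows why a model that predicts a single class for everything, as in `Out[198]`, still gets a non-zero score: the score is then simply the share of that class in the test set, which `np.bincount` lets you check.

```python
import numpy as np

# Hypothetical true labels and a classifier that always predicts class 2
y_true = np.array([0, 2, 0, 1, 2, 0, 0, 1])
y_pred = np.full_like(y_true, 2)

accuracy = np.mean(y_true == y_pred)     # fraction of matching labels
share_of_class_2 = np.mean(y_true == 2)  # proportion of class 2 in y_true
print(accuracy, share_of_class_2)  # 0.25 0.25
print(np.bincount(y_true))         # [4 2 2]  (samples per class)
```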


2017-05-24 15:39:39 Tonechas

Thanks for your help. It gives me an error at 'filenames = glob.glob(os.path.join(root, obj.folder, extension))': name 'obj' is not defined – joasa

I edited my answer to fix that error – Tonechas

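Now I get these warnings: _UserWarning: Possibly corrupt EXIF data. Expecting to read 524288 bytes but only got 0. Skipping tag 3 "Skipping tag %s" % (size, len(data), tag))_ and this error message: **OSError: image file is truncated (54 bytes not processed)**. Do you know what this means? – joasa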
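The last question is not answered in the thread. Both messages appear to come from Pillow (PIL), which does the actual image decoding here, and `image file is truncated` usually means the file on disk is incomplete or corrupt. Pillow does provide a `PIL.ImageFile.LOAD_TRUNCATED_IMAGES = True` switch to decode such files partially, but a simpler option is to skip them. The sketch below assumes you want to skip unreadable files and keep a list of them; `load_histograms` and the `reader` parameter are hypothetical names (in practice `reader` could be `skimage.io.imread`), and a fake reader stands in for real files so the example runs without images.

```python
import numpy as np


def load_histograms(filenames, reader):
    """Red-channel histograms for every file `reader` can decode.

    reader: any function path -> RGB array (assumption, e.g. skimage.io.imread).
    Returns (histograms, skipped_filenames).
    """
    hists, skipped = [], []
    for fn in filenames:
        try:
            img = reader(fn)
        except OSError:  # e.g. "image file is truncated"; FileNotFoundError is also an OSError
            skipped.append(fn)
            continue
        h, _ = np.histogram(img[:, :, 0], bins=np.arange(257), density=True)
        hists.append(h)
    return hists, skipped


def fake_reader(fn):
    # Hypothetical stand-in: "bad.jpg" behaves like a truncated file
    if fn == "bad.jpg":
        raise OSError("image file is truncated (54 bytes not processed)")
    return np.zeros((4, 4, 3), dtype=np.uint8)


hists, skipped = load_histograms(["a.jpg", "bad.jpg", "b.jpg"], fake_reader)
print(len(hists), skipped)  # 2 ['bad.jpg']
```

Collecting the skipped names, rather than silently dropping them, makes it easy to inspect or re-download the broken files afterwards.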
