2

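Scikit-learn multiple targets

I'm following this example to create an image classifier with scikit-learn: http://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html Everything works when each image belongs to exactly one category, but an image may belong to several categories, for example: a photo of a dog in the daytime, a picture of a cat at night, a photo of a cat and a dog at night, and so on. I wrote: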

target=[[0,1],[0,2],[1,2],[0,2,3]] 
target = MultiLabelBinarizer().fit_transform(target) 

classifier = svm.SVC(gamma=0.001) 
classifier.fit(data, target) 

but I get this error:

Traceback (most recent call last): 
    File "test.py", line 49, in <module> 
    classifier.fit(data, target) 
    File "/home/mezzo/.local/lib/python2.7/site-packages/sklearn/svm/base.py", line 151, in fit 
    y = self._validate_targets(y) 
    File "/home/mezzo/.local/lib/python2.7/site-packages/sklearn/svm/base.py", line 514, in _validate_targets 
    y_ = column_or_1d(y, warn=True) 
    File "/home/mezzo/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 551, in column_or_1d 
    raise ValueError("bad input shape {0}".format(shape)) 
ValueError: bad input shape (4, 4) 

Full code:

import numpy as np 
import PIL 
from PIL import Image 
import matplotlib.image as mpimg 
from sklearn import datasets, svm, metrics 
from sklearn.preprocessing import MultiLabelBinarizer 

# The digits dataset 
digits = datasets.load_digits() 

def normalize(old_im): 
    base = 400 

    if old_im.size[0] > old_im.size[1]: 
        wpercent = (base/float(old_im.size[0])) 
        hsize = int((float(old_im.size[1])*float(wpercent))) 
        old_im = old_im.resize((base, hsize), PIL.Image.ANTIALIAS) 
    else: 
        wpercent = (base/float(old_im.size[1])) 
        wsize = int((float(old_im.size[0])*float(wpercent))) 
        old_im = old_im.resize((wsize, base), PIL.Image.ANTIALIAS) 

    old_size = old_im.size 

    new_size = (base, base) 
    new_im = Image.new("RGB", new_size) 
    # Integer division keeps the paste offsets as ints on Python 2 and 3 
    new_im.paste(old_im, ((new_size[0]-old_size[0])//2, 
                          (new_size[1]-old_size[1])//2)) 

    #new_im.show() 
    new_im.save('prov.jpg') 
    return mpimg.imread('prov.jpg') 

# To apply a classifier on this data, we need to flatten the image, to 
# turn the data in a (samples, feature) matrix: 
paths = ['/home/mezzo/Immagini/%d.jpg' % i for i in range(1, 5)] 
imgs = np.array([normalize(Image.open(p)) for p in paths]) 
n_samples = len(imgs) 
data = imgs.reshape((n_samples, -1)) 

target=[[0,1],[0,2],[1,2],[0,2,3]] 
target = MultiLabelBinarizer().fit_transform(target) 

# Create a classifier: a support vector classifier 
classifier = svm.SVC(gamma=0.001) 

# We learn the digits on the first half of the digits 
classifier.fit(data, target) 

# Now predict the value of the digit on the second half: 
predicted = classifier.predict(data) 

print("Classification report for classifier %s:\n%s\n" 
     % (classifier, metrics.classification_report(target, predicted))) 
print("Confusion matrix:\n%s" % metrics.confusion_matrix(target, predicted)) 

Answers

0

Scikit-learn's SVM implementation does not natively support multilabel classification, although it has various other classifiers that do.
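As one illustration of a classifier that accepts the binarized target directly, here is a minimal sketch using OneVsRestClassifier wrapped around SVC. This is a choice made for the example, not something the answer names; the random `data` array is a hypothetical stand-in for the flattened images.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

# Hypothetical stand-in for the flattened image data: 4 samples, 10 features.
rng = np.random.RandomState(0)
data = rng.rand(4, 10)

# Label sets from the question, turned into a (4, 4) indicator matrix.
target = MultiLabelBinarizer().fit_transform([[0, 1], [0, 2], [1, 2], [0, 2, 3]])

# One binary SVC is fitted per label column of the indicator matrix.
classifier = OneVsRestClassifier(SVC(gamma=0.001))
classifier.fit(data, target)

# Predictions come back as an indicator matrix with one column per label.
predicted = np.asarray(classifier.predict(data))
```

With this approach each label is predicted independently, so the model can output label combinations it never saw during training.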


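It is also possible to do multilabel classification with an SVM by treating each unique combination of labels as a separate class. You can simply replace each unique row in your target matrix with an integer label, which can be done efficiently using np.unique: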

d = np.dtype((np.void, target.dtype.itemsize * target.shape[1])) 
_, ulabels = np.unique(np.ascontiguousarray(target).view(d), return_inverse=True) 
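On NumPy 1.13 or later, the same row-wise deduplication can probably be written without the void-dtype view, using the `axis` argument of np.unique. A minimal sketch, assuming `target` is the binarized array from the question (not the raw list of lists):

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Binarized target from the question: one row per sample, one column per label.
target = MultiLabelBinarizer().fit_transform([[0, 1], [0, 2], [1, 2], [0, 2, 3]])

# Each distinct row becomes one class; ulabels holds the class index per sample.
uniq_rows, ulabels = np.unique(target, axis=0, return_inverse=True)

# Guard against NumPy versions that return the inverse as a 2-D array.
ulabels = ulabels.ravel()
```

Indexing `uniq_rows[ulabels]` reproduces the original indicator matrix, which is handy for checking the encoding.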

Then you can train the SVM as you would for a single-label classification problem:

clf = svm.SVC() 
clf.fit(data, ulabels) 

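One potential caveat is that if you don't have a large number of training examples, your classifier may perform poorly on rare combinations of labels.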
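The answer stops at fitting. A rough sketch of how predictions might be mapped back to label sets follows; the random `data`, the `mlb` variable, and the use of np.unique with `axis=0` for the encoding are assumptions made for this example.

```python
import numpy as np
from sklearn import svm
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical features standing in for the flattened images.
rng = np.random.RandomState(0)
data = rng.rand(4, 10)

mlb = MultiLabelBinarizer()
target = mlb.fit_transform([[0, 1], [0, 2], [1, 2], [0, 2, 3]])

# Encode each distinct label combination as a single class index.
uniq_rows, ulabels = np.unique(target, axis=0, return_inverse=True)
ulabels = ulabels.ravel()

clf = svm.SVC()
clf.fit(data, ulabels)

# Map predicted class indices back to indicator rows, then to label tuples.
pred_rows = uniq_rows[clf.predict(data)]
pred_labels = mlb.inverse_transform(pred_rows)
```

Because every class is a combination seen during training, this classifier can never predict a combination that was absent from the training set.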

+0

Sorry for the late reply, but I get this error when I use your code: AttributeError: 'list' object has no attribute 'dtype' –

+0

Thanks for the answer. Now I see: IndexError: tuple index out of range –

+0

Create a .py file with these lines and you can reproduce the error: target = [[0,1],[0,2],[1,2],[0,2,3]] target = np.array(target) d = np.dtype((np.void, target.dtype.itemsize * target.shape[1])) _, ulabels = np.unique(np.ascontiguousarray(target).view(d), return_inverse=True) –

0

This is because your target is:

array([[1, 1, 0, 0], 
     [1, 0, 1, 0], 
     [0, 1, 1, 0], 
     [1, 0, 1, 1]]) 

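Your target must have shape (M,), where M is the number of samples. One way to deal with this is to convert each binary row into a single integer label, like this: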

for item in target: 
    print(sum(1<<i for i, b in enumerate(item) if b)) 

The result should be:

3 
5 
6 
13 

Now you can use [3, 5, 6, 13] as your target.
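The same bit-packing can be sketched in vectorized form, together with the reverse step that turns an integer code back into an indicator row. The names `weights`, `codes` and `decoded` are made up for this example.

```python
import numpy as np

target = np.array([[1, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 1, 0],
                   [1, 0, 1, 1]])

n_labels = target.shape[1]

# Column i contributes 2**i, matching sum(1 << i ...) in the loop above.
weights = 1 << np.arange(n_labels)
codes = target.dot(weights)

# Decode: test each bit of each code to rebuild the indicator rows.
decoded = (codes[:, None] >> np.arange(n_labels)) & 1
```

Decoding is what lets you turn a predicted integer such as 13 back into the set of labels it stands for.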

+0

You can create a new label for each possible subset of categories, as in the example. – Farseer