2014-10-06 35 views
3

Scikit-learn GridSearch gives "ValueError: multiclass format is not supported" error

I am trying to use GridSearch to estimate the parameters of LinearSVC(), as follows:

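from sklearn.svm import LinearSVC
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in scikit-learn >= 0.18

clf_SVM = LinearSVC()
params = {
    'C': [0.5, 1.0, 1.5],
    'tol': [1e-3, 1e-4, 1e-5],
    'multi_class': ['ovr', 'crammer_singer'],
}
gs = GridSearchCV(clf_SVM, params, cv=5, scoring='roc_auc')
gs.fit(corpus1, y)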

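corpus1 has shape (1726, 7001) and y has shape (1726,).

This is a multiclass classification problem: y takes values from 0 to 3 inclusive, so there are four classes.

However, this gives me the following error: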

--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-220-0c627bda0543> in <module>() 
     5   } 
     6 gs = GridSearchCV(clf_SVM, params, cv=5, scoring='roc_auc') 
----> 7 gs.fit(corpus1, y) 

/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.pyc in fit(self, X, y) 
    594 
    595   """ 
--> 596   return self._fit(X, y, ParameterGrid(self.param_grid)) 
    597 
    598 

/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.pyc in _fit(self, X, y, parameter_iterable) 
    376          train, test, self.verbose, parameters, 
    377          self.fit_params, return_parameters=True) 
--> 378    for parameters in parameter_iterable 
    379    for train, test in cv) 
    380 

/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable) 
    651    self._iterating = True 
    652    for function, args, kwargs in iterable: 
--> 653     self.dispatch(function, args, kwargs) 
    654 
    655    if pre_dispatch == "all" or n_jobs == 1: 

/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.pyc in dispatch(self, func, args, kwargs) 
    398   """ 
    399   if self._pool is None: 
--> 400    job = ImmediateApply(func, args, kwargs) 
    401    index = len(self._jobs) 
    402    if not _verbosity_filter(index, self.verbose): 

/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, func, args, kwargs) 
    136   # Don't delay the application, to avoid keeping the input 
    137   # arguments in memory 
--> 138   self.results = func(*args, **kwargs) 
    139 
    140  def get(self): 

/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters) 
    1238  else: 
    1239   estimator.fit(X_train, y_train, **fit_params) 
-> 1240  test_score = _score(estimator, X_test, y_test, scorer) 
    1241  if return_train_score: 
    1242   train_score = _score(estimator, X_train, y_train, scorer) 

/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.pyc in _score(estimator, X_test, y_test, scorer) 
    1294   score = scorer(estimator, X_test) 
    1295  else: 
-> 1296   score = scorer(estimator, X_test, y_test) 
    1297  if not isinstance(score, numbers.Number): 
    1298   raise ValueError("scoring must return a number, got %s (%s) instead." 

/usr/local/lib/python2.7/dist-packages/sklearn/metrics/scorer.pyc in __call__(self, clf, X, y) 
    136   y_type = type_of_target(y) 
    137   if y_type not in ("binary", "multilabel-indicator"): 
--> 138    raise ValueError("{0} format is not supported".format(y_type)) 
    139 
    140   try: 

ValueError: multiclass format is not supported 

Can you print the shapes of the variables you pass to .fit? – user1269942 2014-10-06 05:42:53

corpus1 has shape (1726, 7001) and y has shape (1726,) – theharshest 2014-10-06 05:44:57

Answers

6

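From:

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score

"Note: this implementation is restricted to the binary classification task or multilabel classification task in label indicator format."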

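Try:

from sklearn.preprocessing import label_binarize
y = label_binarize(y, classes=[0, 1, 2, 3])

before you train. This performs a "one-hot" encoding of your y, turning the (1726,) label vector into a (1726, 4) indicator matrix.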
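To see what this changes, here is a small sketch on a made-up label vector (the values are illustrative, not the asker's data). `type_of_target` is the helper that appears in the traceback at scorer.py line 136: it reports `multiclass` for the raw labels, which the scorer rejects, and `multilabel-indicator` for the binarized matrix, which it accepts.

```python
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.utils.multiclass import type_of_target

# Hypothetical labels standing in for the real y (four classes, 0..3).
y_raw = np.array([0, 2, 1, 3, 2])

# One column per class; each row has a single 1 in the column of its label.
y_bin = label_binarize(y_raw, classes=[0, 1, 2, 3])

print(type_of_target(y_raw))  # multiclass
print(y_bin.shape)            # (5, 4)
print(type_of_target(y_bin))  # multilabel-indicator
```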

Thanks, but now I get https://gist.github.com/anonymous/fd27da8cb43945de5e45. I checked the shapes of y and corpus1, and they are (1726, 4) and (1726, 7001) – theharshest 2014-10-06 06:19:25

Is your shape now (1380, 4)? The transformed y should be (1726, 4) – user1269942 2014-10-06 06:22:12

Are all 4 classes present in your y variable? – user1269942 2014-10-06 06:23:53
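Both questions can be checked in a few lines. The sketch below is only an illustration: the labels are random stand-ins with the same length as in the question, and it uses `KFold` from `sklearn.model_selection`, which is newer than the scikit-learn version in the traceback. It counts the samples per class and lists the training-set sizes of a plain 5-fold split. One possible reading of the (1380, 4) shape is that it is simply the training part of one fold rather than the full y.

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in labels (assumption); replace with the real 1-D y before binarizing.
rng = np.random.RandomState(0)
y_raw = rng.randint(0, 4, size=1726)

# Are all four classes present, and how many samples does each have?
classes, counts = np.unique(y_raw, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))

# Number of training rows in each fold of an unshuffled 5-fold split.
train_sizes = [len(train) for train, _ in KFold(n_splits=5).split(y_raw)]
print(train_sizes)  # [1380, 1381, 1381, 1381, 1381]
```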

2

As has already been pointed out, you must first binarize y:

y = label_binarize(y, classes=[0, 1, 2, 3])

and then use a multiclass learning algorithm such as OneVsRestClassifier or OneVsOneClassifier. For example:

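from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# One binary LinearSVC per class; inner parameters use the estimator__ prefix.
clf_SVM = OneVsRestClassifier(LinearSVC())
params = {
    'estimator__C': [0.5, 1.0, 1.5],
    'estimator__tol': [1e-3, 1e-4, 1e-5],
}
gs = GridSearchCV(clf_SVM, params, cv=5, scoring='roc_auc')
gs.fit(corpus1, y)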
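For anyone who wants to try this end to end, here is a self-contained sketch of the same recipe. It makes several assumptions: the data is synthetic (`make_classification` stands in for corpus1 and y), the imports use `sklearn.model_selection`, which replaced the `sklearn.grid_search` module seen in the traceback, and `max_iter` is raised only to keep convergence warnings down.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.svm import LinearSVC

# Synthetic 4-class data standing in for corpus1 / y (assumption).
X, y_raw = make_classification(
    n_samples=400, n_features=20, n_informative=8,
    n_classes=4, random_state=0,
)

# Binarize the labels so the 'roc_auc' scorer sees a label-indicator matrix.
y_bin = label_binarize(y_raw, classes=[0, 1, 2, 3])

# One-vs-rest wrapper: one LinearSVC per column of y_bin.
clf = OneVsRestClassifier(LinearSVC(max_iter=10000))
params = {
    'estimator__C': [0.5, 1.0, 1.5],
    'estimator__tol': [1e-3, 1e-4, 1e-5],
}

gs = GridSearchCV(clf, params, cv=5, scoring='roc_auc')
gs.fit(X, y_bin)

print(gs.best_params_)
print(gs.best_score_)  # ROC AUC averaged over the four classes and the folds
```

Because y is now a 2-D indicator matrix, the score is the ROC AUC computed per class column and then averaged, not a single multiclass statistic.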