GridSearch with SVM produces IndexError

I am building a classifier with an SVM and want to run a grid search to help find the best model automatically. Here is the code:

from sklearn.svm import SVC 
from sklearn.model_selection import train_test_split 
from sklearn.model_selection import GridSearchCV 
from sklearn.multiclass import OneVsRestClassifier 

X.shape  # (22343, 323) 
y.shape  # (22343, 1) 

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
) 

tuned_parameters = [ 
    { 
    'estimator__kernel': ['rbf'], 
    'estimator__gamma': [1e-3, 1e-4], 
    'estimator__C': [1, 10, 100, 1000] 
    }, 
    { 
    'estimator__kernel': ['linear'], 
    'estimator__C': [1, 10, 100, 1000] 
    } 
] 

model_to_set = OneVsRestClassifier(SVC(), n_jobs=-1) 
clf = GridSearchCV(model_to_set, tuned_parameters) 
clf.fit(X_train, y_train)

I get the following error message (this is not the whole stack trace, just the last 3 calls):

---------------------------------------------------- 
/anaconda/lib/python3.5/site-packages/sklearn/model_selection/_split.py in split(self, X, y, groups) 
    88   X, y, groups = indexable(X, y, groups) 
    89   indices = np.arange(_num_samples(X)) 
---> 90   for test_index in self._iter_test_masks(X, y, groups): 
    91    train_index = indices[np.logical_not(test_index)] 
    92    test_index = indices[test_index] 

/anaconda/lib/python3.5/site-packages/sklearn/model_selection/_split.py in _iter_test_masks(self, X, y, groups) 
    606 
    607  def _iter_test_masks(self, X, y=None, groups=None): 
--> 608   test_folds = self._make_test_folds(X, y) 
    609   for i in range(self.n_splits): 
    610    yield test_folds == i 

/anaconda/lib/python3.5/site-packages/sklearn/model_selection/_split.py in _make_test_folds(self, X, y, groups) 
    593   for test_fold_indices, per_cls_splits in enumerate(zip(*per_cls_cvs)): 
    594    for cls, (_, test_split) in zip(unique_y, per_cls_splits): 
--> 595     cls_test_folds = test_folds[y == cls] 
    596     # the test split can be too big because we used 
    597     # KFold(...).split(X[:max(c, n_splits)]) when data is not 100% 

IndexError: too many indices for array

Additionally, when I reshape the array so that y is (22343,), GridSearch never finishes, even when tuned_parameters is set to just the default values.
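
For reference, here is a minimal sketch of the reshape mentioned above, and of why a column-shaped y is probably what triggers the IndexError: line 595 of the traceback indexes the 1-D test_folds array with the mask y == cls, and that mask is 2-D when y has shape (n, 1). The arrays are small placeholders, not the original data, and the variable names are made up for illustration.

```python
import numpy as np

# Placeholder labels shaped like the question's y: a single column, shape (n, 1).
y_column = np.array([[0], [1], [2], [1], [0]])

# 1-D array with one entry per sample, standing in for test_folds in _split.py.
test_folds = np.zeros(len(y_column), dtype=int)

# A 2-D boolean mask cannot index a 1-D array, so this raises IndexError.
try:
    test_folds[y_column == 1]
    mask_failed = False
except IndexError:
    mask_failed = True

# Flatten the labels to shape (n,) before handing them to fit().
y_flat = y_column.ravel()

# A 1-D mask on a 1-D array works as expected.
selected = test_folds[y_flat == 1]
```

With the data from the question this would correspond to passing y_train.ravel() to clf.fit, assuming y_train is a NumPy array. If it is a pandas DataFrame column, y_train.values.ravel() should do the same.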

Here are the versions of all the packages, in case that helps:

Python: 3.5.2

scikit-learn: 0.18

pandas: 0.19.0

Source

2016-10-06 William Gottschalk

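Have you tried reducing the number of samples and running it? – MMF

There seems to be nothing wrong with your implementation.

However, as the sklearn documentation for SVC notes, "The fit time complexity is more than quadratic with the number of samples which makes it hard to scale to dataset with more than a couple of 10000 samples." See the documentation.

In your case you have 22343 samples, which can cause computational or memory problems. That is why even the default cross-validation takes so long. Try reducing your training set to 10000 samples or fewer.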
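
One way to cut the training set down is to draw a random subset of rows without replacement. The sketch below uses placeholder data and a made-up max_samples name. For the question's data you would apply the same indexing to X_train and y_train with max_samples set to 10000, assuming both are NumPy arrays.

```python
import numpy as np

rng = np.random.RandomState(0)

# Placeholder training data (500 samples, 5 features, 3 classes).
X_train_demo = rng.rand(500, 5)
y_train_demo = rng.randint(0, 3, size=500)

# Hypothetical cap on the number of training samples (10000 in the question's case).
max_samples = 100

# Pick max_samples distinct row indices at random.
subset = rng.choice(len(X_train_demo), size=max_samples, replace=False)

# Apply the same indices to features and labels so they stay aligned.
X_small = X_train_demo[subset]
y_small = y_train_demo[subset]
```

A plain random draw does not guarantee that class proportions are preserved. If some classes are rare, a stratified split may be a better fit.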

Source

2016-10-06 18:07:38 MMF
