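GridSearchCV: unexpected mean results

I'm trying to understand why I'm seeing the following. I'm using the iris data and running cross-validation with a k-nearest-neighbors classifier to choose the best k.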
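```python
from sklearn import datasets, grid_search
from sklearn.neighbors import KNeighborsClassifier
# Pre-0.18 modules (removed in scikit-learn 0.20; newer code uses sklearn.model_selection)
from sklearn.cross_validation import train_test_split

# Iris data: 150 samples, 3 classes
iris = datasets.load_iris()
X, Y = iris.data, iris.target

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42)

# Try k = 1..20 with 10-fold cross-validation on the training set
parameters = {'n_neighbors': list(range(1, 21))}
knn = KNeighborsClassifier()
clf = grid_search.GridSearchCV(knn, parameters, cv=10)
clf.fit(X_train, Y_train)
```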
After fitting, the `clf` object holds the results:
```python
print(clf.grid_scores_)
```

```
[mean: 0.94000, std: 0.08483, params: {'n_neighbors': 1},
 mean: 0.93000, std: 0.08251, params: {'n_neighbors': 2},
 mean: 0.94000, std: 0.08456, params: {'n_neighbors': 3},
 mean: 0.95000, std: 0.08101, params: {'n_neighbors': 4},
 mean: 0.95000, std: 0.08562, params: {'n_neighbors': 5},
 mean: 0.93000, std: 0.08284, params: {'n_neighbors': 6},
 mean: 0.95000, std: 0.08512, params: {'n_neighbors': 7},
 mean: 0.94000, std: 0.08414, params: {'n_neighbors': 8},
 mean: 0.94000, std: 0.08414, params: {'n_neighbors': 9},
 mean: 0.94000, std: 0.08414, params: {'n_neighbors': 10},
 mean: 0.94000, std: 0.08483, params: {'n_neighbors': 11},
 mean: 0.93000, std: 0.08284, params: {'n_neighbors': 12},
 mean: 0.93000, std: 0.08284, params: {'n_neighbors': 13},
 mean: 0.93000, std: 0.08284, params: {'n_neighbors': 14},
 ...
 mean: 0.94000, std: 0.08483, params: {'n_neighbors': 17},
 mean: 0.93000, std: 0.09458, params: {'n_neighbors': 18},
 mean: 0.94000, std: 0.08483, params: {'n_neighbors': 19},
 mean: 0.93000, std: 0.10887, params: {'n_neighbors': 20}]
```
However, when I look at the 10 individual CV scores for the first case, `k=1`:

```python
print(clf.grid_scores_[0].cv_validation_scores)
```

I get:
```
array([ 1.        ,  0.90909091,  1.        ,  0.72727273,  0.9       ,
        1.        ,  1.        ,  1.        ,  1.        ,  0.88888889])
```
However, the mean of these 10 scores, computed with

```python
print(clf.grid_scores_[0].cv_validation_scores.mean())
```

is 0.942525252525, not the 0.94000 shown in `grid_scores_`.
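For comparison, here is a sketch of the same search written against the newer `sklearn.model_selection` API. It assumes scikit-learn 0.24 or later, where `grid_scores_` has been replaced by the `cv_results_` dictionary and the old `iid` option of `GridSearchCV` no longer exists. It prints each k's `mean_test_score` and then checks whether that value equals the plain average of the per-fold `split{i}_test_score` entries.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same setup as above, ported to sklearn.model_selection (assumes sklearn >= 0.24)
X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42)

parameters = {'n_neighbors': list(range(1, 21))}
clf = GridSearchCV(KNeighborsClassifier(), parameters, cv=10)
clf.fit(X_train, Y_train)

results = clf.cv_results_
# Per-fold test scores are stored under split0_test_score ... split9_test_score;
# stack them into an array of shape (n_candidates, n_folds)
fold_scores = np.column_stack(
    [results['split%d_test_score' % i] for i in range(10)])

for k, mean, std in zip(results['param_n_neighbors'],
                        results['mean_test_score'],
                        results['std_test_score']):
    print('k=%2d  mean=%.5f  std=%.5f' % (int(k), mean, std))

# Does mean_test_score match the unweighted average of the 10 fold scores?
print(np.allclose(results['mean_test_score'], fold_scores.mean(axis=1)))
```

If the final check prints `True` on your install, `mean_test_score` is simply the unweighted average of the fold scores, which is the same quantity as `cv_validation_scores.mean()` above.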
So I'm confused about what this mean is actually computing and why it differs. I've read the documentation but found nothing that helps. What am I missing?
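One thing worth checking: in the old `sklearn.grid_search.GridSearchCV`, the `iid` parameter defaults to `True`, which the documentation describes as minimizing the total loss per sample rather than the mean loss across folds. In practice, each fold's score is weighted by the number of test samples in that fold. The plain-Python sketch below checks that arithmetic on the `k=1` scores. It assumes `X` is the full 150-sample iris set, so `X_train` has 100 rows. It infers the sizes of the four imperfect folds from the score fractions (10/11, 8/11, 9/10, 8/9). The sizes of the six perfect folds are hypothetical, chosen only so that the total is 100.

```python
# k=1 per-fold accuracies from cv_validation_scores, written as the fractions they represent
scores = [1.0, 10 / 11, 1.0, 8 / 11, 9 / 10, 1.0, 1.0, 1.0, 1.0, 8 / 9]

# Test-fold sizes: 11, 11, 10 and 9 are implied by the imperfect scores;
# the six perfect folds' sizes are HYPOTHETICAL, picked so the total is 100
fold_sizes = [10, 11, 10, 11, 10, 10, 10, 9, 10, 9]
assert sum(fold_sizes) == 100

# Unweighted mean: every fold counts equally (what .mean() computes)
plain_mean = sum(scores) / len(scores)

# Sample-weighted mean: each fold's score times its size, divided by the
# total number of samples, i.e. the overall fraction classified correctly
weighted_mean = sum(s * n for s, n in zip(scores, fold_sizes)) / sum(fold_sizes)

print('plain mean:    %.6f' % plain_mean)     # plain mean:    0.942525
print('weighted mean: %.6f' % weighted_mean)  # weighted mean: 0.940000
```

Each fold's correct count is score × size, so the imperfect folds contribute 1 + 3 + 1 + 1 = 6 errors. The weighted mean is therefore (100 − 6)/100 = 0.94 however the remaining 59 samples are split among the perfect folds, which matches the 0.94000 in `grid_scores_`.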