Sklearn - GridSearchCV with v_measure_score gives a different result

I want to use GridSearchCV with v_measure_score and compare the result
with another method that does not use GridSearchCV.

The best v_measure_score from the for loop is 0.69816019299, at percentile 27.
The best score from GridSearchCV is 0.565562627046.

In my opinion, the results should be the same.
I have checked my code several times but still cannot figure out the reason. Here is my code:

GridSearchCV

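from sklearn import cluster
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.metrics import make_scorer, v_measure_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
# apiVectorArray, yTarget and clusterNum are defined elsewhere (not shown).
# Note: precompute_distances was removed in scikit-learn 1.0; drop it on newer versions.
estimators = [('tfIdf', TfidfTransformer()), ('sPT', SelectPercentile()), ('kmeans', cluster.KMeans())]
pipe = Pipeline(estimators)
params = dict(tfIdf__smooth_idf=[True],
              sPT__score_func=[f_classif], sPT__percentile=range(100, 0, -1),
              kmeans__n_clusters=[clusterNum], kmeans__random_state=[0], kmeans__precompute_distances=[True])
v_measure_scorer = make_scorer(v_measure_score)
grid_search = GridSearchCV(pipe, param_grid=params, scoring=v_measure_scorer)
grid_search_fit = grid_search.fit(apiVectorArray, yTarget)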

v_measure_score with a for loop

bestPercent = [-1, -1]  # [best percentile, best v_measure_score]
for percent in range(100, 0, -1):
    transformer = TfidfTransformer(smooth_idf=True)
    apiVectorArrayTFIDF = transformer.fit_transform(apiVectorArray)
    apiVectorFit = SelectPercentile(f_classif, percentile=percent).fit(apiVectorArrayTFIDF, yTarget)
    k_means = cluster.KMeans(n_clusters=clusterNum, random_state=0, precompute_distances=True).fit(apiVectorFit.transform(apiVectorArrayTFIDF))

    # Score the clustering on the same data it was fitted on.
    score = v_measure_score(yTarget, k_means.labels_)
    if score > bestPercent[1]:
        bestPercent[0] = percent
        bestPercent[1] = score

I tried to add syntax highlighting to my code but failed.
Sorry for your eyes.

Thanks.

Source

2016-11-18 Che-Hao Kang

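I think the difference comes from cross-validation. GridSearchCV fits the pipeline on the training folds, scores it on the held-out fold, and averages the scores across folds, whereas the for loop fits and scores on the same full data set. The two scores measure different things, so they need not match.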
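To check this explanation, one option is to give GridSearchCV a single "split" whose training and test indices are both the whole data set, so that it fits and scores on the same data as the loop. The sketch below is only an illustration under assumptions: it uses synthetic data from `make_blobs` in place of `apiVectorArray` and `yTarget`, leaves out the TF-IDF step, and uses a fixed `n_clusters=3` and a short percentile list rather than the values in the question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.metrics import make_scorer, v_measure_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for apiVectorArray / yTarget (an assumption, not the asker's data).
X, y = make_blobs(n_samples=300, n_features=20, centers=3, random_state=0)

pipe = Pipeline([
    ("sPT", SelectPercentile(f_classif)),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])
params = {"sPT__percentile": [25, 50, 100]}
scorer = make_scorer(v_measure_score)

# Usual behaviour: fit on the training folds, score predictions on the held-out fold.
cv_search = GridSearchCV(pipe, params, scoring=scorer, cv=3).fit(X, y)

# A single split that trains and tests on every sample, mirroring the for loop.
all_idx = np.arange(len(y))
full_search = GridSearchCV(pipe, params, scoring=scorer,
                           cv=[(all_idx, all_idx)]).fit(X, y)

# Manual loop for comparison: fit and score on the full data set.
loop_scores = []
for percent in params["sPT__percentile"]:
    selected = SelectPercentile(f_classif, percentile=percent).fit_transform(X, y)
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(selected)
    loop_scores.append(v_measure_score(y, km.labels_))

print("cross-validated :", cv_search.cv_results_["mean_test_score"])
print("train == test   :", full_search.cv_results_["mean_test_score"])
print("for loop        :", loop_scores)
```

With the train-equals-test split, the GridSearchCV scores should line up with the loop scores, while the cross-validated scores are free to differ. Note that scoring on the training data, as the loop does, tends to be optimistic, especially here because SelectPercentile uses the labels to choose features.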

Source

2016-11-25 12:07:36
