Python sklearn cross_validation: number of labels does not match number of samples

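I'm taking a machine learning course, and I want to split my data into a training set and a test set. I want to split it, train a DecisionTree on the training set, and then print the score on the test set. The cross-validation parameters in my code were given by the assignment. Does anyone see what I'm doing wrong?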

The error I get is:

Traceback (most recent call last): 
    File "/home/stephan/ud120-projects/validation/validate_poi.py", line 36, in <module> 
    clf = clf.fit(features_train, labels_train) 
    File "/home/stephan/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 221, in fit 
    "number of samples=%d" % (len(y), n_samples)) 
ValueError: Number of labels=29 does not match number of samples=66

Here is my code:

import pickle 
import sys 
sys.path.append("../tools/") 
from feature_format import featureFormat, targetFeatureSplit 

data_dict = pickle.load(open("../final_project/final_project_dataset.pkl", "r")) 

features_list = ["poi", "salary"] 

data = featureFormat(data_dict, features_list) 
labels, features = targetFeatureSplit(data) 

from sklearn import tree 
from sklearn import cross_validation 

features_train, labels_train, features_test, labels_test = \ 
    cross_validation.train_test_split(features, labels, random_state=42, test_size=0.3) 



clf = tree.DecisionTreeClassifier() 
clf = clf.fit(features_train, labels_train) 
print clf.score(features_test, labels_test)

Source

2015-06-20 hmmmbob

Well, I figured that since all of this was given in the assignment, it was unlikely to contain a mistake :( – hmmmbob

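Your variables don't seem to match the order in which train_test_split returns its values. It returns the train and test split of the first array, followed by the train and test split of the second array.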

Try:

features_train, features_test, labels_train, labels_test = ...
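
To make the ordering concrete, here is a minimal sketch on synthetic data. It assumes a recent scikit-learn, where train_test_split is imported from sklearn.model_selection rather than the sklearn.cross_validation module used in the question. The 95 one-feature samples are made up for illustration, chosen because 66 + 29 = 95 is consistent with the counts in the error message.

```python
# Sketch only: assumes a recent scikit-learn (sklearn.model_selection),
# not the older sklearn.cross_validation module from the question.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data: 95 samples with a single feature each.
features = [[float(i)] for i in range(95)]
labels = [i % 2 for i in range(95)]

# Return order: train/test split of the first array,
# then train/test split of the second array.
features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels, random_state=42, test_size=0.3
)

# Features and labels of each split now have matching lengths.
print(len(features_train), len(labels_train))
print(len(features_test), len(labels_test))

clf = DecisionTreeClassifier()
clf.fit(features_train, labels_train)
score = clf.score(features_test, labels_test)
print(score)
```

With the unpacking order from the question, labels_train would instead receive the test portion of the features, so fit would see 66 samples alongside 29 "labels", which is the mismatch reported in the traceback.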

Source

2015-06-20 20:39:53 Alexander
