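I want to plot the effect of removing samples (rows), which some people call a "learning curve". How do I pass a DataFrame to scikit-learn for cross-validation?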
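scikit-learn has a helper aimed at this kind of plot: sklearn.model_selection.learning_curve retrains a model on growing fractions of each training split and scores it on a fixed test split. Below is a minimal sketch, assuming scikit-learn 0.20 or later. The toy data (60 rows, 3 balanced labels), the fractions and n_neighbors=3 are illustrative choices and do not come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit, learning_curve
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: 60 rows of 5 random features, 20 rows per label.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(60, 5))
df['label'] = np.repeat([1, 2, 3], 20)

# A DataFrame and a Series can be passed directly; .values is not required.
X = df.drop(columns='label')
y = df['label']

# 5 stratified splits, each with 15 test rows and 45 training rows.
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.25, random_state=0)

# Train on 20%, 40%, ..., 100% of each 45-row training split.
train_sizes, train_scores, test_scores = learning_curve(
    KNeighborsClassifier(n_neighbors=3), X, y,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=cv, shuffle=True, random_state=0)

# Mean test accuracy for each absolute training size.
for size, mean_acc in zip(train_sizes, test_scores.mean(axis=1)):
    print(size, round(mean_acc, 3))
```

To draw the curve, plot train_sizes against test_scores.mean(axis=1), for example with matplotlib's plt.plot. Note that learning_curve shrinks only the training part of each split while the test rows stay fixed, which is slightly different from removing rows before splitting.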
So I want to use pandas to remove some rows, as described in How to remove, randomly, rows from a dataframe but from each label?
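In newer pandas versions the per-label random removal can be done without building a generator over the groups. Below is a minimal sketch, assuming pandas 1.1 or later, where groupby(...).sample was added. random_state=0 is only there to make the draw reproducible.

```python
import numpy as np
import pandas as pd

# Same layout as in the question: 12 rows, 4 per label.
df = pd.DataFrame(np.random.rand(12, 5))
df['label'] = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

# Keep 2 random rows of each label (equivalent to concatenating g.sample(2) per group).
kept_n = df.groupby('label').sample(n=2, random_state=0)

# Alternatively, keep a fraction of each label, here 50%.
kept_frac = df.groupby('label').sample(frac=0.5, random_state=0)

# The rows that were "removed", in case they are needed elsewhere.
removed = df.drop(kept_n.index)
```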
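However, when I try to run the cross-validation, I get the following error (even after converting the DataFrame to an array with df.values):
So, what am I doing wrong?
Here is my code: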
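import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn import neighbors

# 12 rows of random features, 4 rows for each of 3 labels
df = pd.DataFrame(np.random.rand(12, 5))
label = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
df['label'] = label

# Randomly keep 2 rows from each label (6 rows in total)
df1 = pd.concat(g.sample(2) for idx, g in df.groupby('label'))

X = df1[[0, 1, 2, 3, 4]].values
y = df1.label.values
print(X)
print(y)

# Only 3 rows are left for training, so n_neighbors must not exceed 3 (the default is 5)
clf = neighbors.KNeighborsClassifier(n_neighbors=1)
# The test split must hold at least one row per class: 0.5 * 6 rows = 3 rows for 3 labels
# (test_size=0.1 gives a single test row, which StratifiedShuffleSplit rejects)
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.5)
# cross_val_score is imported from sklearn.model_selection: the old sklearn.cross_validation
# module cannot iterate over model_selection splitters and was removed in scikit-learn 0.20
scoresSSS = cross_val_score(clf, X, y, cv=sss)
print(scoresSSS)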
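If the goal is specifically to remove rows first and then cross-validate whatever is left, the approach above can be wrapped in a loop over the number of rows kept per label. Below is a minimal sketch, assuming scikit-learn 0.20+ and pandas 1.1+ (for groupby(...).sample). The toy data, the per-label sizes and n_neighbors=3 are illustrative choices, not part of the question. The DataFrame is passed to cross_val_score directly, so .values is not needed.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: 30 rows per label, 3 labels.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(90, 5))
df['label'] = np.repeat([1, 2, 3], 30)

results = {}  # total rows kept -> mean cross-validated accuracy
for n_per_label in [4, 8, 16, 30]:
    # Randomly keep n_per_label rows of each label and drop the rest.
    sub = df.groupby('label').sample(n=n_per_label, random_state=0)
    X = sub.drop(columns='label')
    y = sub['label']
    # With balanced labels, test_size=0.5 puts n_per_label / 2 rows of every
    # label in each test split and the same number in each training split.
    cv = StratifiedShuffleSplit(n_splits=5, test_size=0.5, random_state=0)
    # n_neighbors must not exceed the smallest training split (6 rows here).
    clf = KNeighborsClassifier(n_neighbors=3)
    scores = cross_val_score(clf, X, y, cv=cv)
    results[len(sub)] = scores.mean()

for n_rows, mean_acc in results.items():
    print(n_rows, round(mean_acc, 3))
```

The values in results can then be plotted against the number of rows kept to show how accuracy changes as samples are removed.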