Using Pandas and sklearn.neighbors

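I want to fit a KNN model on a DataFrame using Python 3.5, Pandas and sklearn.neighbors. I imported the data and split it into training and test data and labels, but when I try to predict with it I get the following error. I'm quite new to Pandas, so any help would be appreciated. Thanks!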

import pandas as pd 
from sklearn import cross_validation 
import numpy as np 
from sklearn.neighbors import KNeighborsRegressor 
seeds = pd.read_csv('seeds.tsv',sep='\t',names=['Area','Perimeter','Compactness','Kern_len','Kern_width','Assymetry','Kern_groovlen','Species']) 
data = seeds.iloc[:,[0,1,2,3,4,5,6]] 
labels = seeds.iloc[:,[7]] 
x_train, x_test, y_train, y_test = cross_validation.train_test_split(data,labels, test_size=0.4, random_state=1) 
knn = KNeighborsRegressor(n_neighbors=30) 
knn.fit(x_train,y_train) 
knn.predict(x_test) 

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-121-2292e64e5ab8> in <module>() 
----> 1 knn.predict(x_test) 

C:\Anaconda3\lib\site-packages\sklearn\neighbors\regression.py in predict(self, X) 
    151 
    152   if weights is None: 
--> 153    y_pred = np.mean(_y[neigh_ind], axis=1) 
    154   else: 
    155    y_pred = np.empty((X.shape[0], _y.shape[1]), dtype=np.float) 

C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in mean(a, axis, dtype, out, keepdims) 
    2876 
    2877  return _methods._mean(a, axis=axis, dtype=dtype, 
-> 2878       out=out, keepdims=keepdims) 
    2879 
    2880 

C:\Anaconda3\lib\site-packages\numpy\core\_methods.py in _mean(a, axis, dtype, out, keepdims) 
    66  if isinstance(ret, mu.ndarray): 
    67   ret = um.true_divide(
---> 68     ret, rcount, out=ret, casting='unsafe', subok=False) 
    69  elif hasattr(ret, 'dtype'): 
    70   ret = ret.dtype.type(ret/rcount) 

TypeError: unsupported operand type(s) for /: 'str' and 'int'

Source

2016-09-25 ConstantinL

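You are using a regressor, so the labels must be numeric, not strings. You would have to encode them first, if that even makes sense. Are you sure you want a kNN regressor rather than a kNN classifier?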
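
To illustrate the point above: a k-NN regressor averages the targets of the nearest neighbours, while a k-NN classifier takes a majority vote among them. The pure-Python sketch below is only an illustration of that difference, not scikit-learn's implementation; the helper names `regress` and `classify` and the label values "A" and "B" are made up for the example.

```python
from collections import Counter


def regress(neighbor_targets):
    """Average the neighbours' targets, as a k-NN regressor does."""
    return sum(neighbor_targets) / len(neighbor_targets)


def classify(neighbor_targets):
    """Return the most common target among the neighbours (majority vote)."""
    return Counter(neighbor_targets).most_common(1)[0][0]


numeric_targets = [1.0, 2.0, 3.0]
string_targets = ["A", "B", "A"]  # hypothetical species labels

mean_value = regress(numeric_targets)    # fine: the targets are numbers
voted_label = classify(string_targets)   # fine: no arithmetic on the labels

try:
    regress(string_targets)              # arithmetic on strings is not defined
    regression_failed = False
except TypeError:
    regression_failed = True
```

Averaging string labels raises a TypeError, the same kind of failure as in the traceback (the exact message differs because NumPy fails at the division step), whereas voting never does arithmetic on the labels.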

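You should use KNeighborsClassifier for this KNN. You are trying to predict the label Species, which is a classification task. The regressor in the code above tries to fit and predict a continuous-valued numeric variable, which is where your problem comes from.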

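import pandas as pd 
from sklearn import cross_validation  # removed in scikit-learn 0.20; newer versions use sklearn.model_selection 
from sklearn.neighbors import KNeighborsClassifier 
seeds = pd.read_csv('seeds.tsv',sep='\t',names=['Area','Perimeter','Compactness','Kern_len','Kern_width','Assymetry','Kern_groovlen','Species']) 
data = seeds.iloc[:,[0,1,2,3,4,5,6]] 
labels = seeds.iloc[:,7]  # 1-D Series of species labels; string values are fine for a classifier 
x_train, x_test, y_train, y_test = cross_validation.train_test_split(data,labels, test_size=0.4, random_state=1) 
knn = KNeighborsClassifier(n_neighbors=30) 
knn.fit(x_train,y_train) 
knn.predict(x_test) 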

http://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html

Here is what a regressor looks like when plotted, compared with a classifier (which is what you want to use).
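
For anyone who wants to try the classifier without seeds.tsv, here is a minimal self-contained sketch. The data is synthetic, and the column names f1 and f2, the labels "A" and "B" and n_neighbors=5 are assumptions made for the example. It imports train_test_split from sklearn.model_selection, which is where it lives in current scikit-learn releases; on the older version used in the question, sklearn.cross_validation provides the same function.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for seeds.tsv: two numeric features, string labels.
n_per_class = 50
features = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(n_per_class, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(n_per_class, 2)),
])
frame = pd.DataFrame(features, columns=["f1", "f2"])
frame["Species"] = ["A"] * n_per_class + ["B"] * n_per_class

data = frame[["f1", "f2"]]
labels = frame["Species"]  # 1-D Series of string labels

x_train, x_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.4, random_state=1
)

# The classifier votes among neighbours, so string labels need no encoding.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
predictions = knn.predict(x_test)     # array of predicted label strings
accuracy = knn.score(x_test, y_test)  # fraction of correct test predictions
```

The predictions come back as the original label strings, so nothing has to be decoded afterwards.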