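LogisticRegression: Unknown label type: 'continuous' using sklearn in Python

I have the following code to test some of the most popular ML algorithms in the sklearn Python library: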
import numpy as np
from sklearn import metrics, svm
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
trainingData = np.array([ [2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0] ])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])
predictionData = np.array([ [2.5, 2.4, 2.7], [2.7, 3.2, 1.2] ])
clf = LinearRegression()
clf.fit(trainingData, trainingScores)
print("LinearRegression")
print(clf.predict(predictionData))
clf = svm.SVR()
clf.fit(trainingData, trainingScores)
print("SVR")
print(clf.predict(predictionData))
clf = LogisticRegression()
clf.fit(trainingData, trainingScores)
print("LogisticRegression")
print(clf.predict(predictionData))
clf = DecisionTreeClassifier()
clf.fit(trainingData, trainingScores)
print("DecisionTreeClassifier")
print(clf.predict(predictionData))
clf = KNeighborsClassifier()
clf.fit(trainingData, trainingScores)
print("KNeighborsClassifier")
print(clf.predict(predictionData))
clf = LinearDiscriminantAnalysis()
clf.fit(trainingData, trainingScores)
print("LinearDiscriminantAnalysis")
print(clf.predict(predictionData))
clf = GaussianNB()
clf.fit(trainingData, trainingScores)
print("GaussianNB")
print(clf.predict(predictionData))
clf = SVC()
clf.fit(trainingData, trainingScores)
print("SVC")
print(clf.predict(predictionData))
The first two work fine, but I get the following error in the LogisticRegression call:
/home/ouhma# python stack.py
LinearRegression
[ 15.72023529 6.46666667]
SVR
[ 3.95570063 4.23426243]
Traceback (most recent call last):
  File "stack.py", line 28, in <module>
    clf.fit(trainingData, trainingScores)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
    check_classification_targets(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
The input data is the same as in the previous calls, so what is going on here?
By the way, why is there such a huge difference between the first predictions of the LinearRegression() and SVR() algorithms (15.72 vs 3.95)?
Thanks! So I have to convert '2.3' to '23' and so on, don't I? Is there an elegant way to do that conversion with numpy or pandas? – harrison4
But in this example, the input data passed to LogisticRegression contains floats: http://machinelearningmastery.com/compare-machine-learning-algorithms-python-scikit-learn/ ... and it works fine. Why? – harrison4
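The input can be floats, but the output needs to be categorical, i.e. int. In that example, column 8 contains only 0 or 1. Usually you have categorical labels, e.g. ['red', 'big', 'sick'], and you need to convert them to numerical values. Try http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features or http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html –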
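The LabelEncoder suggestion can be sketched roughly as follows, reusing the arrays from the question. The snake_case variable names are illustrative, and treating every distinct score as its own class is an assumption made only to get the classifier to fit. If the scores really are continuous quantities, a regressor is probably the better fit, and with four samples in four classes the predictions carry little meaning.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

# Same data as in the question.
training_data = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2],
                          [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
training_scores = np.array([3.4, 7.5, 4.5, 1.6])
prediction_data = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

# Non-integer float targets are detected as a regression target,
# which is what classifiers reject with "Unknown label type".
target_type = type_of_target(training_scores)
print(target_type)  # 'continuous'

# Assumption for illustration: each distinct score becomes one class.
encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(training_scores)
encoded_type = type_of_target(encoded_labels)
print(encoded_labels, encoded_type)  # [1 3 2 0] multiclass

# The classifier now accepts the integer class labels.
clf = LogisticRegression()
clf.fit(training_data, encoded_labels)
predicted_classes = clf.predict(prediction_data)

# Map predicted class indices back to the original score values.
predicted_scores = encoder.inverse_transform(predicted_classes)
print(predicted_scores)
```

The features stay as floats throughout; only the target array is encoded, which matches the point made in the comment above.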