
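Python sklearn linear regression ValueError

I have been trying linear regression with sklearn. Sometimes I get a ValueError, and sometimes it works fine. I don't know which approach to use. The error message is as follows: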

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 512, in fit 
    y_numeric=True, multi_output=True) 
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/validation.py", line 531, in check_X_y 
    check_consistent_length(X, y) 
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/validation.py", line 181, in check_consistent_length 
    " samples: %r" % [int(l) for l in lengths]) 
ValueError: Found input variables with inconsistent numbers of samples: [1, 200] 

My code is as follows:

import pandas as pd 
from sklearn.linear_model import LinearRegression 
data = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv', index_col=0); 
x = data['TV'] 
y = data['Sales'] 
lm = LinearRegression() 
lm.fit(x,y) 

Please help me. I'm a student who wants to learn the basics of machine learning.

Answers


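lm.fit expects X to be a

numpy array or sparse matrix of shape [n_samples, n_features]

Your x is one-dimensional, so it is not read as 200 samples of one feature. x has shape: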

In [6]: x.shape 
Out[6]: (200,) 

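Just reshape it into a single column. x is a pandas Series, and newer pandas versions no longer provide Series.reshape, so go through the underlying NumPy array:

lm.fit(x.values.reshape(-1, 1), y) 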
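
The shape change can be checked without downloading the CSV. Here is a minimal sketch, assuming a small made-up Series in place of data['TV'] and data['Sales'] (the values are illustrative only, not taken from Advertising.csv):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins for data['TV'] and data['Sales'] (y = 2*x + 1)
x = pd.Series([1.0, 2.0, 3.0, 4.0], name="TV")
y = pd.Series([3.0, 5.0, 7.0, 9.0], name="Sales")

print(x.shape)                 # one-dimensional: (4,)

# -1 tells NumPy to infer the number of rows; 1 is the number of features
X = x.values.reshape(-1, 1)
print(X.shape)                 # two-dimensional: (4, 1)

lm = LinearRegression()
lm.fit(X, y)                   # X now has shape [n_samples, n_features]
print(lm.coef_, lm.intercept_)
```

y can stay one-dimensional; only X needs the explicit feature axis.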

Thanks! That works great.


Pass X as a DataFrame rather than a Series. For a single feature, you can use [[]] ("double brackets") or to_frame():

import pandas as pd 
from sklearn.linear_model import LinearRegression 
data = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv', index_col=0); 
x = data[['TV']] 

or

x = data['TV'].to_frame() 
y = data['Sales'] 
lm = LinearRegression() 
lm.fit(x,y) 

Output:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False) 
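
To see why both forms work, it may help to compare the shapes directly. The sketch below assumes a tiny hypothetical DataFrame standing in for Advertising.csv, with made-up values:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data standing in for Advertising.csv (Sales = 2*TV + 1)
data = pd.DataFrame({"TV": [1.0, 2.0, 3.0, 4.0],
                     "Sales": [3.0, 5.0, 7.0, 9.0]})

as_series = data["TV"]                 # Series: shape (4,)
as_frame = data[["TV"]]                # DataFrame: shape (4, 1)
via_to_frame = data["TV"].to_frame()   # DataFrame: shape (4, 1)

print(as_series.shape, as_frame.shape, via_to_frame.shape)

# Either two-dimensional form is accepted by fit
lm = LinearRegression().fit(as_frame, data["Sales"])
prediction = lm.predict(pd.DataFrame({"TV": [5.0]}))
print(prediction)
```

Both data[['TV']] and data['TV'].to_frame() keep the column name, so the choice between them is mostly a matter of taste.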

Thanks! That works great.