Fitting a model to all variables in Python (Scikit Learn)

This has been asked elsewhere about a different package, but is there a way in Scikit Learn to include all variables, or all variables minus some specified ones, as you can in R?

To give an example of what I mean, say I have a regression y = x1 + x2 + x3 + x4. In R I can evaluate this regression by running:

result = lm(y ~ ., data=DF) 
summary(result)

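I would imagine there is a similar way to condense the formula in Python, since writing out every variable for larger data sets would be rather silly.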

Source

2017-02-22 114

I don't think so; there is an sklearn example [here](http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py) – cdeterman

@lmo I tagged it with both because I think there may be overlap between R users and Scikit users. – 114

@114 What exactly are you trying to do? Can you give a toy example? –

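We can try the following workaround. We use the iris dataset with the numeric label species, fit a linear regression model, and see how to use all the independent predictors in both R and Python sklearn: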

In R

summary(lm(as.numeric(Species)~., iris))[c('coefficients', 'r.squared')] 

$coefficients
                Estimate Std. Error   t value     Pr(>|t|)
(Intercept)   1.18649525 0.20484104  5.792273 4.150495e-08
Sepal.Length -0.11190585 0.05764674 -1.941235 5.416918e-02
Sepal.Width  -0.04007949 0.05968881 -0.671474 5.029869e-01
Petal.Length  0.22864503 0.05685036  4.021874 9.255215e-05
Petal.Width   0.60925205 0.09445750  6.450013 1.564180e-09

$r.squared 
[1] 0.9303939

In Python (sklearn with patsy)

from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
import pandas as pd
from patsy import dmatrices

iris = load_iris()
# Turn e.g. "sepal length (cm)" into "sepal_length"
names = [f_name.replace(" (cm)", "").replace(" ", "_") for f_name in iris.feature_names]
iris_df = pd.DataFrame(iris.data, columns=names)
iris_df['species'] = iris.target

# patsy does not support the '.' shorthand (at least on Windows with Python 2.7),
# so build the right-hand side from the column names instead.
# '- 1' drops patsy's intercept column, since LinearRegression fits its own intercept.
predictors = iris_df.columns.difference(['species'])  # sorted alphabetically
y, X = dmatrices('species ~ ' + ' + '.join(predictors) + ' - 1',
                 iris_df, return_type="dataframe")

model = LinearRegression()
model.fit(X, y)

print(model.score(X, y))
# 0.930422367533

print(model.intercept_, model.coef_)
# [ 0.19208399] [[0.22700138 0.60989412 -0.10974146 -0.04424045]]

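As we can see, the models learned in R and in Python with patsy are similar (the coefficients appear in a different order). The intercepts differ by roughly 1 because as.numeric(Species) codes the classes as 1 to 3, whereas iris.target codes them as 0 to 2.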
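The string-joining workaround above can be wrapped in a small helper so that the "all columns minus some" pattern is reusable. This is only a sketch: `build_formula` is a hypothetical name, not part of patsy or sklearn, and it assumes the column names are valid Python identifiers (patsy would otherwise need them quoted with `Q("...")`).

```python
def build_formula(columns, response, exclude=()):
    """Return an R-style formula with every column except
    `response` and `exclude` on the right-hand side."""
    dropped = {response, *exclude}
    # Keep the original column order instead of sorting.
    predictors = [c for c in columns if c not in dropped]
    if not predictors:
        raise ValueError("no predictor columns left")
    return f"{response} ~ " + " + ".join(predictors)


cols = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# Analogue of R's  species ~ .
print(build_formula(cols, "species"))
# species ~ sepal_length + sepal_width + petal_length + petal_width

# Analogue of R's  species ~ . - sepal_width
print(build_formula(cols, "species", exclude=["sepal_width"]))
# species ~ sepal_length + petal_length + petal_width
```

The resulting string can be passed to `dmatrices` in place of the inline concatenation; pass `iris_df.columns` as `columns` to use it with the data frame above.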

Source

2017-02-22 21:58:13

'statsmodels' natively supports 'patsy' formulas ... might be worth mentioning ... http://statsmodels.sourceforge.net/0.6.0/examples/notebooks/generated/formulas.html –

Is there a way in Scikit Learn to include all variables, or all variables minus some specified ones?

Yes. With sklearn + pandas, to fit on all variables except one, and use that one as the label, you can simply do

model.fit(df.drop('y', axis=1), df['y'])

This will work for most sklearn models.

This is the pandas + sklearn equivalent of R's ~ and - notation when not using patsy.

To exclude multiple variables, you can do

df.drop(['v1', 'v2'], axis=1)
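Putting the two snippets together, a minimal end-to-end sketch might look like the following. The toy data and the `fit_all_but` helper are made up for illustration and are not part of pandas or sklearn; the response is constructed as an exact linear function of the predictors so the fitted coefficients are easy to check.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: y = 1 + 2*x1 - x2 + 0.5*x3 exactly.
df = pd.DataFrame({
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [1.0, 0.0, 0.0, 2.0, 1.0, 3.0],
    "x3": [0.0, 1.0, 0.0, 0.0, 2.0, 1.0],
})
df["y"] = 1.0 + 2.0 * df["x1"] - df["x2"] + 0.5 * df["x3"]


def fit_all_but(data, response, exclude=()):
    """Fit LinearRegression on every column except `response` and `exclude`."""
    X = data.drop(columns=[response, *exclude])
    model = LinearRegression().fit(X, data[response])
    return model, list(X.columns)


# Analogue of R's  lm(y ~ ., data=df)
model_all, used_all = fit_all_but(df, "y")

# Analogue of R's  lm(y ~ . - x3, data=df)
model_sub, used_sub = fit_all_but(df, "y", exclude=["x3"])

print(used_all, model_all.intercept_, model_all.coef_)
print(used_sub, model_sub.intercept_, model_sub.coef_)
```

Returning the column names alongside the model keeps track of which coefficient belongs to which variable, since `coef_` follows the column order of the frame passed to `fit`.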

Source

2017-02-23 12:37:00 JARS
