2017-02-03

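python sklearn multiple linear regression display r-squared

I computed my multiple linear regression equation and I want to see the adjusted R-squared. I know the score function lets me see the R-squared, but it is not adjusted.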

import pandas as pd #import the pandas module 
import numpy as np 
df = pd.read_csv ('/Users/jeangelj/Documents/training/linexdata.csv', sep=',') 
df 
     AverageNumberofTickets NumberofEmployees ValueofContract Industry 
    0    1     51     25750 Retail 
    1    9     68     25000 Services 
    2    20     67     40000 Services 
    3    1     124     35000 Retail 
    4    8     124     25000 Manufacturing 
    5    30     134     50000 Services 
    6    20     157     48000 Retail 
    7    8     190     32000 Retail 
    8    20     205     70000 Retail 
    9    50     230     75000 Manufacturing 
    10    35     265     50000 Manufacturing 
    11    65     296     75000 Services 
    12    35     336     50000 Manufacturing 
    13    60     359     75000 Manufacturing 
    14    85     403     81000 Services 
    15    40     418     60000 Retail 
    16    75     437     53000 Services 
    17    85     451     90000 Services 
    18    65     465     70000 Retail 
    19    95     491     100000 Services 

from sklearn.linear_model import LinearRegression 
model = LinearRegression() 
X, y = df[['NumberofEmployees','ValueofContract']], df.AverageNumberofTickets 
model.fit(X, y) 
model.score(X, y) 
>>0.87764337132340009 

I checked by hand: 0.87764 is the R-squared, and 0.863248 is the adjusted R-squared.
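For reference, the manual check above uses the standard adjusted R-squared formula, with n observations and p predictors. For this data n = 20 rows and p = 2 predictors (NumberofEmployees and ValueofContract), which reproduces the hand-computed value:

$$
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
= 1 - (1 - 0.877643)\,\frac{19}{17} \approx 0.863248
$$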

Answer


There are many different ways to compute R^2 and adjusted R^2. Here are a few of them (computed on the data you provided):

from sklearn.linear_model import LinearRegression 
model = LinearRegression() 
X, y = df[['NumberofEmployees','ValueofContract']], df.AverageNumberofTickets 
model.fit(X, y) 

# compute with formulas from the theory 
yhat = model.predict(X) 
SS_Residual = sum((y-yhat)**2) 
SS_Total = sum((y-np.mean(y))**2) 
r_squared = 1 - (float(SS_Residual))/SS_Total 
adjusted_r_squared = 1 - (1-r_squared)*(len(y)-1)/(len(y)-X.shape[1]-1) 
print(r_squared, adjusted_r_squared)
# 0.877643371323 0.863248473832 

# compute with sklearn linear_model; the documentation does not seem to offer a function that returns adjusted R^2 directly,
# so apply the adjustment formula to model.score() (which returns the plain R^2)
print(model.score(X, y), 1 - (1-model.score(X, y))*(len(y)-1)/(len(y)-X.shape[1]-1))
# 0.877643371323 0.863248473832 

# compute with statsmodels, by adding intercept manually 
import statsmodels.api as sm 
X1 = sm.add_constant(X) 
result = sm.OLS(y, X1).fit() 
# print(dir(result))  # lists every attribute available on the fitted results
print(result.rsquared, result.rsquared_adj)
# 0.877643371323 0.863248473832 

# compute with statsmodels, another way, using formula 
import statsmodels.formula.api as smf  # separate alias so it does not shadow statsmodels.api as sm
result = smf.ols(formula="AverageNumberofTickets ~ NumberofEmployees + ValueofContract", data=df).fit()
# print(result.summary())
print(result.rsquared, result.rsquared_adj)
# 0.877643371323 0.863248473832 
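If you want to double-check these numbers without sklearn or statsmodels, the same R^2 and adjusted R^2 can be reproduced from an ordinary least-squares fit in plain NumPy. This is a minimal sketch: it re-enters the question's 20 rows by hand instead of reading the CSV, and the helper name r2_and_adjusted_r2 is hypothetical, not part of any library.

```python
import numpy as np

# The question's data, re-entered by hand so the sketch runs without the CSV file.
employees = np.array([51, 68, 67, 124, 124, 134, 157, 190, 205, 230,
                      265, 296, 336, 359, 403, 418, 437, 451, 465, 491], dtype=float)
contract = np.array([25750, 25000, 40000, 35000, 25000, 50000, 48000, 32000, 70000, 75000,
                     50000, 75000, 50000, 75000, 81000, 60000, 53000, 90000, 70000, 100000],
                    dtype=float)
tickets = np.array([1, 9, 20, 1, 8, 30, 20, 8, 20, 50,
                    35, 65, 35, 60, 85, 40, 75, 85, 65, 95], dtype=float)


def r2_and_adjusted_r2(X, y):
    """Hypothetical helper: fit OLS with an intercept, return (R^2, adjusted R^2)."""
    n, p = X.shape                                  # n observations, p predictors
    X1 = np.column_stack([np.ones(n), X])           # prepend the intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares coefficients
    yhat = X1 @ beta
    ss_res = np.sum((y - yhat) ** 2)                # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)            # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)      # penalize for the p predictors
    return r2, adj


X = np.column_stack([employees, contract])
r2, adj = r2_and_adjusted_r2(X, tickets)
print(r2, adj)
```

Because LinearRegression and sm.OLS with a constant both fit the same least-squares model, the printed values should agree with the four methods above to within floating-point rounding.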

Impressive, thank you very much – jeangelj


FYI, in the formula you can use the number of coefficients in model.coef_ instead of X.shape[1]. It is more self-explanatory that way –
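Following that suggestion, the formula can be wrapped in a small helper that reads the number of predictors from the fitted model. This is a sketch assuming a single-target LinearRegression; adjusted_r2 is a hypothetical name, not an sklearn function, and the data below are synthetic placeholders rather than the question's CSV. One caveat: coef_ has shape (n_features,) for a 1-D target but (n_targets, n_features) for a 2-D target, so coef_.shape[-1] is used instead of len(coef_).

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def adjusted_r2(model, X, y):
    """Hypothetical helper: adjusted R^2 of a fitted single-target sklearn regressor."""
    r2 = model.score(X, y)          # ordinary (unadjusted) R^2
    n = len(y)                      # number of observations
    p = model.coef_.shape[-1]       # number of predictors (intercept not counted)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Usage on synthetic data (placeholder values, not the question's CSV).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=50)

model = LinearRegression().fit(X, y)
print(model.score(X, y), adjusted_r2(model, X, y))
```

Reading p from the fitted model keeps the helper correct even if X is later transformed (for example, after feature selection), as long as the same X is passed to both fit and the helper.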