2016-12-18

Logistic regression using Logit() and fit() in Python

I am trying to run a logistic regression in Python using the code below:

from patsy import dmatrices 
import numpy as np 
import pandas as pd 
import statsmodels.api as sm 

df=pd.read_csv('C:/Users/Documents/titanic.csv') 
df=df.drop(['ticket','cabin','name','parch','sibsp','fare'],axis=1) #remove columns from table 
df=df.dropna() #dropping null values 

formula = 'survival ~ C(pclass) + C(sex) + age' 
df_train = df.iloc[ 0: 6, : ] 
df_test = df.iloc[ 6: , : ] 

# splitting data into dependent and independent variables 
y_train,x_train = dmatrices(formula, data=df_train,return_type='dataframe') 
y_test,x_test = dmatrices(formula, data=df_test,return_type='dataframe') 

#instantiate the model 
model = sm.Logit(y_train,x_train) 
res=model.fit() 
res.summary() 

I get the error on this line:

--->res=model.fit() 

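There are no missing values in my dataset. However, my dataset is very small, with only 10 entries. I'm not sure what the problem is here. How can I fix it? I'm running the program in a Jupyter notebook. The full error message is below: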

--------------------------------------------------------------------------- 
PerfectSeparationError     Traceback (most recent call last) 
<ipython-input-37-c6a47ec170d5> in <module>() 
    19 y_test,x_test = dmatrices(formula, data=df_test,return_type='dataframe') 
    20 model = sm.Logit(y_train,x_train) 
---> 21 res=model.fit() 
    22 res.summary() 

C:\Program Files\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in fit(self, start_params, method, maxiter, full_output, disp, callback, **kwargs) 
    1374   bnryfit = super(Logit, self).fit(start_params=start_params, 
    1375     method=method, maxiter=maxiter, full_output=full_output, 
-> 1376     disp=disp, callback=callback, **kwargs) 
    1377 
    1378   discretefit = LogitResults(self, bnryfit) 

C:\Program Files\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in fit(self, start_params, method, maxiter, full_output, disp, callback, **kwargs) 
    201   mlefit = super(DiscreteModel, self).fit(start_params=start_params, 
    202     method=method, maxiter=maxiter, full_output=full_output, 
--> 203     disp=disp, callback=callback, **kwargs) 
    204 
    205   return mlefit # up to subclasses to wrap results 

C:\Program Files\Anaconda3\lib\site-packages\statsmodels\base\model.py in fit(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs) 
    423              callback=callback, 
    424              retall=retall, 
--> 425              full_output=full_output) 
    426 
    427   #NOTE: this is for fit_regularized and should be generalized 

C:\Program Files\Anaconda3\lib\site-packages\statsmodels\base\optimizer.py in _fit(self, objective, gradient, start_params, fargs, kwargs, hessian, method, maxiter, full_output, disp, callback, retall) 
    182        disp=disp, maxiter=maxiter, callback=callback, 
    183        retall=retall, full_output=full_output, 
--> 184        hess=hessian) 
    185 
    186   # this is stupid TODO: just change this to something sane 

C:\Program Files\Anaconda3\lib\site-packages\statsmodels\base\optimizer.py in _fit_newton(f, score, start_params, fargs, kwargs, disp, maxiter, callback, retall, full_output, hess, ridge_factor) 
    246    history.append(newparams) 
    247   if callback is not None: 
--> 248    callback(newparams) 
    249   iterations += 1 
    250  fval = f(newparams, *fargs) # this is the negative likelihood 

C:\Program Files\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in _check_perfect_pred(self, params, *args) 
    184     np.allclose(fittedvalues - endog, 0)): 
    185    msg = "Perfect separation detected, results not available" 
--> 186    raise PerfectSeparationError(msg) 
    187 
    188  def fit(self, start_params=None, method='newton', maxiter=35, 

PerfectSeparationError: Perfect separation detected, results not available 

Answer


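You have perfect separation, which means your data can be split perfectly by a hyperplane in the space of the predictors. When this happens, the maximum likelihood estimates of the parameters are infinite (the likelihood keeps increasing as some coefficients grow without bound), hence your error.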

Example:

Gender Outcome 
male  1 
male  1 
male  0 
female 0 
female 0 

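In this case, if I get an observation of a female, I know with 100% certainty that the outcome will be 0. In other words, my data perfectly separates the outcome. There is no uncertainty left to model, so the numerical search for my coefficients will not converge.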
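To see why the fit cannot converge, the sketch below computes the log-likelihood of a one-predictor logistic model on hypothetical, completely separated toy data (not the Titanic set). It keeps the decision boundary fixed and only makes the slope steeper: the log-likelihood keeps rising toward 0, so there is no finite maximum for an optimizer to find. The names `x`, `y`, and `log_likelihood` are illustrative only.

```python
import math

# Hypothetical toy data (not the Titanic set): x = 1 for "male", 0 for "female".
# Every male has outcome 1 and every female outcome 0, so x separates y completely.
x = [1, 1, 1, 0, 0]
y = [1, 1, 1, 0, 0]


def log_likelihood(b0, b1):
    """Bernoulli log-likelihood of the logistic model p = 1 / (1 + exp(-(b0 + b1 * x)))."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        if yi == 1:
            total -= math.log1p(math.exp(-eta))  # log p
        else:
            total -= math.log1p(math.exp(eta))   # log(1 - p)
    return total


# Keep the decision boundary at x = 0.5 (b0 = -b1 / 2) and make the slope steeper.
slopes = [1, 5, 10, 20, 40]
path = [log_likelihood(-b / 2, b) for b in slopes]
for b, ll in zip(slopes, path):
    print(f"b1 = {b:>2}: log-likelihood = {ll:.10f}")
```

Every step up in the slope buys a slightly higher likelihood, which is why the Newton iterations in the traceback above never settle on finite coefficients and statsmodels' `_check_perfect_pred` callback stops the fit instead.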

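Judging from your error, something similar is happening to you. With only 10 entries (and a training set of just the first 6 rows, `df.iloc[0:6]`), this is far more likely to happen than with, say, 1000 entries. So get more data :)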
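With so few rows, one quick check is whether any level of a categorical predictor already determines the outcome on its own. The sketch below does this in plain Python for the example table above; the helper `separating_levels` is hypothetical. With pandas, `pd.crosstab(df['sex'], df['survival'])` shows the same information as a table. It only catches single-variable separation: a combination of predictors, or a numeric column such as `age`, can also separate the data.

```python
from collections import defaultdict


def separating_levels(rows, predictor, outcome):
    """Return {level: outcome} for levels of `predictor` whose rows all share one outcome.

    Such a level predicts the outcome perfectly in this sample, which is what
    pushes a logistic-regression coefficient toward +/- infinity.
    """
    outcomes_by_level = defaultdict(set)
    for row in rows:
        outcomes_by_level[row[predictor]].add(row[outcome])
    return {
        level: next(iter(values))
        for level, values in outcomes_by_level.items()
        if len(values) == 1
    }


# The Gender/Outcome example table from this answer.
rows = [
    {"gender": "male", "outcome": 1},
    {"gender": "male", "outcome": 1},
    {"gender": "male", "outcome": 0},
    {"gender": "female", "outcome": 0},
    {"gender": "female", "outcome": 0},
]
print(separating_levels(rows, "gender", "outcome"))  # {'female': 0}
```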


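Also: AFAICS, setting `model.raise_on_perfect_prediction = False` before calling `model.fit` turns off the perfect separation exception. However, as explained, the parameters are not identified (or would in theory be infinite), so the estimated parameters in the results will depend on the optimizer's stopping criterion. – user333700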
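The point about the stopping criterion can be illustrated with a deliberately simple optimizer. The sketch below runs plain gradient ascent on a hypothetical, completely separated toy dataset. This is not statsmodels' Newton method, and `fit_slope` is an illustrative name. The gradient never reaches zero, so the reported slope is simply wherever the tolerance `tol` happens to stop the loop, and a tighter tolerance gives a larger estimate.

```python
import math

# Hypothetical toy data: negative x always has outcome 0, positive x always 1,
# so the data are completely separated at x = 0.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def fit_slope(tol, step=1.0, max_iter=100_000):
    """Gradient ascent on the log-likelihood of p = sigmoid(b * x) (no intercept).

    Stops when the absolute gradient drops below `tol` or after `max_iter` steps.
    """
    b = 0.0
    for _ in range(max_iter):
        grad = sum((yi - sigmoid(b * xi)) * xi for xi, yi in zip(x, y))
        if abs(grad) < tol:
            break
        b += step * grad
    return b


for tol in (1e-2, 1e-3, 1e-4):
    print(f"tol = {tol:g}: estimated slope = {fit_slope(tol):.3f}")
```

In statsmodels the corresponding knobs are the arguments to `fit`; the traceback above shows `Logit.fit` defaulting to `method='newton'` and `maxiter=35`, so with separation turned into a non-error the coefficients you get back would reflect those settings rather than any true estimate.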