2014-01-23 112 views
3

Pandas statsmodels OLS regression: predicting using a DataFrame?

Using pandas OLS I am able to fit and use a model as follows:

ols_test = pd.ols(y=merged2[:-1].Units, x=merged2[:-1].lastqu) #to exclude current year, then do forecast method 
yrahead=(ols_test.beta['x'] * merged2.lastqu[-1:]) + ols_test.beta['intercept'] 

I now need to switch to statsmodels to get some extra functionality (mainly the residual plot; see question here).

I have:

import statsmodels.api as sm

def fit_line2(x, y):
    """Fit y on x by OLS and return the fitted results object."""
    X = sm.add_constant(x, prepend=True)  # add a column of ones so the intercept is estimated
    model = sm.OLS(y, X, missing='drop').fit()
    return model

and:

model = fit_line2(merged2[:-1].lastqu, merged2[:-1].Units)
print(model.summary())  # the fitted results are bound to `model`, not `fit`

But I cannot get

yrahead2=model.predict(merged2.lastqu[-1:]) 

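or any variant of it to give me a prediction. Note that the pd.ols version uses the same merged2.lastqu[-1:] to get the data I want to "forecast". Whatever I put inside the () of predict, I get no joy. statsmodels seems to want something specific in the () other than a pandas DF cell; I even tried putting a plain number there, e.g. 2696, but still nothing. My current error is: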

----> 3 yrahead2=model.predict(merged2.lastqu[-1:]) 

/usr/lib/pymodules/python2.7/statsmodels/base/model.pyc in predict(self, exog, transform, *args, **kwargs) 
    1004    exog = np.atleast_2d(exog) # needed in count model shape[1] 
    1005 
-> 1006   return self.model.predict(self.params, exog, *args, **kwargs) 
    1007 
    1008 

/usr/lib/pymodules/python2.7/statsmodels/regression/linear_model.pyc in predict(self, params, exog) 
    253   if exog is None: 
    254    exog = self.exog 
--> 255   return np.dot(exog, params) 
    256 
    257 class GLS(RegressionModel): 

ValueError: objects are not aligned 

> /usr/lib/pymodules/python2.7/statsmodels/regression/linear_model.py(255)predict() 
    254    exog = self.exog 
--> 255   return np.dot(exog, params) 
    256 

Answers

2

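merged2.lastqu[-1:] does not include the constant that the model was fitted with, which is why np.dot reports that the objects are not aligned.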

yrahead2=model.predict(sm.add_constant(merged2.lastqu[-1:], prepend=True))

should work.

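Another option is to add the constant to the DataFrame as a column, the same way it was added to the X used in the model, and then select the appropriate columns of the DataFrame for prediction: df[['const', my_other_X]]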

3

I prefer the statsmodels formula API. At least with that, model.fit().predict wants a DataFrame whose columns have the same names as the predictors. Here is an example:

In [2]: df = pd.DataFrame({'X': np.arange(10), 'Y': np.arange(10) + np.random.randn(10)}) 

In [3]: mod = sm.OLS.from_formula("Y ~ X", df) 

In [4]: res = mod.fit() 

In [5]: exog = pd.DataFrame({"X": np.linspace(0, 10, 100)}) 

In [6]: res.predict(exog) 
Out[6]: 
array([ 0.99817045, 1.07854804, 1.15892563, 1.23930322, 1.31968081, 
     1.40005839, 1.48043598, 1.56081357, 1.64119116, 1.72156875, 
     1.80194634, 1.88232393, 1.96270152, 2.04307911, 2.1234567 , 
     2.20383429, 2.28421188, 2.36458947, 2.44496706, 2.52534465, 
     2.60572224, 2.68609983, 2.76647742, 2.84685501, 2.92723259, 
     3.00761018, 3.08798777, 3.16836536, 3.24874295, 3.32912054, 
     3.40949813, 3.48987572, 3.57025331, 3.6506309 , 3.73100849, 
     3.81138608, 3.89176367, 3.97214126, 4.05251885, 4.13289644, 
     4.21327403, 4.29365162, 4.3740292 , 4.45440679, 4.53478438, 
     4.61516197, 4.69553956, 4.77591715, 4.85629474, 4.93667233, 
     5.01704992, 5.09742751, 5.1778051 , 5.25818269, 5.33856028, 
     5.41893787, 5.49931546, 5.57969305, 5.66007064, 5.74044823, 
     5.82082582, 5.9012034 , 5.98158099, 6.06195858, 6.14233617, 
     6.22271376, 6.30309135, 6.38346894, 6.46384653, 6.54422412, 
     6.62460171, 6.7049793 , 6.78535689, 6.86573448, 6.94611207, 
     7.02648966, 7.10686725, 7.18724484, 7.26762243, 7.34800002, 
     7.4283776 , 7.50875519, 7.58913278, 7.66951037, 7.74988796, 
     7.83026555, 7.91064314, 7.99102073, 8.07139832, 8.15177591, 
     8.2321535 , 8.31253109, 8.39290868, 8.47328627, 8.55366386, 
     8.63404145, 8.71441904, 8.79479663, 8.87517421, 8.9555518 ]) 
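For a single-period forecast, the same pattern should work with a one-row DataFrame. The sketch below uses invented training data; the column names lastqu and Units are borrowed from the question, and 2651 is simply the value mentioned in the comments.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical training data shaped like the question's columns.
rng = np.random.RandomState(42)
lastqu = np.linspace(2000.0, 2600.0, 10)
train = pd.DataFrame({
    "lastqu": lastqu,
    "Units": 100.0 + 0.9 * lastqu + rng.normal(scale=10.0, size=10),
})

# The formula adds the intercept automatically.
res = smf.ols("Units ~ lastqu", data=train).fit()

# A one-row DataFrame whose column name matches the formula's predictor.
new = pd.DataFrame({"lastqu": [2651.0]})
forecast = np.asarray(res.predict(new))
```

With a formula, the intercept is handled at prediction time as well, so the new data only needs the lastqu column.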

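Thanks, but I am using the sm API. The problem I have is with model.predict(merged2.lastqu[-1:]), where the argument is a DF slice that looks like: date 2014-12-31 2651 Name: lastqu, dtype: float64 <<< I want to use 2651 as the "exog" – dartdog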


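I don't see how using formulas helps with plain sm, and for my use case, with the DF inside a function I have already built, I don't know how to set it up. Surely there is a way for predict to accept a DF cell? I just want to forecast one period. – dartdog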


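Hmm, is there a way to get the model's names and rename the prediction input to match, if that is the problem? Grasping at straws here – dartdog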