2012-10-26 60 views

How do I apply rolling_mean() to a pandas DataFrame? I have a DataFrame df:

                  sales  net_pft
STK_ID RPT_Date
600141 20101231  46.780    1.833
       20110331  13.725    0.384
       20110630  32.733    1.132
       20110930  50.386    1.923
       20111231  65.685    2.325
       20120331  21.088    0.656
       20120630  46.952    1.591
600809 20101231  30.166    4.945
       20110331  18.724    5.061
       20110630  28.948    6.586
       20110930  35.637    7.075
       20111231  44.882    7.805
       20120331  22.140    4.925
       20120630  38.157    7.868

After grouping by STK_ID, I want to compute a rolling mean of all columns. The rule is expressed by the following pseudocode:

if RPT_Date[4:8] == '0331': 
    all_column = rolling_mean(all_column,2) 

if RPT_Date[4:8] == '0630': 
    all_column = rolling_mean(all_column,3) 

if RPT_Date[4:8] == '0930': 
    all_column = rolling_mean(all_column,4) 

if RPT_Date[4:8] == '1231': 
    all_column = rolling_mean(all_column,5) 

if is_the_first_row(): 
    keep_original_values() 

Here all_column stands for 'sales' and 'net_pft'. The final result should look like this:

                  sales  net_pft
STK_ID RPT_Date
600141 20101231  46.780    1.833   # same as original value
       20110331  30.253    1.109   # average of row1 & row2
       20110630  31.079    1.116   # average of row1 & row2 & row3
......
600809 20101231  30.166    4.945   # same as original value
       20110331  24.445    5.003   # average of row1 & row2
.....

How can I write this as a neat pandas expression?


It's not clear to me what you want. Do you mean some kind of "cumulative average"? – joris

Answers

2

I think you want this?

In [29]: df.groupby(level='STK_ID').apply(lambda x: pd.expanding_mean(x)) 
Out[29]: 
                     sales   net_pft
STK_ID RPT_Date
600141 20101231  46.780000  1.833000
       20110331  30.252500  1.108500
       20110630  31.079333  1.116333
       20110930  35.906000  1.318000
       20111231  41.861800  1.519400
       20120331  38.399500  1.375500
       20120630  39.621286  1.406286
600809 20101231  30.166000  4.945000
       20110331  24.445000  5.003000
       20110630  25.946000  5.530667
       20110930  28.368750  5.916750
       20111231  31.671400  6.294400
       20120331  30.082833  6.066167
       20120630  31.236286  6.323571
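
For anyone reading this on a newer pandas: the module-level `pd.expanding_mean` used in `In [29]` was deprecated and later removed, so that line will probably fail there. A minimal sketch of what I believe is the equivalent, using the `.expanding()` method inside `transform`. The tiny sample frame is only for illustration, and the first row of each stock (date `20101231` and its sales figure) is an assumption inferred from the averages in `Out[29]`.

```python
import pandas as pd

# Sample rows from the question. The first row of each stock (date 20101231
# and its sales value) is an assumption inferred from the averages in Out[29].
index = pd.MultiIndex.from_tuples(
    [
        (600141, "20101231"), (600141, "20110331"), (600141, "20110630"),
        (600809, "20101231"), (600809, "20110331"), (600809, "20110630"),
    ],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "sales": [46.780, 13.725, 32.733, 30.166, 18.724, 28.948],
        "net_pft": [1.833, 0.384, 1.132, 4.945, 5.061, 6.586],
    },
    index=index,
)

# Expanding (cumulative) mean within each STK_ID.
# transform keeps the original (STK_ID, RPT_Date) index unchanged.
expanding = df.groupby(level="STK_ID").transform(lambda s: s.expanding().mean())
print(expanding)
```

`transform` is used here rather than `groupby(...).expanding()` so that the result keeps exactly the same index as `df`.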

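It's not exactly the same as expanding_mean(), because the rolling window depends on RPT_Date and cycles through the range 2-5. But expanding_mean() is a very powerful function. Thanks for the tip. – bigbug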
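
Following up on that comment, here is one possible sketch of the date-dependent window from the question's pseudocode. It is not from the original answer. The helper `variable_window_mean` and the mapping `WINDOW_BY_MMDD` are names made up for illustration, and the sketch assumes each stock's rows are sorted by RPT_Date with no missing quarters. The first row (date `20101231`, sales `46.780`) is again an assumption inferred from the averages shown above.

```python
import numpy as np
import pandas as pd

# Window length keyed by the MMDD part of RPT_Date, as in the question's pseudocode.
WINDOW_BY_MMDD = {"0331": 2, "0630": 3, "0930": 4, "1231": 5}


def variable_window_mean(group):
    """Mean over the last w rows of one stock, where w depends on each row's RPT_Date."""
    dates = group.index.get_level_values("RPT_Date").astype(str)
    values = group.to_numpy(dtype=float)
    out = np.empty_like(values)
    for i, date in enumerate(dates):
        w = WINDOW_BY_MMDD[date[4:8]]
        start = max(0, i - w + 1)  # window is cut short at the start of the group
        out[i] = values[start:i + 1].mean(axis=0)
    return pd.DataFrame(out, index=group.index, columns=group.columns)


# Rows for stock 600141; the first row's date and sales value are assumed (see above).
index = pd.MultiIndex.from_product(
    [[600141], ["20101231", "20110331", "20110630", "20110930",
                "20111231", "20120331", "20120630"]],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "sales": [46.780, 13.725, 32.733, 50.386, 65.685, 21.088, 46.952],
        "net_pft": [1.833, 0.384, 1.132, 1.923, 2.325, 0.656, 1.591],
    },
    index=index,
)

# Apply the helper to each STK_ID group and stitch the pieces back together.
result = pd.concat(
    [variable_window_mean(g) for _, g in df.groupby(level="STK_ID")]
)
print(result)
```

Because the window is truncated at the start of each group, the first row keeps its original value without a special case. The explicit loop is slow on large frames, so treat it as a readable starting point rather than a tuned solution.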