Computing differences

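I would like to compare the output of multiple model runs, calculating these values:

Difference between current-period revenue and previous-period revenue

Difference between actual current-period revenue and forecasted current-period revenue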

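I have experimented with multi-indexes and suspect the answer lies in that direction, with some creative use of shift(). However, I'm afraid I've mangled the problem through haphazard application of various pivot/melt/groupby experiments. Perhaps you can help me figure out how to turn this: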

import pandas as pd 

ids = [1,2,3] * 5 
year = ['2013', '2013', '2013', '2014', '2014', '2014', '2014', '2014', '2014', '2015', '2015', '2015', '2015', '2015', '2015'] 
run = ['actual','actual','actual','forecast','forecast','forecast','actual','actual','actual','forecast','forecast','forecast','actual','actual','actual'] 

revenue = [10,20,20,30,50,90,10,40,50,120,210,150,130,100,190] 

change_from_previous_year = ['NA','NA','NA',20,30,70,0,20,30,90,160,60,120,60,140] 
change_from_forecast = ['NA','NA','NA','NA','NA','NA',-20,-10,-40,'NA','NA','NA',30,-110,40] 

d = {'ids':ids, 'year':year, 'run':run, 'revenue':revenue} 

df = pd.DataFrame(data=d, columns=['ids','year','run','revenue']) 
print(df)

    ids  year       run  revenue
0     1  2013    actual       10
1     2  2013    actual       20
2     3  2013    actual       20
3     1  2014  forecast       30
4     2  2014  forecast       50
5     3  2014  forecast       90
6     1  2014    actual       10
7     2  2014    actual       40
8     3  2014    actual       50
9     1  2015  forecast      120
10    2  2015  forecast      210
11    3  2015  forecast      150
12    1  2015    actual      130
13    2  2015    actual      100
14    3  2015    actual      190

...into this:

    ids  year       run  revenue  chg_from_prev_year  chg_from_forecast
0     1  2013    actual       10                  NA                 NA
1     2  2013    actual       20                  NA                 NA
2     3  2013    actual       20                  NA                 NA
3     1  2014  forecast       30                  20                 NA
4     2  2014  forecast       50                  30                 NA
5     3  2014  forecast       90                  70                 NA
6     1  2014    actual       10                   0                -20
7     2  2014    actual       40                  20                -10
8     3  2014    actual       50                  30                -40
9     1  2015  forecast      120                  90                 NA
10    2  2015  forecast      210                 160                 NA
11    3  2015  forecast      150                  60                 NA
12    1  2015    actual      130                 120                 30
13    2  2015    actual      100                  60               -110
14    3  2015    actual      190                 140                 40

EDIT: I get pretty close with this:

df['prev_year'] = df.groupby(['ids','run']).shift(1)['revenue'] 
df['chg_from_prev_year'] = df['revenue'] - df['prev_year'] 

df['curr_forecast'] = df.groupby(['ids','year']).shift(1)['revenue'] 
df['chg_from_forecast'] = df['revenue'] - df['curr_forecast']

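The only thing missed (as expected) is the comparison between the 2014 forecast and the 2013 actual. I could duplicate the 2013 run in the dataset, calculate chg_from_prev_year for the 2014 forecast, and then hide or delete the unwanted rows from the final dataframe.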
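One way to close that gap without duplicating rows is sketched below. This is not the approach from the question or the answer: it assumes that "previous year" means the previous row of the same run when one exists and otherwise the previous year's actual, which is what the desired output above implies. The helper names (prev_same_run, prev_actual) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ids": [1, 2, 3] * 5,
    "year": ["2013"] * 3 + ["2014"] * 6 + ["2015"] * 6,
    "run": ["actual"] * 3 + (["forecast"] * 3 + ["actual"] * 3) * 2,
    "revenue": [10, 20, 20, 30, 50, 90, 10, 40, 50,
                120, 210, 150, 130, 100, 190],
})

# Previous revenue within the same (ids, run) group; NaN for the first row
# of each group. Assumes rows are already ordered by year within a group.
prev_same_run = df.groupby(["ids", "run"])["revenue"].shift()

# Lookup of actuals, relabelled with the following year so that a row for
# year Y can be matched against the actual of year Y - 1.
actual = df.loc[df["run"] == "actual", ["ids", "year", "revenue"]].copy()
actual["year"] = (actual["year"].astype(int) + 1).astype(str)
actual = actual.rename(columns={"revenue": "prev_actual"})

# A left merge keeps the 15 original rows in their original order.
prev_actual = df.merge(actual, on=["ids", "year"], how="left")["prev_actual"]

# Fall back to the previous year's actual only where the same-run value is missing.
df["chg_from_prev_year"] = df["revenue"] - prev_same_run.fillna(prev_actual)
print(df)
```

With the sample data, the 2014 forecast rows are compared against the 2013 actuals, and every other row keeps its same-run comparison.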

Source

2013-08-27 bjornarneson

First, to get the change from the previous year, do a shift within each group:

In [11]: g = df.groupby(['ids', 'run']) 

In [12]: df['chg_from_prev_year'] = g['revenue'].transform(lambda x: x - x.shift())
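For reference, here is a self-contained version of this step on the sample data from the question. It is a minimal sketch that uses the built-in groupby diff(), which should give the same numbers as subtracting a per-group shift; it assumes the rows are already ordered by year within each (ids, run) group.

```python
import pandas as pd

df = pd.DataFrame({
    "ids": [1, 2, 3] * 5,
    "year": ["2013"] * 3 + ["2014"] * 6 + ["2015"] * 6,
    "run": ["actual"] * 3 + (["forecast"] * 3 + ["actual"] * 3) * 2,
    "revenue": [10, 20, 20, 30, 50, 90, 10, 40, 50,
                120, 210, 150, 130, 100, 190],
})

# diff() within each (ids, run) group: current revenue minus the previous
# row's revenue in the same group. The first row of each group has no
# predecessor, so it becomes NaN.
df["chg_from_prev_year"] = df.groupby(["ids", "run"])["revenue"].diff()
print(df)
```

The result keeps the original row index, so it can be assigned straight back as a new column.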

The next part is more complicated; I think you need a pivot_table for it:

In [13]: df1 = df.pivot_table(values='revenue', index=['ids', 'year'], columns='run')

In [14]: df1 
Out[14]: 
run       actual  forecast
ids year
1   2013      10       NaN
    2014      10        30
    2015     130       120
2   2013      20       NaN
    2014      40        50
    2015     100       210
3   2013      20       NaN
    2014      50        90
    2015     190       150

In [15]: g1 = df1.groupby(level='ids', as_index=False) 

In [16]: out_by = g1.apply(lambda x: x['actual'] - x['forecast']) 

In [17]: out_by # hello levels bug, fixed in 0.13/master... yesterday :) 
Out[17]: 
ids  ids  year
1    1    2013     NaN
          2014     -20
          2015      10
2    2    2013     NaN
          2014     -10
          2015    -110
3    3    2013     NaN
          2014     -40
          2015      40
dtype: float64
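Since actual minus forecast is computed row by row, the groupby does not look strictly necessary here. The sketch below subtracts the two pivoted columns directly, which should produce the same numbers with a plain (ids, year) index and no duplicated level. The variable name wide is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ids": [1, 2, 3] * 5,
    "year": ["2013"] * 3 + ["2014"] * 6 + ["2015"] * 6,
    "run": ["actual"] * 3 + (["forecast"] * 3 + ["actual"] * 3) * 2,
    "revenue": [10, 20, 20, 30, 50, 90, 10, 40, 50,
                120, 210, 150, 130, 100, 190],
})

# One row per (ids, year), one column per run. There is no 2013 forecast,
# so that cell is NaN.
wide = df.pivot_table(values="revenue", index=["ids", "year"], columns="run")

# Column-wise subtraction aligns on the (ids, year) index automatically.
out_by = wide["actual"] - wide["forecast"]
print(out_by)
```

Note that pivot_table aggregates with the mean by default, so this relies on each (ids, year, run) combination appearing only once.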

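These are the results you want, but not in the right format (see [31] below if you're not too fussed). What follows is a bit of a hack, to put it mildly, but here goes: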

In [21]: df2 = df.set_index(['ids', 'year', 'run']) 

In [22]: out_by.index = out_by.index.droplevel(0) 

In [23]: out_by_df = pd.DataFrame(out_by, columns=['revenue']) 

In [24]: out_by_df['run'] = 'forecast' 

In [25]: df2['chg_from_forecast'] = out_by_df.set_index('run', append=True)['revenue']

...and we're done:

In [26]: df2.reset_index() 
Out[26]: 
    ids  year       run  revenue  chg_from_prev_year  chg_from_forecast
0     1  2013    actual       10                 NaN                NaN
1     2  2013    actual       20                 NaN                NaN
2     3  2013    actual       20                 NaN                NaN
3     1  2014  forecast       30                 NaN                -20
4     2  2014  forecast       50                 NaN                -10
5     3  2014  forecast       90                 NaN                -40
6     1  2014    actual       10                   0                NaN
7     2  2014    actual       40                  20                NaN
8     3  2014    actual       50                  30                NaN
9     1  2015  forecast      120                  90                 10
10    2  2015  forecast      210                 160               -110
11    3  2015  forecast      150                  60                 40
12    1  2015    actual      130                 120                NaN
13    2  2015    actual      100                  60                NaN
14    3  2015    actual      190                 140                NaN

Note: I think the first 6 results of chg_from_prev_year in your desired output should be NaN.
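In the output above, chg_from_forecast sits on the forecast rows, whereas the question's desired table has it on the actual rows. If the long format matters, a merge-based alternative can be sketched as follows. This is a different technique from the index manipulation above, and the column name forecast_revenue is a made-up temporary.

```python
import pandas as pd

df = pd.DataFrame({
    "ids": [1, 2, 3] * 5,
    "year": ["2013"] * 3 + ["2014"] * 6 + ["2015"] * 6,
    "run": ["actual"] * 3 + (["forecast"] * 3 + ["actual"] * 3) * 2,
    "revenue": [10, 20, 20, 30, 50, 90, 10, 40, 50,
                120, 210, 150, 130, 100, 190],
})

# Forecast rows as a lookup table keyed by (ids, year).
forecast = (
    df.loc[df["run"] == "forecast", ["ids", "year", "revenue"]]
      .rename(columns={"revenue": "forecast_revenue"})
)

# The left merge keeps every original row and attaches the same-year forecast.
out = df.merge(forecast, on=["ids", "year"], how="left")

# Keep the difference only on actual rows; forecast rows are masked to NaN.
is_actual = out["run"] == "actual"
out["chg_from_forecast"] = (out["revenue"] - out["forecast_revenue"]).where(is_actual)
out = out.drop(columns="forecast_revenue")
print(out)
```

For ids 1 in 2015 this gives 130 - 120 = 10, matching the pivot result rather than the 30 listed in the question's desired table.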

However, I think you may be better off keeping it as a pivot:

In [31]: df3 = df.pivot_table(['revenue', 'chg_from_prev_year'], ['ids', 'year'], 'run') 

In [32]: df3['chg_from_forecast'] = g1.apply(lambda x: x['actual'] - x['forecast']).values 

In [33]: df3 
Out[33]: 
         revenue            chg_from_prev_year            chg_from_forecast
run       actual  forecast              actual  forecast
ids year
1   2013      10       NaN                 NaN       NaN                NaN
    2014      10        30                   0       NaN                -20
    2015     130       120                 120        90                 10
2   2013      20       NaN                 NaN       NaN                NaN
    2014      40        50                  20       NaN                -10
    2015     100       210                  60       160               -110
3   2013      20       NaN                 NaN       NaN                NaN
    2014      50        90                  30       NaN                -40
    2015     190       150                 140        60                 40

Source

2013-08-27 19:52:57

For a split second after reading "see below [31]" I thought, "Wow, Andy is going a bit overboard with the footnotes for this answer." – TomAugspurger

@TomAugspurger only a little overboard... :) (I remember thinking that was a weird sentence!) –
