2013-08-27

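I'd like to compare the output of several model runs by calculating these values:

  1. the difference between current-period revenue and previous-period revenue
  2. the difference between actual current-period revenue and forecast current-period revenue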

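    I've experimented with multi-indexes, and I suspect the answer lies somewhere in the direction of a creative transform(). However, I'm afraid I've mangled the problem through haphazard application of various pivot/melt/groupby experiments. Perhaps you can help me figure out how to turn this: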

    import pandas as pd

    ids = [1,2,3] * 5
    year = ['2013', '2013', '2013', '2014', '2014', '2014', '2014', '2014', '2014', '2015', '2015', '2015', '2015', '2015', '2015']
    run = ['actual','actual','actual','forecast','forecast','forecast','actual','actual','actual','forecast','forecast','forecast','actual','actual','actual']

    revenue = [10,20,20,30,50,90,10,40,50,120,210,150,130,100,190]

    # The values I want to end up with ('NA' = nothing to compare against)
    change_from_previous_year = ['NA','NA','NA',20,30,70,0,20,30,90,160,60,120,60,140]
    change_from_forecast = ['NA','NA','NA','NA','NA','NA',-20,-10,-40,'NA','NA','NA',30,-110,40]

    d = {'ids':ids, 'year':year, 'run':run, 'revenue':revenue}

    df = pd.DataFrame(data=d, columns=['ids','year','run','revenue'])
    print(df)
    
        ids  year       run  revenue
    0     1  2013    actual       10
    1     2  2013    actual       20
    2     3  2013    actual       20
    3     1  2014  forecast       30
    4     2  2014  forecast       50
    5     3  2014  forecast       90
    6     1  2014    actual       10
    7     2  2014    actual       40
    8     3  2014    actual       50
    9     1  2015  forecast      120
    10    2  2015  forecast      210
    11    3  2015  forecast      150
    12    1  2015    actual      130
    13    2  2015    actual      100
    14    3  2015    actual      190
    

    ...into this:

        ids  year       run  revenue  chg_from_prev_year  chg_from_forecast
    0     1  2013    actual       10                  NA                 NA
    1     2  2013    actual       20                  NA                 NA
    2     3  2013    actual       20                  NA                 NA
    3     1  2014  forecast       30                  20                 NA
    4     2  2014  forecast       50                  30                 NA
    5     3  2014  forecast       90                  70                 NA
    6     1  2014    actual       10                   0                -20
    7     2  2014    actual       40                  20                -10
    8     3  2014    actual       50                  30                -40
    9     1  2015  forecast      120                  90                 NA
    10    2  2015  forecast      210                 160                 NA
    11    3  2015  forecast      150                  60                 NA
    12    1  2015    actual      130                 120                 30
    13    2  2015    actual      100                  60               -110
    14    3  2015    actual      190                 140                 40
    

    EDIT -- I got fairly close with this:

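    # previous year's revenue for the same id and the same run type
    df['prev_year'] = df.groupby(['ids','run'])['revenue'].shift(1)
    df['chg_from_prev_year'] = df['revenue'] - df['prev_year']

    # within each (ids, year) group the forecast row comes before the actual row
    df['curr_forecast'] = df.groupby(['ids','year'])['revenue'].shift(1)
    df['chg_from_forecast'] = df['revenue'] - df['curr_forecast']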
    

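    The only thing missing (as expected) is the comparison of the 2014 forecast with the 2013 actual. I could duplicate the 2013 run in the dataset, calculate chg_from_prev_year for the 2014 forecast, and then hide/delete the unwanted rows from the final dataframe.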
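A minimal sketch of that duplicate-and-drop workaround, assuming Python 3 and a reasonably recent pandas. The data are the question's; the `is_seed` helper column is a hypothetical name introduced only for bookkeeping.

```python
import pandas as pd

# Same toy data as in the question.
ids = [1, 2, 3] * 5
year = ['2013'] * 3 + ['2014'] * 6 + ['2015'] * 6
run = ['actual'] * 3 + (['forecast'] * 3 + ['actual'] * 3) * 2
revenue = [10, 20, 20, 30, 50, 90, 10, 40, 50, 120, 210, 150, 130, 100, 190]
df = pd.DataFrame({'ids': ids, 'year': year, 'run': run, 'revenue': revenue})

# Copy the 2013 actuals and relabel them as 'forecast', so the 2014 forecast
# has a previous-year row inside its (ids, run) group.
# 'is_seed' is a hypothetical helper column, used only to drop these rows later.
seed = df[df['year'] == '2013'].assign(run='forecast', is_seed=True)
# ignore_index=True keeps df's rows at labels 0..14 and puts the seeds after them.
work = pd.concat([df.assign(is_seed=False), seed], ignore_index=True)

# Order each (ids, run) group by year, then diff against the previous year.
work = work.sort_values(['ids', 'run', 'year'])
work['chg_from_prev_year'] = work.groupby(['ids', 'run'])['revenue'].diff()

# Drop the helper rows and restore the original row order.
result = work[~work['is_seed']].drop(columns='is_seed').sort_index()
print(result)
```

With this data, the 2015 forecasts are still compared with the 2014 forecasts, as in the desired table; only the missing 2013 forecast is back-filled from the 2013 actuals.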

  • Answer

    First, to get chg_from_prev_year, take the difference within each group:

    In [11]: g = df.groupby(['ids', 'run']) 
    
    In [12]: df['chg_from_prev_year'] = g['revenue'].diff()  # difference from the previous row within each (ids, run) group
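On recent pandas versions (2.x), `apply` with a lambda that returns a like-indexed Series may prepend the group keys to the result's index, which then no longer lines up when assigned back to `df`. A sketch of two forms that keep `df`'s original index, assuming pandas 1.x or 2.x: the built-in `diff()`, and `transform` with the same lambda.

```python
import pandas as pd

# Same toy data as in the question.
ids = [1, 2, 3] * 5
year = ['2013'] * 3 + ['2014'] * 6 + ['2015'] * 6
run = ['actual'] * 3 + (['forecast'] * 3 + ['actual'] * 3) * 2
revenue = [10, 20, 20, 30, 50, 90, 10, 40, 50, 120, 210, 150, 130, 100, 190]
df = pd.DataFrame({'ids': ids, 'year': year, 'run': run, 'revenue': revenue})

g = df.groupby(['ids', 'run'])['revenue']

# diff() subtracts the previous row *within each group*; the result keeps
# df's index, so it can be assigned straight back as a column.
df['chg_from_prev_year'] = g.diff()

# Equivalent explicit form: transform() also returns a like-indexed Series.
via_transform = g.transform(lambda x: x - x.shift())
print(df)
```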
    

    The next part is trickier; I think you need a pivot_table for it:

    In [13]: df1 = df.pivot_table(values='revenue', index=['ids', 'year'], columns='run')
    
    In [14]: df1 
    Out[14]: 
    run       actual  forecast
    ids year
    1   2013      10       NaN
        2014      10        30
        2015     130       120
    2   2013      20       NaN
        2014      40        50
        2015     100       210
    3   2013      20       NaN
        2014      50        90
        2015     190       150
    
    In [15]: g1 = df1.groupby(level='ids', as_index=False) 
    
    In [16]: out_by = g1.apply(lambda x: x['actual'] - x['forecast']) 
    
    In [17]: out_by # hello levels bug, fixed in 0.13/master... yesterday :) 
    Out[17]: 
    ids  ids  year
    1    1    2013     NaN
              2014     -20
              2015      10
    2    2    2013     NaN
              2014     -10
              2015    -110
    3    3    2013     NaN
              2014     -40
              2015      40
    dtype: float64
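Because `df1` already has exactly one row per (ids, year), the per-id `groupby` is arguably not needed for this step: subtracting the two columns aligns on the (ids, year) index directly and avoids the duplicated `ids` level. A minimal sketch, assuming a recent pandas:

```python
import pandas as pd

# Same toy data as in the question.
ids = [1, 2, 3] * 5
year = ['2013'] * 3 + ['2014'] * 6 + ['2015'] * 6
run = ['actual'] * 3 + (['forecast'] * 3 + ['actual'] * 3) * 2
revenue = [10, 20, 20, 30, 50, 90, 10, 40, 50, 120, 210, 150, 130, 100, 190]
df = pd.DataFrame({'ids': ids, 'year': year, 'run': run, 'revenue': revenue})

# One row per (ids, year), one column per run type.
df1 = df.pivot_table(values='revenue', index=['ids', 'year'], columns='run')

# Plain column arithmetic lines up actual and forecast for the same id and
# year; 2013 has no forecast, so those rows come out as NaN.
chg_from_forecast = df1['actual'] - df1['forecast']
print(chg_from_forecast)
```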
    

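    This is the result you want, but not in the right format (see below [31] if you're not too fussed)... the following seems a bit of a hack (to put it mildly), but here goes: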

    In [21]: df2 = df.set_index(['ids', 'year', 'run']) 
    
    In [22]: out_by.index = out_by.index.droplevel(0) 
    
    In [23]: out_by_df = out_by.to_frame('revenue')  # one-column DataFrame named 'revenue'
    
    In [24]: out_by_df['run'] = 'forecast' 
    
    In [25]: df2['chg_from_forecast'] = out_by_df.set_index('run', append=True)['revenue'] 
    

    ...and we're done:

    In [26]: df2.reset_index() 
    Out[26]: 
        ids  year       run  revenue  chg_from_prev_year  chg_from_forecast
    0     1  2013    actual       10                 NaN                NaN
    1     2  2013    actual       20                 NaN                NaN
    2     3  2013    actual       20                 NaN                NaN
    3     1  2014  forecast       30                 NaN                -20
    4     2  2014  forecast       50                 NaN                -10
    5     3  2014  forecast       90                 NaN                -40
    6     1  2014    actual       10                   0                NaN
    7     2  2014    actual       40                  20                NaN
    8     3  2014    actual       50                  30                NaN
    9     1  2015  forecast      120                  90                 10
    10    2  2015  forecast      210                 160               -110
    11    3  2015  forecast      150                  60                 40
    12    1  2015    actual      130                 120                NaN
    13    2  2015    actual      100                  60                NaN
    14    3  2015    actual      190                 140                NaN
    

    Note: I think the 6th result for chg_from_prev_year should be NaN.
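A possibly less hacky route to the same long layout as [26], sketched under the assumption of a recent pandas: join the (ids, year)-indexed differences back onto `df` with `DataFrame.join(..., on=...)`, then keep them only on the forecast rows with `Series.where`.

```python
import pandas as pd

# Same toy data as in the question.
ids = [1, 2, 3] * 5
year = ['2013'] * 3 + ['2014'] * 6 + ['2015'] * 6
run = ['actual'] * 3 + (['forecast'] * 3 + ['actual'] * 3) * 2
revenue = [10, 20, 20, 30, 50, 90, 10, 40, 50, 120, 210, 150, 130, 100, 190]
df = pd.DataFrame({'ids': ids, 'year': year, 'run': run, 'revenue': revenue})

df1 = df.pivot_table(values='revenue', index=['ids', 'year'], columns='run')
# join() needs a named Series; the name becomes the new column.
chg = (df1['actual'] - df1['forecast']).rename('chg_from_forecast')

# Match df's ids/year columns against chg's (ids, year) MultiIndex;
# a left join keeps every row of df in its original order.
out = df.join(chg, on=['ids', 'year'])

# Keep the value only on the forecast rows, as in the [26] output.
out['chg_from_forecast'] = out['chg_from_forecast'].where(out['run'] == 'forecast')
print(out)
```

Changing the condition to `out['run'] == 'actual'` would instead put the values on the actual rows, which is where the question's desired table has them.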

    However, I think you may be better off keeping it as a pivot:

    In [31]: df3 = df.pivot_table(values=['revenue', 'chg_from_prev_year'], index=['ids', 'year'], columns='run')
    
    In [32]: df3['chg_from_forecast'] = g1.apply(lambda x: x['actual'] - x['forecast']).values 
    
    In [33]: df3 
    Out[33]: 
              revenue            chg_from_prev_year            chg_from_forecast
    run        actual  forecast              actual  forecast
    ids year
    1   2013       10       NaN                 NaN       NaN                NaN
        2014       10        30                   0       NaN                -20
        2015      130       120                 120        90                 10
    2   2013       20       NaN                 NaN       NaN                NaN
        2014       40        50                  20       NaN                -10
        2015      100       210                  60       160               -110
    3   2013       20       NaN                 NaN       NaN                NaN
        2014       50        90                  30       NaN                -40
        2015      190       150                 140        60                 40
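Assigning with `.values` in [32] depends on the groupby output coming back in the same row order as `df3`. A sketch that instead aligns on the index, assuming a recent pandas; selecting MultiIndex columns by tuple, such as `('revenue', 'actual')`, is standard pandas indexing.

```python
import pandas as pd

# Same toy data as in the question.
ids = [1, 2, 3] * 5
year = ['2013'] * 3 + ['2014'] * 6 + ['2015'] * 6
run = ['actual'] * 3 + (['forecast'] * 3 + ['actual'] * 3) * 2
revenue = [10, 20, 20, 30, 50, 90, 10, 40, 50, 120, 210, 150, 130, 100, 190]
df = pd.DataFrame({'ids': ids, 'year': year, 'run': run, 'revenue': revenue})

# Year-over-year change within each (ids, run) group, as in [12].
df['chg_from_prev_year'] = df.groupby(['ids', 'run'])['revenue'].diff()

df3 = df.pivot_table(values=['revenue', 'chg_from_prev_year'],
                     index=['ids', 'year'], columns='run')

# Both operands share df3's (ids, year) index, so rows are matched by label,
# not by position.
df3['chg_from_forecast'] = df3[('revenue', 'actual')] - df3[('revenue', 'forecast')]
print(df3)
```

Because `df3` has two column levels, the new column ends up keyed as `('chg_from_forecast', '')`.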
    
    For a split second after reading 'see below [31]' I thought, "Wow, Andy's going a bit overboard with footnotes for this answer." – TomAugspurger

    @TomAugspurger Only a bit overboard... :) (I remember thinking that was a weird sentence!) –