2017-09-26 176 views
           star_rating         duration
Date       20170829  20170830  20170829  20170830
genre
Action       1038.1    1038.1   15917.0   16598.0
Adventure     595.0     595.0    9386.0   10113.0
Animation     490.7     490.7    5811.0    5989.0
Biography     596.9     596.9    9661.0   10002.0
Comedy       1211.7    1211.7   16616.0   16786.0

In[86]: df2.columns 
Out[86]: 
MultiIndex(levels=[['star_rating', 'duration'], [20170829, 20170830]], 
      labels=[[0, 0, 1, 1], [0, 1, 0, 1]], 
      names=[None, 'Date']) 

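Hi everyone, I have the table df2 shown above, and I want to insert a Diff column under each top-level column, computed as a simple subtraction: 20170830 - 20170829. In short, I need a calculated column in a pandas MultiIndex DataFrame. The desired result: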

           star_rating               duration
Date       20170829  20170830  Diff  20170829  20170830  Diff
genre
Action       1038.1    1038.1     0     15917     16598   681
Adventure       595       595     0      9386     10113   727
Animation     490.7     490.7     0      5811      5989   178
Biography     596.9     596.9     0      9661     10002   341
Comedy       1211.7    1211.7     0     16616     16786   170

If Date were the top level of the columns, I could simply use df2['diff'] = df2[20170830] - df2[20170829].
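The observation above can be checked directly by swapping the two column levels. This is only a sketch: it rebuilds two rows of df2 from the table in the question, and the names `swapped` and `diff` are illustrative, not from the original post.

```python
import pandas as pd

# Two rows of the question's data; column labels copied from df2.columns.
columns = pd.MultiIndex.from_product(
    [["star_rating", "duration"], [20170829, 20170830]],
    names=[None, "Date"],
)
df2 = pd.DataFrame(
    [[1038.1, 1038.1, 15917.0, 16598.0],
     [595.0, 595.0, 9386.0, 10113.0]],
    index=pd.Index(["Action", "Adventure"], name="genre"),
    columns=columns,
)

# Move Date to the top column level so a date label selects a whole sub-frame.
swapped = df2.swaplevel(0, 1, axis=1)

# Each selection has the columns star_rating and duration,
# so the subtraction aligns on those labels.
diff = swapped[20170830] - swapped[20170829]
print(diff)
```

The result is a plain frame with one difference column per top-level group; it still has to be attached back under a second column level to match the desired layout.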

I'm new to MultiIndex and would appreciate any ideas to get me started. Thanks in advance.

Check this: https://stackoverflow.com/questions/43238183/python-pandas-add-subtotal-on-each-lvl-of-multiindex-dataframe – Wen

Answers

Let's try:

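import pandas as pd

# df is the DataFrame from the question (df2 there).
# Within each top-level column group, subtract the earlier date from the later one.
df1 = df.groupby(level=0, axis=1).diff().dropna(axis=1)

# Relabel the remaining date level as 'diff' (level values must be unique, so rebuild the index).
df1.columns = pd.MultiIndex.from_arrays([df1.columns.get_level_values(0), ['diff'] * df1.shape[1]], names=df.columns.names)

# Convert the date labels to strings so they sort alongside 'diff'.
df.columns = df.columns.set_levels(df.columns.levels[1].astype(str), level=1)

df_out = pd.concat([df, df1], axis=1).sort_index(axis=1)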

Output:

           duration                   star_rating
Date       20170829  20170830   diff  20170829  20170830   diff
genre
Action      15917.0   16598.0  681.0    1038.1    1038.1    0.0
Adventure    9386.0   10113.0  727.0     595.0     595.0    0.0
Animation    5811.0    5989.0  178.0     490.7     490.7    0.0
Biography    9661.0   10002.0  341.0     596.9     596.9    0.0
Comedy      16616.0   16786.0  170.0    1211.7    1211.7    0.0
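
For readers on a recent pandas release, where `groupby(..., axis=1)` is deprecated, here is a hedged alternative that reaches the same layout with `DataFrame.xs`. This is a sketch, not the answer's method. It assumes exactly two dates, rebuilds only two rows of the question's data, and uses the illustrative names `later`, `earlier`, `diff`, and `dated`.

```python
import pandas as pd

# Two rows of the question's data; column labels copied from df2.columns.
columns = pd.MultiIndex.from_product(
    [["star_rating", "duration"], [20170829, 20170830]],
    names=[None, "Date"],
)
df2 = pd.DataFrame(
    [[1038.1, 1038.1, 15917.0, 16598.0],
     [595.0, 595.0, 9386.0, 10113.0]],
    index=pd.Index(["Action", "Adventure"], name="genre"),
    columns=columns,
)

# Cross-sections at each date: plain frames with columns star_rating / duration.
later = df2.xs(20170830, axis=1, level="Date")
earlier = df2.xs(20170829, axis=1, level="Date")
diff = later - earlier

# Put the differences back under a second column level labelled 'diff'.
diff.columns = pd.MultiIndex.from_product(
    [diff.columns, ["diff"]], names=list(df2.columns.names)
)

# As in the answer, turn the date labels into strings so that
# they can be sorted together with the 'diff' label.
dated = df2.copy()
dated.columns = dated.columns.set_levels(
    dated.columns.levels[1].astype(str), level=1
)

df_out = pd.concat([dated, diff], axis=1).sort_index(axis=1)
print(df_out)
```

Sorting the columns places each diff column after its two date columns, because the string 'diff' sorts after the digit strings.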