2015-04-03 81 views

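Differences between pandas DataFrames

I'm trying to figure out how to show the differences between two pandas DataFrames. I'm almost there, but I can't work out how to display the other columns of the rows that contain a difference.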

Here is what I have so far:

DataFrame A is compared to DataFrame B.

DataFrame A:

Date ID_1 ID_2 Value 
1-Jan 1 1 5 
2-Jan 1 2 6 
3-Jan 1 3 4 
4-Jan 1 4 2 
5-Jan 1 5 8 

DataFrame B:

Date ID_1 ID_2 Value 
1-Jan 1 1 5 
2-Jan 1 2 6 
3-Jan 1 3 4 
4-Jan 1 4 2 
5-Jan 1 5 55 

Current output:

Date   Column  From  To
5-Jan  Value   8     55

Desired output:

Date   ID_1  ID_2  From  To
5-Jan  1     5     8     55

Current code:

import numpy as np
import pandas as pd

# df1 and df2 are DataFrame A and DataFrame B, indexed by Date

# stack the boolean mask of cells where the DataFrames are not equal
ne_stacked = (df1 != df2).stack()

# keep only the entries that differ (mask is True)
changed = ne_stacked[ne_stacked]

# name the two index levels
changed.index.names = ['Date', 'Column']

# row/column positions where the DataFrames are not equal
diff_loc = np.where(df1 != df2)

# values for the 'From' column
changed_from = df1.values[diff_loc]

# values for the 'To' column
changed_to = df2.values[diff_loc]

# build the summary DataFrame
final = pd.DataFrame({'From': changed_from, 'To': changed_to}, index=changed.index)

print(final)
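
One way to get the desired output while keeping the mask-based approach above is to move the identifying columns into the index before comparing, so they travel with each differing cell. This is only a sketch: it assumes `Date`, `ID_1` and `ID_2` together identify a row uniquely and that both frames contain the same rows in the same order. The names `keys`, `a`, `b` and `diff` are mine, not from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Date": ["1-Jan", "2-Jan", "3-Jan", "4-Jan", "5-Jan"],
    "ID_1": [1, 1, 1, 1, 1],
    "ID_2": [1, 2, 3, 4, 5],
    "Value": [5, 6, 4, 2, 8],
})
df2 = df1.copy()
df2.loc[4, "Value"] = 55

# Assumption: these columns uniquely identify a row in both frames.
keys = ["Date", "ID_1", "ID_2"]
a = df1.set_index(keys)
b = df2.set_index(keys)

# Positions of the cells that differ (frames must be identically labelled).
rows, cols = np.where((a != b).to_numpy())

# Pair the old and new values and bring the key columns back as columns.
diff = pd.DataFrame(
    {"From": a.to_numpy()[rows, cols], "To": b.to_numpy()[rows, cols]},
    index=a.index[rows],
).reset_index()

print(diff)
```

If there were several value columns, adding `"Column": a.columns[cols]` to the dictionary would record which column changed.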

Answers

Use merge:

In [29]:

print(df_a)
    Date  ID_1  ID_2  Value
0  1-Jan     1     1      5
1  2-Jan     1     2      6
2  3-Jan     1     3      4
3  4-Jan     1     4      2
4  5-Jan     1     5      8
In [30]:

print(df_b)
    Date  ID_1  ID_2  Value
0  1-Jan     1     1      5
1  2-Jan     1     2      6
2  3-Jan     1     3      4
3  4-Jan     1     4      2
4  5-Jan     1     5     55
In [31]:

# merge on the identifying columns; the two Value columns become Value_x and Value_y
df_c = pd.merge(df_a, df_b,
                how='outer',
                on=['Date', 'ID_1', 'ID_2'])
# rename by position: Value_x -> From, Value_y -> To
df_c.columns = ['Date', 'ID_1', 'ID_2', 'From', 'To']
# keep only the rows whose value changed
df_c = df_c[df_c.From != df_c.To]
print(df_c)
    Date  ID_1  ID_2  From  To
4  5-Jan     1     5     8  55
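
For anyone who wants to run this end to end, here is a self-contained sketch of the same merge idea. It differs from the snippet above in one respect: instead of overwriting `df_c.columns` by position, it names the overlapping columns through the `suffixes` argument of `pd.merge` and then renames them, which keeps working if the column order changes. The variable names `keys`, `merged` and `changed` are my own.

```python
import pandas as pd

# Sample data copied from the question.
df_a = pd.DataFrame({
    "Date": ["1-Jan", "2-Jan", "3-Jan", "4-Jan", "5-Jan"],
    "ID_1": [1, 1, 1, 1, 1],
    "ID_2": [1, 2, 3, 4, 5],
    "Value": [5, 6, 4, 2, 8],
})
df_b = df_a.assign(Value=[5, 6, 4, 2, 55])

keys = ["Date", "ID_1", "ID_2"]

# Outer merge on the key columns; the clashing Value columns get the suffixes.
merged = pd.merge(df_a, df_b, how="outer", on=keys, suffixes=("_from", "_to"))
merged = merged.rename(columns={"Value_from": "From", "Value_to": "To"})

# Keep the rows where the value changed. With how="outer", a row present in
# only one frame has NaN on the other side and is therefore also kept.
changed = merged[merged["From"] != merged["To"]]

print(changed)
```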

Try this:

dfm = df1.merge(df2, on=['Date', 'ID_1', 'ID_2']).rename(columns={'Value_x': 'From', 'Value_y': 'To'})
print(dfm[dfm.From != dfm.To])

    Date  ID_1  ID_2  From  To
4  5-Jan     1     5     8  55
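
One caveat with the one-liner above: `merge` defaults to an inner join, so a row that exists in only one of the two frames is silently dropped. If that matters, a possible variation is an outer join with `indicator=True`, which adds a `_merge` column saying where each row came from. The `6-Jan` row below is hypothetical data I added purely to illustrate this; it is not in the question.

```python
import pandas as pd

# Data from the question, plus one HYPOTHETICAL extra row (6-Jan) in df2.
df1 = pd.DataFrame({
    "Date": ["1-Jan", "2-Jan", "3-Jan", "4-Jan", "5-Jan"],
    "ID_1": [1, 1, 1, 1, 1],
    "ID_2": [1, 2, 3, 4, 5],
    "Value": [5, 6, 4, 2, 8],
})
df2 = pd.DataFrame({
    "Date": ["1-Jan", "2-Jan", "3-Jan", "4-Jan", "5-Jan", "6-Jan"],
    "ID_1": [1, 1, 1, 1, 1, 1],
    "ID_2": [1, 2, 3, 4, 5, 6],
    "Value": [5, 6, 4, 2, 55, 7],
})

keys = ["Date", "ID_1", "ID_2"]

# indicator=True adds a "_merge" column: "left_only", "right_only" or "both".
dfm = (
    df1.merge(df2, on=keys, how="outer", indicator=True)
       .rename(columns={"Value_x": "From", "Value_y": "To"})
)

# Rows found in both frames whose value changed.
changed = dfm[(dfm["_merge"] == "both") & (dfm["From"] != dfm["To"])]

# Rows found in only one of the frames.
unmatched = dfm[dfm["_merge"] != "both"]

print(changed)
print(unmatched)
```

Note that the outer join introduces NaN for the missing side, so `From` and `To` are converted to floats here.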