2016-04-13 64 views
2

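Difference between two columns of a pandas DataFrame

I have a dataframe named pricecomp_df. I want to compare the 'market price' column against each of the other price columns ('apple price', 'mangoes price', 'watermelon price') and take the difference according to a priority condition: watermelon price has first priority, mangoes price second, and apple price third. The input dataframe is given below: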

   code  apple price  mangoes price  watermelon price  market price
0   101          101            NaN               NaN           122
1   102          123            123               NaN           124
2   103          NaN            NaN               NaN           123
3   105          123            167               NaN           154
4   107          165            NaN               177           176
5   110          123            NaN               NaN           123

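Here the first row has only an apple price and a market price, so I take the difference between those two. The second row has both an apple price and a mangoes price, so I must take only the difference between the market price and the mangoes price. The remaining rows follow the same priority condition. Rows where all three prices are NaN should be skipped. Can anyone help with this?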

+0

Can anyone help me? – User1090

+0

Three years later, I came up with a solution. Do you still need it, @User1090? – MERose

Answer

12

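I hope I'm not too late. The idea is to compute the difference to the lowest-priority price first and then overwrite it with the higher-priority differences wherever those prices are available.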

import numpy as np 
import pandas as pd 

df = pd.DataFrame({'code': [101, 102, 103, 105, 107, 110], 
        'apple price': [101, 123, np.nan, 123, 165, 123], 
        'mangoes price': [np.nan, 123, np.nan, 167, np.nan, np.nan], 
        'watermelon price': [np.nan, np.nan, np.nan, np.nan, 177, np.nan], 
        'market price': [122, 124, 123, 154, 176, 123]}) 

# Calculate difference to apple price 
df['diff'] = df['market price'] - df['apple price'] 
# Overwrite with difference to mangoes price 
df['diff'] = df.apply(lambda x: x['market price'] - x['mangoes price'] if not np.isnan(x['mangoes price']) else x['diff'], axis=1) 
# Overwrite with difference to watermelon price 
df['diff'] = df.apply(lambda x: x['market price'] - x['watermelon price'] if not np.isnan(x['watermelon price']) else x['diff'], axis=1) 

print(df)
   code  apple price  mangoes price  watermelon price  market price  diff
0   101        101.0            NaN               NaN           122  21.0
1   102        123.0          123.0               NaN           124   1.0
2   103          NaN            NaN               NaN           123   NaN
3   105        123.0          167.0               NaN           154 -13.0
4   107        165.0            NaN             177.0           176  -1.0
5   110        123.0            NaN               NaN           123   0.0
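
If the row-wise apply calls turn out to be slow on a larger frame, the same priority rule can probably be expressed without apply. The following is only a sketch, not the approach from the answer above: it assumes the priority order from the question (watermelon, then mangoes, then apple), and the names priority, result and reference are made up for illustration. It also drops the rows where all three prices are NaN, which the question asked for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'code': [101, 102, 103, 105, 107, 110],
                   'apple price': [101, 123, np.nan, 123, 165, 123],
                   'mangoes price': [np.nan, 123, np.nan, 167, np.nan, np.nan],
                   'watermelon price': [np.nan, np.nan, np.nan, np.nan, 177, np.nan],
                   'market price': [122, 124, 123, 154, 176, 123]})

# Price columns in priority order, highest priority first (taken from the question)
priority = ['watermelon price', 'mangoes price', 'apple price']

# Skip rows where all three prices are missing
result = df.dropna(subset=priority, how='all').copy()

# Back-fill along each row so the first column holds the first non-NaN price
# in priority order, then keep only that first column
reference = result[priority].bfill(axis=1).iloc[:, 0]

# Difference between the market price and the highest-priority available price
result['diff'] = result['market price'] - reference

print(result[['code', 'market price', 'diff']])
```

Adding a fourth price column would then only mean extending the priority list, whereas the apply version needs one more overwrite step per column.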