2016-01-17

This is a follow-up to this question: Pandas DataFrame Window Function index lookup for calculation

      analysis  first_pass   fruit  order  second_pass  test  units  highest    highest_fruit
    0     full        12.1   apple      2         20.1     1      g     True  [apple, orange]
    1     full         7.1   apple      1         12.0     2      g    False         [orange]
    2  partial        14.3   apple      3         13.1     1      g    False         [orange]
    3     full        20.1  orange      2         20.1     1      g     True  [apple, orange]
    4     full        17.1  orange      1         18.5     2      g     True         [orange]
    5  partial        23.4  orange      3         22.7     1      g     True         [orange]
    6     full        23.1   grape      3         14.1     1      g    False  [apple, orange]
    7     full        17.2   grape      2         17.1     2      g    False         [orange]
    8  partial        19.1   grape      1         19.4     1      g    False         [orange]

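In the original question, I was shown how to find the highest fruit(s) in the table above for a given analysis and test combination by doing a transform on the table (e.g. for the full analysis, test 1, apple and orange share the highest second_pass value).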
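The transform from the original question is not reproduced in this post. As a minimal sketch (not necessarily the original answer's method), assuming `highest` means "second_pass equals the maximum second_pass within the same analysis/test group", the `highest` and `highest_fruit` columns could be rebuilt like this. The data is transcribed from the table above; `keys` and `winners` are illustrative names.

```python
import pandas as pd

# Sample data transcribed from the question's table
df = pd.DataFrame({
    'analysis': ['full', 'full', 'partial'] * 3,
    'first_pass': [12.1, 7.1, 14.3, 20.1, 17.1, 23.4, 23.1, 17.2, 19.1],
    'fruit': ['apple'] * 3 + ['orange'] * 3 + ['grape'] * 3,
    'order': [2, 1, 3, 2, 1, 3, 3, 2, 1],
    'second_pass': [20.1, 12.0, 13.1, 20.1, 18.5, 22.7, 14.1, 17.1, 19.4],
    'test': [1, 2, 1] * 3,
    'units': ['g'] * 9,
})

keys = ['analysis', 'test']

# Assumption: a row is "highest" if its second_pass equals the max of its (analysis, test) group
df['highest'] = df['second_pass'] == df.groupby(keys)['second_pass'].transform('max')

# Collect the highest fruits per group as a list, then broadcast it back to every row of the group
winners = df[df['highest']].groupby(keys)['fruit'].apply(list).rename('highest_fruit')
df = df.join(winners, on=keys)
```

Because ties are kept, the full analysis, test 1 group gets both apple and orange, as in the question's table.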

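I am now trying to use this information to calculate how those fruits performed relative to their first pass. For example, now that I know apple and orange are the highest fruits for the full analysis, test 1, I want to know whether they improved on their first pass. (Apple scored 20.1 on the second pass versus 12.1 on first_pass; orange improved to 20.1 after scoring 19.1 on the first pass.)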

I would like a table like the following (1 = improved, 0 = no change, -1 = worse):

      analysis  first_pass   fruit  order  second_pass  test  units  highest    highest_fruit  score_change_between_passes
    0     full        12.1   apple      2         20.1     1      g     True  [apple, orange]  {"apple": 1, "orange": 0}
    1     full         7.1   apple      1         12.0     2      g    False         [orange]  {"orange": 1}
    2  partial        14.3   apple      3         13.1     1      g    False         [orange]  {"orange": -1}
    3     full        20.1  orange      2         20.1     1      g     True  [apple, orange]  {"apple": 1, "orange": 0}
    4     full        17.1  orange      1         18.5     2      g     True         [orange]  {"orange": 1}
    5  partial        23.4  orange      3         22.7     1      g     True         [orange]  {"orange": -1}
    6     full        23.1   grape      3         14.1     1      g    False  [apple, orange]  {"apple": 1, "orange": 0}
    7     full        17.2   grape      2         17.1     2      g    False         [orange]  {"orange": 1}
    8  partial        19.1   grape      1         19.4     1      g    False         [orange]  {"orange": -1}
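A minimal sketch of one way to build the score_change_between_passes column without `groupby().apply()`: compute the per-row sign of second_pass - first_pass once, then look it up for each fruit in the group's highest_fruit list. The data is transcribed from the question's table; the helper `change_dict`, the intermediate `change` column, and treating differences within `np.isclose`'s default tolerance as "no change" are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the question's table (only the columns needed here)
df = pd.DataFrame({
    'analysis': ['full', 'full', 'partial'] * 3,
    'first_pass': [12.1, 7.1, 14.3, 20.1, 17.1, 23.4, 23.1, 17.2, 19.1],
    'fruit': ['apple'] * 3 + ['orange'] * 3 + ['grape'] * 3,
    'second_pass': [20.1, 12.0, 13.1, 20.1, 18.5, 22.7, 14.1, 17.1, 19.4],
    'test': [1, 2, 1] * 3,
    'highest_fruit': [['apple', 'orange'], ['orange'], ['orange']] * 3,
})

# Per-row direction of change: 1 = improved, 0 = no change, -1 = worse.
# Assumption: differences within np.isclose's default tolerance count as "no change".
diff = df['second_pass'] - df['first_pass']
df['change'] = np.where(np.isclose(diff, 0), 0, np.sign(diff)).astype(int)

keys = ['analysis', 'test']


def change_dict(group):
    """Hypothetical helper: map each fruit in the group's highest_fruit list to its change."""
    by_fruit = group.set_index('fruit')['change']
    return {fruit: int(by_fruit[fruit]) for fruit in group['highest_fruit'].iloc[0]}


# One dict per (analysis, test) group, then broadcast back to every row of that group
changes = {key: change_dict(group) for key, group in df.groupby(keys)}
df['score_change_between_passes'] = [changes[key] for key in zip(df['analysis'], df['test'])]
```

With the question's data (orange at 20.1 on both passes for full/test 1), rows 0, 3 and 6 get {"apple": 1, "orange": 0}, rows 1, 4 and 7 get {"orange": 1}, and rows 2, 5 and 8 get {"orange": -1}, matching the desired table.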

Answer


You can use np.sign() on the difference between second_pass and first_pass:

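import numpy as np

# Per (test, analysis) group: map each highest fruit to sign(second_pass - first_pass)
second_pass = df.groupby(['test', 'analysis']).apply(
    lambda x: {fruit: int(np.sign(x.loc[x.fruit == fruit, 'second_pass'].iloc[0]
                                  - x.loc[x.fruit == fruit, 'first_pass'].iloc[0]))
               for fruit in x.highest_fruit.iloc[0]}).reset_index()
# reset_index() names the dict column 0; attach it to every row of its group
df = df.merge(second_pass, on=['test', 'analysis'], how='left').rename(columns={0: 'second_pass_comp'})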


      analysis  first_pass   fruit  order  second_pass  test  units  highest    highest_fruit  first_pass_highest_fruit         second_pass_comp
    0     full        12.1   apple      2         20.1     1      g     True  [apple, orange]  {'orange': 19.1, 'apple': 12.1}  {'orange': 1, 'apple': 1}
    1     full         7.1   apple      1         12.0     2      g    False         [orange]  {'orange': 17.1}                 {'orange': 1}
    2  partial        14.3   apple      3         13.1     1      g    False         [orange]  {'orange': 23.4}                 {'orange': -1}
    3     full        19.1  orange      2         20.1     1      g     True  [apple, orange]  {'orange': 19.1, 'apple': 12.1}  {'orange': 1, 'apple': 1}
    4     full        17.1  orange      1         18.5     2      g     True         [orange]  {'orange': 17.1}                 {'orange': 1}
    5  partial        23.4  orange      3         22.7     1      g     True         [orange]  {'orange': 23.4}                 {'orange': -1}
    6     full        23.1   grape      3         14.1     1      g    False  [apple, orange]  {'orange': 19.1, 'apple': 12.1}  {'orange': 1, 'apple': 1}
    7     full        17.2   grape      2         17.1     2      g    False         [orange]  {'orange': 17.1}                 {'orange': 1}
    8  partial        19.1   grape      1         19.4     1      g    False         [orange]  {'orange': 23.4}                 {'orange': -1}
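np.sign returns a float (-1.0, 0.0 or 1.0), which is why the answer's code wraps it in int(). A small sketch of that behaviour on the highest fruits' pass scores from the answer's output table, plus a tolerance-based variant for the "no change" case. The helper `safe_sign` is hypothetical, and using `np.isclose`'s default tolerance is an assumption; an exact comparison may be what you want if scores are always recorded to one decimal.

```python
import numpy as np

# first_pass / second_pass of the highest fruits, taken from the answer's output table
first = np.array([12.1, 19.1, 17.1, 23.4])
second = np.array([20.1, 20.1, 18.5, 22.7])

# np.sign yields floats, so cast to int to get 1 / 0 / -1
signs = np.sign(second - first).astype(int)

# Caveat: float rounding can turn "no change" into a change,
# e.g. 0.1 + 0.2 - 0.3 is a tiny positive number, so np.sign gives 1.0
noisy = np.sign(0.1 + 0.2 - 0.3)


def safe_sign(a, b):
    """Hypothetical helper: sign of a - b, treating near-zero differences as 0."""
    diff = np.asarray(a) - np.asarray(b)
    return np.where(np.isclose(diff, 0), 0, np.sign(diff)).astype(int)
```

`safe_sign(second, first)` gives the same 1 / 1 / 1 / -1 as `signs` here, but returns 0 for `safe_sign(0.1 + 0.2, 0.3)`.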