2017-10-16

Python pandas: if sum based on a groupby statement

Working with this Python pandas DataFrame df:

CategoryA | CategoryB | Count
1         | A         |  0
1         | A         | -1
2         | B         |  1
2         | B         |  1
3         | C         |  1
3         | C         | -1
3   C   -1 
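To make the snippets below reproducible, the sample frame above can be rebuilt like this. The column names and values are copied from the table; only the variable name `df` is assumed from the question's code.

```python
import pandas as pd

# Sample data from the question: two rows per (CategoryA, CategoryB) group.
df = pd.DataFrame({
    "CategoryA": [1, 1, 2, 2, 3, 3],
    "CategoryB": ["A", "A", "B", "B", "C", "C"],
    "Count": [0, -1, 1, 1, 1, -1],
})
print(df)
```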

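Basically, I want to mark for deletion every CategoryA/CategoryB group whose Count sum is not greater than 0 (the 3/C group sums to exactly 0 and should be deleted too):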

df['decision'] = np.where(df.groupby(['CategoryA', 'CategoryB'])['Count'].sum()>0, 'keep', 'delete') 

But I get this error: ValueError: Length of values does not match length of index
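The error comes from a shape mismatch: `groupby(...).sum()` returns one value per group (3 here), while `df` has 6 rows, so the array built by `np.where` is too short to become a column. A small sketch reproducing it on the sample data (the exact wording of the message varies between pandas versions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "CategoryA": [1, 1, 2, 2, 3, 3],
    "CategoryB": ["A", "A", "B", "B", "C", "C"],
    "Count": [0, -1, 1, 1, 1, -1],
})

# sum() collapses the frame to one row per (CategoryA, CategoryB) group.
group_sums = df.groupby(["CategoryA", "CategoryB"])["Count"].sum()
print(len(group_sums), len(df))  # 3 groups vs 6 rows

# np.where therefore produces only 3 labels ...
labels = np.where(group_sums > 0, "keep", "delete")

# ... and pandas refuses to assign 3 values to a 6-row column.
raised = False
try:
    df["decision"] = labels
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```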

The desired output is:

CategoryA | CategoryB | Count | decision
1         | A         |  0    | delete
1         | A         | -1    | delete
2         | B         |  1    | keep
2         | B         |  1    | keep
3         | C         |  1    | delete
3         | C         | -1    | delete

I would prefer to do this with df.loc, but I'm not sure how.
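One possible `.loc` version, sketched under the assumption that the per-row group total comes from `transform("sum")` (a same-length alternative to `sum()`): mark every row as "delete" first, then use a boolean mask with `.loc` to flip the rows whose group total is positive.

```python
import pandas as pd

df = pd.DataFrame({
    "CategoryA": [1, 1, 2, 2, 3, 3],
    "CategoryB": ["A", "A", "B", "B", "C", "C"],
    "Count": [0, -1, 1, 1, 1, -1],
})

# Group total repeated on every row, aligned with df's index.
group_total = df.groupby(["CategoryA", "CategoryB"])["Count"].transform("sum")

# Default label, then overwrite only the rows whose group total is > 0.
df["decision"] = "delete"
df.loc[group_total > 0, "decision"] = "keep"
print(df)
```

Because the mask has the same index as `df`, `.loc` can align it row by row, which is exactly what the collapsed `sum()` result could not do.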

Answers

In [67]: df['decision'] = \ 
      np.where(df.groupby(['CategoryA', 'CategoryB'])['Count'].transform('sum') > 0, 
         'keep', 'delete') 

In [68]: df 
Out[68]: 
   CategoryA CategoryB  Count decision
0          1         A      0   delete
1          1         A     -1   delete
2          2         B      1     keep
3          2         B      1     keep
4          3         C      1   delete
5          3         C     -1   delete

You were on the right track.

m = df.groupby(['CategoryA', 'CategoryB']).transform('sum').gt(0) 
df['decision'] = np.where(m, 'keep', 'delete') 

df 
   CategoryA CategoryB  Count decision
0          1         A      0   delete
1          1         A     -1   delete
2          2         B      1     keep
3          2         B      1     keep
4          3         C      1   delete
5          3         C     -1   delete

Use transform to get a result with the same length as the original DataFrame (one value per row instead of one per group).
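A sketch contrasting the two calls on the sample data: `sum()` yields one value per group, whereas `transform("sum")` repeats each group's total on every row of that group, which is the same as looking each row's total up in the collapsed result by hand.

```python
import pandas as pd

df = pd.DataFrame({
    "CategoryA": [1, 1, 2, 2, 3, 3],
    "CategoryB": ["A", "A", "B", "B", "C", "C"],
    "Count": [0, -1, 1, 1, 1, -1],
})
keys = ["CategoryA", "CategoryB"]

# sum() collapses each group to a single value (one entry per group).
collapsed = df.groupby(keys)["Count"].sum()
# transform("sum") broadcasts the group total back to every row.
broadcast = df.groupby(keys)["Count"].transform("sum")

print(collapsed.shape, broadcast.shape)  # (3,) (6,)

# Manual equivalent: look up each row's (CategoryA, CategoryB) total.
manual = [collapsed.loc[(a, b)] for a, b in zip(df["CategoryA"], df["CategoryB"])]
print(broadcast.tolist() == manual)  # True
```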


Thanks. I'm getting this error: TypeError: 'SeriesGroupBy' object is not callable – jeangelj


@jeangelj Weird, it works perfectly fine for me.

df['decision']=df['CategoryB'].map(df.groupby('CategoryB')['Count'].\ 
     apply(lambda x :np.where(x.sum()>0,'keep','delete'))) 
df 
Out[573]: 
   CategoryA CategoryB  Count decision
0          1         A      0   delete
1          1         A     -1   delete
2          2         B      1     keep
3          2         B      1     keep
4          3         C      1   delete
5          3         C     -1   delete
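Note that the map answer groups by CategoryB alone, which gives the right result here only because each CategoryB value occurs with exactly one CategoryA. If the end goal is to remove the rows rather than label them, pandas' `GroupBy.filter` can keep just the groups whose total is positive; a sketch on the sample data, grouping by both keys:

```python
import pandas as pd

df = pd.DataFrame({
    "CategoryA": [1, 1, 2, 2, 3, 3],
    "CategoryB": ["A", "A", "B", "B", "C", "C"],
    "Count": [0, -1, 1, 1, 1, -1],
})

# filter() calls the lambda once per group (each g is a sub-DataFrame) and
# keeps the rows of every group for which it returns True.
kept = df.groupby(["CategoryA", "CategoryB"]).filter(lambda g: g["Count"].sum() > 0)
print(kept)
```

The original index labels are preserved, so `kept` holds rows 2 and 3 (the 2/B group).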