2017-09-08
1

Conditional running count in pandas

I am trying to create a conditional running sum in pandas based on two conditions.

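import pandas as pd
ID = [1, 1, 1, 2, 2, 3, 4]
before = ['A', 'B', 'B', 'A', 'A', 'B', 'A']
after = ['A', 'B', 'B', 'A', 'A', 'B', 'A']
# Build from a dict so ID keeps an integer dtype (the transposed list form makes every column object)
df = pd.DataFrame({'ID': ID, 'before': before, 'after': after})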

The data looks like this:

   ID before after
0   1      A     A
1   1      B     B
2   1      B     B
3   2      A     A
4   2      A     A
5   3      B     B
6   4      A     A

I then want to see how long each ID has had B as its before value. My attempt:

df['time_on_b'] = (df.groupby('before')['ID'].cumcount()+1).where(df['before']=='B',0) 

This gives me:

   ID before after  time_on_b
0   1      A     A          0
1   1      B     B          1
2   1      B     B          2
3   2      A     A          0
4   2      A     A          0
5   3      B     B          3
6   4      A     A          0

My desired output is as follows:

   ID before after  time_on_b
0   1      A     A          0
1   1      B     B          1
2   1      B     B          2
3   2      A     A          0
4   2      A     A          0
5   3      B     B          1
6   4      A     A          0

As you can see, I want time_on_b to reset whenever the ID changes, so the row for ID 3 should give 1 rather than 3.

Answers

4

It seems you need to group by ID and then use cumsum to count the occurrences of B:

cond = df.before == 'B' 
df['time_on_b'] = cond.groupby(df.ID).cumsum().where(cond, 0).astype(int) 
df 
#    ID before after  time_on_b
# 0   1      A     A          0
# 1   1      B     B          1
# 2   1      B     B          2
# 3   2      A     A          0
# 4   2      A     A          0
# 5   3      B     B          1
# 6   4      A     A          0
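
To see what each step contributes, the chain can be split into named intermediates. This is only a sketch: the frame is rebuilt here from a dict so the block runs on its own, and the names `running` and `time_on_b` are illustrative, not part of the answer above.

```python
import pandas as pd

# Rebuild the relevant columns of the question's data (dict constructor is an assumption)
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3, 4],
    'before': ['A', 'B', 'B', 'A', 'A', 'B', 'A'],
})

# Step 1: boolean mask, True where before is 'B'
cond = df['before'].eq('B')

# Step 2: running count of True values, restarted for every ID
running = cond.groupby(df['ID']).cumsum()

# Step 3: force rows that are not 'B' to 0 and make sure the result is integer
time_on_b = running.where(cond, 0).astype(int)

print(time_on_b.tolist())
```

Grouping by ID rather than by before is what makes the count restart when the ID changes, which is the part the original attempt was missing.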

2

You can also use transform:

df.groupby('ID').before.transform(lambda x: x.eq('B').cumsum()) 

0    0
1    1
2    2
3    0
4    0
5    1
6    0
Name: before, dtype: int32

df.assign(time_on_b=df.groupby('ID').before.transform(lambda x: x.eq('B').cumsum())) 

   ID before after  time_on_b
0   1      A     A          0
1   1      B     B          1
2   1      B     B          2
3   2      A     A          0
4   2      A     A          0
5   3      B     B          1
6   4      A     A          0
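
The two answers agree on the question's data, but they are not identical in general. The transform version as written does not mask the non-B rows, so an A row that follows a B row within the same ID keeps the running count instead of showing 0. A small sketch on a made-up frame (`demo` is hypothetical, not from the question) shows the difference:

```python
import pandas as pd

# Hypothetical data: a single ID whose before value goes B, A, B, B
demo = pd.DataFrame({'ID': [1, 1, 1, 1], 'before': ['B', 'A', 'B', 'B']})
cond = demo['before'].eq('B')

# First answer: cumulative count per ID, then non-B rows forced to 0
masked = cond.groupby(demo['ID']).cumsum().where(cond, 0).astype(int)

# Second answer: cumulative count per ID with no masking step
unmasked = demo.groupby('ID')['before'].transform(lambda x: x.eq('B').cumsum())

print(masked.tolist())    # the A row shows 0
print(unmasked.tolist())  # the A row carries the count from the B row before it
```

If zeros on the non-B rows are wanted, adding `.where(cond, 0)` after the transform should bring the two results into line.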