2016-09-07 42 views

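Append or add rows in a pandas DataFrame

In the following DataFrame, I want to add rows to a group when the number of rows with that value in column A is less than 10.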

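For example, in the table below, group 60 in column A appears 12 times, whereas group 61 appears only 9 times. I want to add a row after the last record of group 61 and copy the values of columns B, C and D from the corresponding row of group 60. The same should then be done for group 62, and so on.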

 A  B C  D 
0 60 0.235 4 7.86 
1 60 1.235 5 8.86 
2 60 2.235 6 9.86 
3 60 3.235 7 10.86 
4 60 4.235 8 11.86 
5 60 5.235 9 12.86 
6 60 6.235 10 13.86 
7 60 7.235 11 14.86 
8 60 8.235 12 15.86 
9 60 9.235 13 16.86 
10 60 10.235 14 17.86 
11 60 11.235 15 18.86 
12 61 12.235 16 19.86 
13 61 13.235 17 20.86 
14 61 14.235 18 21.86 
15 61 15.235 19 22.86 
16 61 16.235 20 23.86 
17 61 17.235 21 24.86 
18 61 18.235 22 25.86 
19 61 19.235 23 26.86 
20 61 20.235 24 27.86 
21 62 20.235 24 28.86 
22 62 20.235 24 29.86 
23 62 20.235 24 30.86 
24 62 20.235 24 31.86 
25 62 20.235 24 32.86 
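
The question did not include code for the data. A minimal sketch that rebuilds the table above is given here so the answers below can be run as-is; the construction (variable names `n60`, `n61`, `n62`, `seq`) is an assumption derived from the pattern in the printed values, not the asker's original code.

```python
import numpy as np
import pandas as pd

# Group sizes read off the table: 60 -> 12 rows, 61 -> 9 rows, 62 -> 5 rows
n60, n61, n62 = 12, 9, 5
seq = np.arange(n60 + n61)  # 0..20, drives columns B and C for groups 60 and 61

df = pd.DataFrame({
    "A": [60] * n60 + [61] * n61 + [62] * n62,
    # B and C increase by 1 per row, then stay constant within group 62
    "B": np.concatenate([seq + 0.235, np.full(n62, 20.235)]),
    "C": np.concatenate([seq + 4, np.full(n62, 24)]),
    # D increases by 1 on every row
    "D": np.arange(n60 + n61 + n62) + 7.86,
})

print(df.groupby("A").size())  # number of rows per group
```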

Can you show what you have tried? You should also post raw text and code rather than images. – EdChum

Answer


You can use:

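# cumulative count per group: position of each row within its A group
df['G'] = df.groupby('A').cumcount()

# parentheses are needed so the method chain can span several lines
df = (df.groupby(['A', 'G'])
        .first()                          # aggregate: one row per (A, G) pair
        .unstack()                        # reshape: G moves into the columns
        .ffill()                          # same as fillna(method='ffill'); fills from the previous group
        .stack()                          # back to the original long shape
        .reset_index(drop=True, level=1)  # remove level G from the index
        .reset_index())                   # turn A back into a column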

print (df) 
 A  B  C  D 
0 60 0.235 4.0 7.86 
1 60 1.235 5.0 8.86 
2 60 2.235 6.0 9.86 
3 60 3.235 7.0 10.86 
4 60 4.235 8.0 11.86 
5 60 5.235 9.0 12.86 
6 60 6.235 10.0 13.86 
7 60 7.235 11.0 14.86 
8 60 8.235 12.0 15.86 
9 60 9.235 13.0 16.86 
10 60 10.235 14.0 17.86 
11 60 11.235 15.0 18.86 
12 61 12.235 16.0 19.86 
13 61 13.235 17.0 20.86 
14 61 14.235 18.0 21.86 
15 61 15.235 19.0 22.86 
16 61 16.235 20.0 23.86 
17 61 17.235 21.0 24.86 
18 61 18.235 22.0 25.86 
19 61 19.235 23.0 26.86 
20 61 20.235 24.0 27.86 
21 61 9.235 13.0 16.86 
22 61 10.235 14.0 17.86 
23 61 11.235 15.0 18.86 
24 62 20.235 24.0 28.86 
25 62 20.235 24.0 29.86 
26 62 20.235 24.0 30.86 
27 62 20.235 24.0 31.86 
28 62 20.235 24.0 32.86 
29 62 17.235 21.0 24.86 
30 62 18.235 22.0 25.86 
31 62 19.235 23.0 26.86 
32 62 20.235 24.0 27.86 
33 62 9.235 13.0 16.86 
34 62 10.235 14.0 17.86 
35 62 11.235 15.0 18.86 
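
To see why this works, it can help to look at the intermediate wide table. The sketch below uses a small made-up frame (the data and the names `toy`, `wide`, `filled`, `out` are illustrative assumptions, not part of the question): after `unstack()` each group is one row and each position `G` is one column, so a short group shows up as `NaN` cells, and `ffill()` copies the value from the row above, that is, from the previous group at the same position.

```python
import pandas as pd

# Toy data (hypothetical): group 1 has 3 rows, group 2 has only 2
toy = pd.DataFrame({"A": [1, 1, 1, 2, 2],
                    "B": [10.0, 11.0, 12.0, 20.0, 21.0]})
toy["G"] = toy.groupby("A").cumcount()  # position within each group

# One row per group, one column per position; group 2 lacks position 2
wide = toy.groupby(["A", "G"]).first().unstack()
print(wide)

# Forward fill runs down the rows, so the gap in group 2 is taken from group 1
filled = wide.ffill()

out = (filled.stack()                          # back to long format
             .reset_index(level=1, drop=True)  # drop the helper level G
             .reset_index())
print(out)
```

Note that the fill only looks upward, so a gap can only be filled from an earlier group.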

Another solution, using pivot_table:

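# cumulative count per group: position of each row within its A group
df['G'] = df.groupby('A').cumcount()

# pivot_table aggregates with the mean by default; each (A, G) pair is
# unique here, so the values are unchanged (integers become floats)
df = (df.pivot_table(index='A', columns='G')
        .ffill()                          # fill missing positions from the previous group
        .stack()                          # back to the original long shape
        .reset_index(drop=True, level=1)  # remove level G from the index
        .reset_index())                   # turn A back into a column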

print (df) 
 A  B  C  D 
0 60 0.235 4.0 7.86 
1 60 1.235 5.0 8.86 
2 60 2.235 6.0 9.86 
3 60 3.235 7.0 10.86 
4 60 4.235 8.0 11.86 
5 60 5.235 9.0 12.86 
6 60 6.235 10.0 13.86 
7 60 7.235 11.0 14.86 
8 60 8.235 12.0 15.86 
9 60 9.235 13.0 16.86 
10 60 10.235 14.0 17.86 
11 60 11.235 15.0 18.86 
12 61 12.235 16.0 19.86 
13 61 13.235 17.0 20.86 
14 61 14.235 18.0 21.86 
15 61 15.235 19.0 22.86 
16 61 16.235 20.0 23.86 
17 61 17.235 21.0 24.86 
18 61 18.235 22.0 25.86 
19 61 19.235 23.0 26.86 
20 61 20.235 24.0 27.86 
21 61 9.235 13.0 16.86 
22 61 10.235 14.0 17.86 
23 61 11.235 15.0 18.86 
24 62 20.235 24.0 28.86 
25 62 20.235 24.0 29.86 
26 62 20.235 24.0 30.86 
27 62 20.235 24.0 31.86 
28 62 20.235 24.0 32.86 
29 62 17.235 21.0 24.86 
30 62 18.235 22.0 25.86 
31 62 19.235 23.0 26.86 
32 62 20.235 24.0 27.86 
33 62 9.235 13.0 16.86 
34 62 10.235 14.0 17.86 
35 62 11.235 15.0 18.86
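
If this is needed in more than one place, the pivot_table variant could be wrapped in a small helper and followed by a check that every group ends up with the same number of rows. This is only a sketch: the function name `pad_groups`, its `key` parameter and the sample data are hypothetical, it assumes all value columns are numeric (pivot_table averages by default), that the frame has no existing column called `G`, and that the first group is the largest one, as in the example data, since a forward fill has nothing to copy from above the first row.

```python
import pandas as pd

def pad_groups(df, key="A"):
    """Pad each group to the size of the largest one by forward-filling
    missing positions from the previous group (hypothetical helper)."""
    # Helper column G: position of each row inside its group
    tmp = df.assign(G=df.groupby(key).cumcount())
    return (tmp.pivot_table(index=key, columns="G")
               .ffill()
               .stack()
               .reset_index(level=1, drop=True)
               .reset_index())

# Made-up sample: groups of size 3, 2 and 1
sample = pd.DataFrame({"A": [60, 60, 60, 61, 61, 62],
                       "B": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                       "C": [10, 20, 30, 40, 50, 60]})

padded = pad_groups(sample)
print(padded.groupby("A").size())  # every group should now have 3 rows
```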