2016-01-20 34 views

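Grouping in pandas while keeping the tuples

I have a DataFrame that looks like this (it actually has 35 columns and many more tuples, but the relevant columns are shown below):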

    leg_side leg_quantity expiration product change_type
  0     None         None       None      ZQ    inserted
  1     None         None       None      HG    inserted
  2     None         None       None      PL    inserted
  3     None         None       None      SI    inserted
  4     None         None       None      ZQ    inserted
  5     None         None       None      PL    inserted
  6     None         None       None      ZW    inserted
  7     None         None       None      SI    inserted
  8     None         None       None      ZQ     updated
  9     None         None       None      SI    inserted
 10     None         None       None      ZC     updated
 ..      ...          ...        ...     ...         ...
970     None         None       None      OZ    inserted
971     None         None       None      OZ     deleted
972     None         None       None      OZ     updated
973     None         None       None      ZC    inserted
974     None         None       None      OZ    inserted
975     None         None       None      ZC    inserted
976     None         None       None      OZ    inserted

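What I want to do now is group by product, but not necessarily in the SQL sense. I want to gather all tuples with the same product together and then sub-group them by change_type, to get a DataFrame like this: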

    leg_side leg_quantity expiration product change_type
  0     None         None       None      ZQ    inserted
  4     None         None       None      ZQ    inserted
  8     None         None       None      ZQ     updated
  1     None         None       None      HG    inserted
  2     None         None       None      PL    inserted
  5     None         None       None      PL    inserted
  3     None         None       None      SI    inserted
  7     None         None       None      SI    inserted
  9     None         None       None      SI    inserted
  6     None         None       None      ZW    inserted
...
973     None         None       None      ZC    inserted
975     None         None       None      ZC    inserted
 10     None         None       None      ZC     updated
970     None         None       None      OZ    inserted
974     None         None       None      OZ    inserted
976     None         None       None      OZ    inserted
972     None         None       None      OZ     updated
971     None         None       None      OZ     deleted

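The DataFrame above is organized so that all tuples with the same product name sit together, and within each of those groups all tuples with the same change type sit together (preferably in the order inserted, updated, deleted). If I use pandas groupby(), the tuples are eliminated. I only want grouping in the sense of sorting. How can I do this?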

Answer


You can use a Categorical to set a custom order. Then groupby product and sort each group by the categorical column:

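# Make change_type an ordered categorical so it sorts as inserted < updated < deleted
df['change_type'] = (df['change_type'].astype('category')
                                      .cat
                                      .set_categories(["inserted", "updated", "deleted"], ordered=True))

# Sort within each product group; every row is kept, only the order changes
df = (df.groupby('product')
        .apply(lambda x: x.sort_values('change_type'))
        .reset_index(drop=True))
print(df)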

    leg_side leg_quantity expiration product change_type
  0     None         None       None      HG    inserted
  1     None         None       None      OZ    inserted
  2     None         None       None      OZ    inserted
  3     None         None       None      OZ    inserted
  4     None         None       None      OZ     updated
  5     None         None       None      OZ     deleted
  6     None         None       None      PL    inserted
  7     None         None       None      PL    inserted
  8     None         None       None      SI    inserted
  9     None         None       None      SI    inserted
 10     None         None       None      SI    inserted
 11     None         None       None      ZC    inserted
 12     None         None       None      ZC    inserted
 13     None         None       None      ZC     updated
 14     None         None       None      ZQ    inserted
 15     None         None       None      ZQ    inserted
 16     None         None       None      ZQ     updated
 17     None         None       None      ZW    inserted
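
Note that reset_index(drop=True) in the answer above discards the original row labels, and groupby orders the products alphabetically, whereas the desired output in the question keeps the labels (0, 4, 8, 1, ...) and appears to list products in the order they first occur. As a possible alternative, here is a minimal sketch that uses a plain multi-column sort_values instead of groupby. The sample data, the helper column `_product_order`, and the variable names `by_name` and `by_first_seen` are assumptions made up for illustration; they are not part of the original question or answer.

```python
import pandas as pd

# Hypothetical sample that mirrors only the two relevant columns of the question.
df = pd.DataFrame({
    "product": ["ZQ", "HG", "PL", "SI", "ZQ", "PL",
                "ZW", "SI", "ZQ", "OZ", "OZ", "OZ"],
    "change_type": ["inserted", "inserted", "inserted", "inserted",
                    "inserted", "inserted", "inserted", "inserted",
                    "updated", "deleted", "updated", "inserted"],
})

# Ordered categorical: change_type now sorts as inserted < updated < deleted.
df["change_type"] = pd.Categorical(
    df["change_type"],
    categories=["inserted", "updated", "deleted"],
    ordered=True,
)

# Variant 1: products in alphabetical order, original index labels kept.
by_name = df.sort_values(["product", "change_type"])

# Variant 2: products in order of first appearance (assumed to be what the
# question's desired output shows). A temporary ordered categorical column
# carries that order and is dropped again after sorting.
first_seen = pd.Categorical(
    df["product"],
    categories=df["product"].unique(),
    ordered=True,
)
by_first_seen = (
    df.assign(_product_order=first_seen)
      .sort_values(["_product_order", "change_type"])
      .drop(columns="_product_order")
)

print(by_first_seen)
```

No rows are dropped in either variant; only their order changes. If fresh consecutive labels are preferred, as in the answer's output, reset_index(drop=True) can be appended to either result.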