2017-02-28 28 views

Timedeltas and loyalty calculation

With the help of pandas I was able to build the following table of orders:

   identifier  gender        Date category
0           1  female  2016-11-11     Baby
1           1  female  2017-02-01     Baby
2           2  female  2016-12-19    Shave
3           2  female  2016-12-27    Shave
4           3  female  2016-11-11     Baby
5           3  female  2016-11-22     Baby
6           4    male  2016-11-11    Shave
7           4    male  2017-01-01    Shave

The result I need is the number of first, second, and later orders per day:

first order: 
11.11.2016 3 
19.12.2016 1 

second orders: 
22.11.2016 1 
21.12.2016 1 
01.01.2017 1 
02.01.2017 1 

third orders: 

I also need to calculate the average time between orders (per person):

average time between orders = ... 

and to evaluate customer loyalty across categories. I think these tasks look very similar.

Loyalty cross categories: 
    first order: 
    Baby 2 
    second order: 
    Baby - 2 
    third order: 


    first order: 
    Shave 2 
    second order: 
    Shave - 2 
    third order: 

Is it possible to do this kind of analysis with pandas?

Answer


Given this DataFrame

   identifier  gender        Date category
0           1  female  2016-11-11     Baby
1           1  female  2017-02-01     Baby
2           2  female  2016-12-19    Shave
3           2  female  2016-12-27    Shave
4           3  female  2016-11-11     Baby
5           3  female  2016-11-22     Baby
6           4    male  2016-11-11    Shave
7           4    male  2017-01-01    Shave
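
To follow along, the table can be rebuilt as shown in this minimal sketch. It assumes the `Date` column should be parsed to datetimes (otherwise subtracting dates does not produce timedeltas) and that rows are sorted by customer and date, so that a shift within each group refers to the previous order. Neither step is stated in the original post.

```python
import pandas as pd

# Rebuild the example frame from the table above.
df = pd.DataFrame({
    "identifier": [1, 1, 2, 2, 3, 3, 4, 4],
    "gender": ["female"] * 6 + ["male"] * 2,
    "Date": ["2016-11-11", "2017-02-01", "2016-12-19", "2016-12-27",
             "2016-11-11", "2016-11-22", "2016-11-11", "2017-01-01"],
    "category": ["Baby", "Baby", "Shave", "Shave",
                 "Baby", "Baby", "Shave", "Shave"],
})

# Assumption: dates arrive as strings, so parse them to datetime64.
df["Date"] = pd.to_datetime(df["Date"])

# Assumption: sort so each customer's orders are in chronological order.
df = df.sort_values(["identifier", "Date"]).reset_index(drop=True)
```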

you can start by using a shift within a groupby:

df_groups = df.groupby('identifier') 
df['last_order'] = df_groups.Date.shift(1) 

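Then you can get the time between orders (current order date minus the previous one, so the gaps are positive):

df['Time_between_orders'] = df['Date'] - df['last_order'] 

Then you can get the average time between orders for each user like this:

df_groups = df.groupby('identifier') 
df_groups['Time_between_orders'].apply(lambda x: x.sum()/x.notnull().sum()).apply(lambda x: x.days) 

This gives:

identifier 
1   82 
2    8 
3   11 
4   51 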
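
The same per-customer average can be sketched end to end as follows. This is a variant, not the answer's exact code: it uses `groupby(...).diff()` in place of the explicit shift, and it computes each gap as the current order date minus the previous one, so the gaps come out positive. The names `gap_days` and `avg_gap_days` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "identifier": [1, 1, 2, 2, 3, 3, 4, 4],
    "Date": pd.to_datetime([
        "2016-11-11", "2017-02-01", "2016-12-19", "2016-12-27",
        "2016-11-11", "2016-11-22", "2016-11-11", "2017-01-01",
    ]),
})
df = df.sort_values(["identifier", "Date"])

# Within each customer: current order date minus the previous one.
# A customer's first order has no predecessor, so its gap is missing.
gap_days = df.groupby("identifier")["Date"].diff().dt.days

# Average gap per customer in days; missing values are skipped by mean().
avg_gap_days = gap_days.groupby(df["identifier"]).mean()
print(avg_gap_days)
```

Each customer in the example has exactly two orders, so the average equals the single gap between them.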

If you want this across categories, just add the category to all of the groupby statements: df.groupby('identifier') becomes df.groupby(['identifier', 'category']).
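
A sketch of that grouping change, together with one possible way to get the "first order / second order" counts the question asks for. The counting part is not in the answer: it assumes `groupby(...).cumcount()` is an acceptable way to number each customer's orders, and `order_number`, `gap_days`, `orders_per_day`, and `orders_per_category` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    "identifier": [1, 1, 2, 2, 3, 3, 4, 4],
    "Date": pd.to_datetime([
        "2016-11-11", "2017-02-01", "2016-12-19", "2016-12-27",
        "2016-11-11", "2016-11-22", "2016-11-11", "2017-01-01",
    ]),
    "category": ["Baby", "Baby", "Shave", "Shave",
                 "Baby", "Baby", "Shave", "Shave"],
})
df = df.sort_values(["identifier", "Date"])

# Gap to the previous order of the same customer in the same category.
gap_days = df.groupby(["identifier", "category"])["Date"].diff().dt.days
avg_gap_days = gap_days.groupby([df["identifier"], df["category"]]).mean()

# Number each customer's orders: 1 = first order, 2 = second order, ...
df["order_number"] = df.groupby("identifier").cumcount() + 1

# How many first/second/... orders fall on each day and in each category.
orders_per_day = df.groupby(["order_number", "Date"]).size()
orders_per_category = df.groupby(["order_number", "category"]).size()
```

To number orders within a category rather than overall, the `cumcount` grouping could use both `identifier` and `category` instead.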