2016-07-01

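I want to do the following: group the rows of a pandas DataFrame by date range, where the range depends on each row's date.

I have a DataFrame that looks like this: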

import pandas as pd
df = pd.DataFrame({"ID": ["A", "A", "C", "B", "B"], "date": ["06/24/2014", "06/25/2014", "06/23/2014", "07/02/1999", "07/02/1999"], "value": ["3", "5", "1", "7", "8"]})

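I want to group by date so that all observations within two days of each other end up in the same group. For example, the first 3 rows would be grouped together, and the last two rows would be grouped together.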

So far, I have tried something like:

df.groupby(df['date'].map(lambda x: x.month)) 

What is the general way to do this kind of "fuzzy groupby"?

Thanks,
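A minimal sketch of why the month-based attempt is not a fuzzy grouping: calendar buckets have fixed edges, so two dates only one day apart can still land in different groups. The two dates below are hypothetical, chosen to straddle a month boundary. Note also that the question's date column holds strings, which have no .month attribute, so it would first need converting with pd.to_datetime.

```python
import pandas as pd

# Hypothetical dates, one day apart, on either side of a month boundary
dates = pd.Series(pd.to_datetime(["2014-06-30", "2014-07-01"]))

# Month-based keys, like the attempt in the question; this only works on
# real datetimes, not on the "mm/dd/yyyy" strings from the question
month_keys = dates.map(lambda x: x.month)
print(month_keys.tolist())  # [6, 7]: two separate groups despite a 1-day gap
```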


Possible duplicate of http://stackoverflow.com/questions/22769047/pandas-group-by-time-windows – Jeff

Answer


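You could sort the rows by date, then take the difference between consecutive dates and test whether each difference is greater than 2 days. Because a new group starts only where that gap exceeds 2 days, one group can chain together dates that span more than two days in total. Taking the cumulative sum of these boolean flags assigns the desired group numbers: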

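import pandas as pd
df = pd.DataFrame({"ID": ["A", "A", "C", "B", "B"], "date": ["06/24/2014", "06/25/2014", "06/23/2014", "07/02/1999", "07/02/1999"], "value": ["3", "5", "1", "7", "8"]})
# Parse the mm/dd/yyyy strings into real datetimes
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
# Sort so that rows close in time are adjacent
df = df.sort_values(by='date')
# A gap of more than 2 days from the previous row starts a new group;
# the cumulative sum of these True/False flags numbers the groups 0, 1, 2, ...
df['group'] = (df['date'].diff() > pd.Timedelta(days=2)).cumsum()
print(df)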

yields

  ID       date value group
3  B 1999-07-02     7     0
4  B 1999-07-02     8     0
2  C 2014-06-23     1     1
0  A 2014-06-24     3     1
1  A 2014-06-25     5     1
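The group column only labels the rows; to actually aggregate, pass it to groupby. A minimal sketch, assuming you want each group's date span, row count and total value (the output names first_date, last_date, n_rows and total are hypothetical). The value column is converted with pd.to_numeric first because the question stores the numbers as strings, and summing strings would concatenate them rather than add them.

```python
import pandas as pd

# Same data as in the question; "value" holds strings there
df = pd.DataFrame({"ID": ["A", "A", "C", "B", "B"],
                   "date": ["06/24/2014", "06/25/2014", "06/23/2014", "07/02/1999", "07/02/1999"],
                   "value": ["3", "5", "1", "7", "8"]})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
df["value"] = pd.to_numeric(df["value"])  # so that sum() adds numbers

# Gap-based group labels, as in the answer
df = df.sort_values(by="date")
df["group"] = (df["date"].diff() > pd.Timedelta(days=2)).cumsum()

# Named aggregation: one output column per (source column, function) pair
summary = df.groupby("group").agg(
    first_date=("date", "min"),
    last_date=("date", "max"),
    n_rows=("value", "count"),
    total=("value", "sum"),
)
print(summary)
```

With the question's data this gives two groups: the two 1999-07-02 rows (total 15) and the three June 2014 rows (total 9).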