2017-04-03 88 views
Pandas: How to extract datetime ranges from a DatetimeIndex

I have a collection of DatetimeIndex objects such as:
DatetimeIndex(['2007-11-01 00:00:00', '2008-01-01 00:00:00', 
       '2008-02-01 00:00:00', '2008-03-01 00:00:00', 
       '2008-04-01 00:00:00', '2012-09-01 00:10:00', 
       '2012-09-01 00:20:00', '2012-09-01 00:30:00', 
       '2012-09-01 00:40:00', '2012-09-01 00:50:00', 
       ... 
       '2012-09-30 22:40:00', '2012-09-30 22:50:00', 
       '2012-09-30 23:00:00', '2012-09-30 23:10:00', 
       '2012-09-30 23:20:00', '2012-09-30 23:30:00', 
       '2012-09-30 23:40:00', '2012-09-30 23:50:00', 
       '2012-10-01 00:00:00', '2015-07-01 00:00:00'], 
       dtype='datetime64[ns]', length=4326, freq=None, tz=None) 

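Both its freq and its inferred_freq are None. I think this is because, even though the data actually has a 10-minute step, the step cannot be detected due to the missing parts. It is exactly these missing parts, or equivalently the available parts, that I would like to extract as efficiently as possible. That is, I would like to obtain a list of ranges like this: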

[('2007-11-01 00:00:00', '2007-11-01 00:00:00'), 
('2008-01-01 00:00:00', '2008-01-01 00:00:00'), 
('2008-02-01 00:00:00', '2008-02-01 00:00:00'), 
('2008-03-01 00:00:00', '2008-03-01 00:00:00'), 
('2008-04-01 00:00:00', '2008-04-01 00:00:00'), 
('2012-09-01 00:10:00', '2012-10-01 00:00:00'), 
('2015-07-01 00:00:00', '2015-07-01 00:00:00')] 

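How should I go about this? I have looked at PeriodIndex, but that seems to be meant for a different kind of application, or perhaps it simply does not handle arbitrary time intervals.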
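The observation about freq and inferred_freq can be reproduced on a small made-up index. This is only a sketch: the names `regular`, `gappy` and `steps` and the toy data are assumptions for illustration, not part of the original data set.

```python
import pandas as pd

# Hypothetical toy data: a regular 10-minute index, and the same index
# with one timestamp removed to create a gap.
regular = pd.date_range("2012-09-01 00:10", periods=6, freq="10min")
gappy = regular.delete(3)

print(regular.inferred_freq)  # a 10-minute frequency string
print(gappy.inferred_freq)    # None: the gap breaks frequency inference

# Differences between neighbouring timestamps show where the 10-minute
# step is broken (the first entry is NaT because it has no predecessor).
steps = gappy.to_series().diff()
print(steps[steps != pd.Timedelta("10min")])
```

A single missing timestamp is enough to make inference fail, so the gaps have to be located from the differences themselves.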

Answers

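I think you can use groupby with a grouper Series and aggregate with min and max.

The grouper is created by taking the difference between consecutive timestamps, comparing it with 10 minutes, and applying cumsum, so that every break in the 10-minute step starts a new group.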

import pandas as pd

rng = pd.DatetimeIndex(['2007-11-01 00:00:00', '2008-01-01 00:00:00', 
       '2008-02-01 00:00:00', '2008-03-01 00:00:00', 
       '2008-04-01 00:00:00', '2012-09-01 00:10:00', 
       '2012-09-01 00:20:00', '2012-09-01 00:30:00', 
       '2012-09-01 00:40:00', '2012-09-01 00:50:00', 
       '2012-09-30 22:40:00', '2012-09-30 22:50:00', 
       '2012-09-30 23:00:00', '2012-09-30 23:10:00', 
       '2012-09-30 23:20:00', '2012-09-30 23:30:00', 
       '2012-09-30 23:40:00', '2012-09-30 23:50:00', 
       '2012-10-01 00:00:00', '2015-07-01 00:00:00']) 

s = pd.Series(rng)
# A new group starts wherever the step from the previous timestamp is not 10 minutes
grouper = s.diff().ne(pd.to_timedelta('10min')).cumsum()
print(grouper)
0  1 
1  2 
2  3 
3  4 
4  5 
5  6 
6  6 
7  6 
8  6 
9  6 
10 7 
11 7 
12 7 
13 7 
14 7 
15 7 
16 7 
17 7 
18 7 
19 8 
dtype: int32 
print(s.groupby(grouper).agg(['min', 'max']).astype(str).apply(tuple, axis=1).tolist())
[('2007-11-01 00:00:00', '2007-11-01 00:00:00'), 
('2008-01-01 00:00:00', '2008-01-01 00:00:00'), 
('2008-02-01 00:00:00', '2008-02-01 00:00:00'), 
('2008-03-01 00:00:00', '2008-03-01 00:00:00'), 
('2008-04-01 00:00:00', '2008-04-01 00:00:00'), 
('2012-09-01 00:10:00', '2012-09-01 00:50:00'), 
('2012-09-30 22:40:00', '2012-10-01 00:00:00'), 
('2015-07-01 00:00:00', '2015-07-01 00:00:00')] 
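If you want to reuse this, the same diff/cumsum/groupby idea could be wrapped in a small function. The following is only a sketch: the name `extract_ranges` and its `step` parameter are made up for illustration, it assumes the index is sorted in ascending order, and it returns Timestamp objects instead of strings.

```python
import pandas as pd

def extract_ranges(index, step="10min"):
    """Return (start, end) Timestamp pairs, one per run of `index`
    whose consecutive values are exactly `step` apart.

    Hypothetical helper; assumes `index` is sorted ascending.
    """
    s = pd.Series(index)
    # A new run starts wherever the gap to the previous timestamp differs
    # from `step`; the first element always starts a run (its diff is NaT).
    run_id = s.diff().ne(pd.Timedelta(step)).cumsum()
    bounds = s.groupby(run_id).agg(["min", "max"])
    # Plain tuples of Timestamps, without converting to strings.
    return list(bounds.itertuples(index=False, name=None))

# Small made-up sample: one isolated stamp, one 10-minute run, one isolated stamp.
idx = pd.DatetimeIndex(['2008-04-01 00:00:00', '2012-09-01 00:10:00',
                        '2012-09-01 00:20:00', '2012-09-01 00:30:00',
                        '2015-07-01 00:00:00'])
ranges = extract_ranges(idx)
print(ranges)
```

Isolated timestamps come back as ranges whose start and end coincide, matching the format asked for in the question.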
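I added a new answer, please check it. – jezrael

This works very well. I left out the 'astype(str)', because it converts to my local time zone; getting 'Timestamp' objects back is fine. – equaeghe

Super, thank you. – jezrael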