2017-08-25

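Pandas group by time with specified start time

- Edit: I noticed that the times I entered were not what I intended. I have converted the times after 12 o'clock to 24-hour format. unutbu's answer should still be clear, though.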

- 2nd edit: I changed the data to make a better example.

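Below is a time series indexed by datetime. I want to aggregate starting at start_datetime and continue aggregating in steps of the timedelta below (9.5 hours = 34200 seconds).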

import datetime

import numpy as np
import pandas as pd

def main():

    # start_datetime = datetime.datetime(2013, 1, 1, 8)
    # end_datetime = datetime.datetime(2013, 1, 1, 17, 30)
    s = pd.Series(
        np.arange(2, 10),
        pd.to_datetime([
            '20130101 7:34:04', '20130101 8:34:08', '20130101 10:34:08',
            '20130101 12:34:15', '20130101 13:34:28', '20130101 12:34:54',
            '20130101 14:34:55', '20130101 17:29:12']))

    print(s)
    # Bar width: 9.5 hours, expressed in seconds
    bar_size = datetime.timedelta(seconds=60*60*9.5)
    time_group = pd.Grouper(
        freq=pd.Timedelta(bar_size), closed='left', label='right')
    foobar = s.groupby(time_group).agg(np.sum)
    print(foobar)

if __name__ == "__main__":
    main()

Running the code above prints the following:

2013-01-01 09:30:00  5 
2013-01-01 19:00:00 39 
Freq: 570T, dtype: int64 

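Pandas internally decides to start grouping from midnight instead of 8:00 AM. I cannot find a way to force it to start grouping at 8:00 AM. Does anyone have a solution that uses pandas functionality?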

Answers

4

Use base=480 to shift the origin by 480 minutes (8 hours). The unit is minutes because the Grouper frequency is 570T (T here means minutes):

import datetime 
import pandas as pd 

def main(): 

    start_datetime = datetime.datetime(2013, 1, 1, 8) 
    s = pd.Series(
        range(8),
        pd.to_datetime([
            '20130101 8:34:04', '20130101 10:34:08', '20130101 10:34:08',
            '20130101 12:34:15', '20130101 1:34:28', '20130101 3:34:54',
            '20130101 4:34:55', '20130101 5:29:12']))

    bar_size = datetime.timedelta(seconds=60*60*9.5)
    time_group = pd.Grouper(freq=bar_size, closed='left', label='right',
                            base=480)
    foobar = s.groupby(time_group).agg(sum) 
    print(foobar) 

if __name__ == "__main__": 
    main() 

yields

2013-01-01 08:00:00 22 
2013-01-01 17:30:00  6 
Freq: 570T, dtype: int64 
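
To see why base=480 produces the labels 08:00 and 17:30, the bin edges can be worked out by hand. The sketch below is plain Python, not pandas internals. The helper name bin_edges is hypothetical, and it assumes the edges sit at midnight + base + k * freq, which is consistent with both outputs shown in this thread.

```python
from datetime import datetime, timedelta

def bin_edges(day_start, freq_minutes, base_minutes, first, last):
    """Return the left-closed bin edges needed to cover [first, last].

    Assumption: edges are located at day_start + base + k * freq
    for integer k (k may be negative).
    """
    freq = timedelta(minutes=freq_minutes)
    origin = day_start + timedelta(minutes=base_minutes)
    # Floor division of timedeltas gives the index of the bin holding `first`.
    k = (first - origin) // freq
    edge = origin + k * freq
    edges = [edge]
    # Keep adding edges until the last timestamp is strictly inside a bin.
    while edge <= last:
        edge += freq
        edges.append(edge)
    return edges

midnight = datetime(2013, 1, 1)
first = datetime(2013, 1, 1, 1, 34, 28)   # earliest timestamp in the answer's data
last = datetime(2013, 1, 1, 12, 34, 15)   # latest timestamp in the answer's data

default_edges = bin_edges(midnight, 570, 0, first, last)
shifted_edges = bin_edges(midnight, 570, 480, first, last)
print(default_edges)
print(shifted_edges)
```

With base=0 the edges fall at 00:00, 09:30 and 19:00, which are the labels in the question's output. With base=480 they fall at 22:30 on the previous day, 08:00 and 17:30, and with label='right' the two bins are reported as 08:00 and 17:30.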

Internally, when pd.Grouper is given a frequency, a TimeGrouper is returned:

In [81]: time_group 
Out[81]: <pandas.core.resample.TimeGrouper at 0x7f1499a32198> 

So the arguments passed to pd.Grouper are actually passed on to pd.TimeGrouper:

In [82]: pd.TimeGrouper? 
Init signature: pd.TimeGrouper(self, freq='Min', closed=None, label=None, 
           how='mean', nperiods=None, axis=0, 
           fill_method=None, limit=None, loffset=None, 
           kind=None, convention=None, base=0, **kwargs) 

The TimeGrouper documentation does not explain the base parameter, but it has the same meaning as in df.resample:

In [83]: df.resample? 
Parameters 
---------- 
base : int, default 0 
    For frequencies that evenly subdivide 1 day, the "origin" of the 
    aggregated intervals. For example, for '5min' frequency, base could 
    range from 0 through 4. Defaults to 0 
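
A note for newer pandas versions: base was deprecated in pandas 1.1 and removed in 2.0, with offset and origin as the replacements. The following is a minimal sketch of the same grouping, assuming pandas 1.1 or later and the data from this answer. The variable names by_offset and by_origin are only illustrative.

```python
import pandas as pd

s = pd.Series(
    range(8),
    pd.to_datetime([
        '20130101 8:34:04', '20130101 10:34:08', '20130101 10:34:08',
        '20130101 12:34:15', '20130101 1:34:28', '20130101 3:34:54',
        '20130101 4:34:55', '20130101 5:29:12'])).sort_index()

# Option 1: keep the default origin (midnight of the first day) and add 8 hours.
by_offset = s.groupby(
    pd.Grouper(freq='570min', closed='left', label='right',
               offset=pd.Timedelta(hours=8))).sum()

# Option 2: anchor the bins directly on the desired start timestamp.
by_origin = s.groupby(
    pd.Grouper(freq='570min', closed='left', label='right',
               origin=pd.Timestamp('2013-01-01 08:00'))).sum()

print(by_offset)
print(by_origin)
```

Both variants should give the same two bins as base=480 above, labelled 08:00 and 17:30. Using origin is convenient when the start time is already available as a timestamp such as start_datetime.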

Great answer! Thank you! – itzjustricky

0

The following gets you started: it shifts the index forward by 9 hours 30 minutes and keeps only the date:

(s.index + pd.Timedelta('9 hours 30 minutes')).strftime('%Y-%m-%d') 
# array([u'2013-01-01', u'2013-01-01', u'2013-01-01', u'2013-01-01', 
# u'2013-01-01', u'2013-01-01', u'2013-01-01', u'2013-01-01'], 
# dtype='<U10') 

Then you can call:

s.groupby((s.index + pd.Timedelta('9 hours 30 minutes')).strftime('%Y-%m-%d')).agg(sum) 
# 2013-01-01 28 
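
As a rough illustration of what this key does, the sketch below reuses the data from the first answer. Grouping by a shifted calendar date gives one bucket per shifted day rather than 9.5-hour bars. The second key, which shifts back by 8 hours so that each "day" runs from 08:00 to 08:00, is a variation that is not part of the original answer and is only an assumption about what may be wanted.

```python
import pandas as pd

s = pd.Series(
    range(8),
    pd.to_datetime([
        '20130101 8:34:04', '20130101 10:34:08', '20130101 10:34:08',
        '20130101 12:34:15', '20130101 1:34:28', '20130101 3:34:54',
        '20130101 4:34:55', '20130101 5:29:12']))

# Key from the answer: shift forward 9.5 hours, then keep the calendar date.
key_forward = (s.index + pd.Timedelta('9 hours 30 minutes')).strftime('%Y-%m-%d')
by_forward = s.groupby(key_forward).sum()

# Variation (assumption, not from the answer): shift back 8 hours so that
# timestamps before 08:00 are counted with the previous day.
key_back = (s.index - pd.Timedelta(hours=8)).strftime('%Y-%m-%d')
by_back = s.groupby(key_back).sum()

print(by_forward)
print(by_back)
```

With this data the forward-shifted key puts every row under 2013-01-01, while the backward-shifted key separates the four early-morning rows from the rest.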

You can also rely on pandas' own datetime functionality instead of importing datetime separately:

import pandas as pd 


def main(): 

    # pd.datetime was removed in pandas 2.0; pd.Timestamp is the pandas equivalent
    start_datetime = pd.Timestamp(2013, 1, 1, 8)

    s = pd.Series(
        range(8),
        pd.to_datetime([
            '20130101 8:34:04', '20130101 10:34:08', '20130101 10:34:08',
            '20130101 12:34:15', '20130101 1:34:28', '20130101 3:34:54',
            '20130101 4:34:55', '20130101 5:29:12']))

    time_group = (s.index + pd.Timedelta('9 hours 30 minutes')).strftime('%Y-%m-%d') 
    foobar = s.groupby(time_group).agg(sum) 
    print(foobar)