2017-08-14 58 views

Split a DataFrame by a time column - pandas

I want to select the right part of a dataset, as shown in the following example:

Input df:

id_B, ts_B,value 
id1,2017-04-27 01:35:30,0 
id1,2017-04-27 01:35:40,0 
id1,2017-04-27 01:35:50,1 
id1,2017-04-27 01:36:00,4 
id1,2017-04-27 01:36:10,5 
id1,2017-04-27 01:36:20,100 
id1,2017-04-27 01:36:30,155 
id1,2017-04-27 01:36:40,235 
id1,2017-04-27 01:36:50,0 
id1,2017-04-27 01:36:60,0 
id1,2017-04-27 01:37:00,2353 
id1,2017-04-27 01:37:10,221 
id1,2017-04-27 01:37:20,2432 
id1,2017-04-27 01:37:30,2654 
id1,2017-04-27 01:37:40,12 
id1,2017-04-27 01:37:50,5 
id1,2017-04-27 01:38:00,5 
id1,2017-04-27 01:38:10,23 
id1,2017-04-27 01:38:20,5 
id1,2017-04-27 01:38:30,2 
id1,2017-04-27 01:38:40,2 
id1,2017-04-27 01:38:50,1 
id1,2017-04-27 01:39:00,0 
id1,2017-04-27 01:39:10,0 
id1,2017-04-27 01:39:20,0 
id1,2017-04-27 01:39:30,0 
id1,2017-04-27 01:39:40,0 
id1,2017-04-27 01:39:50,0 
id1,2017-04-27 01:40:00,0 
id1,2017-04-27 01:40:10,1 
id1,2017-04-27 01:40:20,5 
id1,2017-04-27 01:40:30,221 
id1,2017-04-27 01:40:40,2432 
id1,2017-04-27 01:40:50,2654 
id1,2017-04-27 01:40:60,12 
id1,2017-04-27 01:41:00,5 
id1,2017-04-27 01:41:10,5 
id1,2017-04-27 01:41:20,23 
id1,2017-04-27 01:41:30,5 
id1,2017-04-27 01:41:40,2 
id1,2017-04-27 01:41:50,1 

Consider the following: segment_number = 1
duration = 3 minutes

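I want to select the first segment of the dataframe, starting at the first non-zero df.value and running up to the last value that falls within the 3-minute duration.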

Output:

id1,2017-04-27 01:35:50,1
id1,2017-04-27 01:36:00,4
id1,2017-04-27 01:36:10,5
id1,2017-04-27 01:36:20,100
id1,2017-04-27 01:36:30,155
id1,2017-04-27 01:36:40,235
id1,2017-04-27 01:36:50,0
id1,2017-04-27 01:36:60,0
id1,2017-04-27 01:37:00,2353
id1,2017-04-27 01:37:10,221
id1,2017-04-27 01:37:20,2432
id1,2017-04-27 01:37:30,2654
id1,2017-04-27 01:37:40,12
id1,2017-04-27 01:37:50,5
id1,2017-04-27 01:38:00,5
id1,2017-04-27 01:38:10,23
id1,2017-04-27 01:38:20,5
id1,2017-04-27 01:38:30,2
id1,2017-04-27 01:38:40,2
id1,2017-04-27 01:38:50,1
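For what it's worth, the first-segment rule described above could be sketched with two boolean masks. This is only a minimal sketch: `first_segment` and the small `sample` frame are made up for illustration, and it assumes `ts_B` is already a datetime column sorted in ascending order.

```python
import pandas as pd

def first_segment(df, duration, threshold=0):
    """Rows from the first value above `threshold` through `duration` later (inclusive)."""
    above = df.index[df["value"] > threshold]
    if len(above) == 0:
        return df.iloc[0:0]  # no qualifying value: empty frame, same columns
    start_time = df.loc[above[0], "ts_B"]
    end_time = start_time + pd.Timedelta(duration)
    # Keep every row whose timestamp lies inside [start_time, end_time].
    mask = (df["ts_B"] >= start_time) & (df["ts_B"] <= end_time)
    return df[mask]

# Made-up sample on the same 10-second grid as the input above.
sample = pd.DataFrame({
    "id_B": ["id1"] * 6,
    "ts_B": pd.date_range("2017-04-27 01:35:30", periods=6,
                          freq=pd.Timedelta(seconds=10)),
    "value": [0, 0, 1, 4, 0, 7],
})
seg = first_segment(sample, "20s")
print(seg)
```

Zeros inside the window are kept on purpose, since the expected output above also contains zero rows after the segment has started.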

Consider the following: segment_number = 2
duration = 1.40 minutes

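I want to select the second segment of the dataframe, starting at the next non-zero df.value and running up to the last value that falls within the 1.40-minute duration.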

Output:

id1,2017-04-27 01:40:10,1 
id1,2017-04-27 01:40:20,5 
id1,2017-04-27 01:40:30,221 
id1,2017-04-27 01:40:40,2432 
id1,2017-04-27 01:40:50,2654 
id1,2017-04-27 01:40:60,12 
id1,2017-04-27 01:41:00,5 
id1,2017-04-27 01:41:10,5 
id1,2017-04-27 01:41:20,23 
id1,2017-04-27 01:41:30,5 
id1,2017-04-27 01:41:40,2 
id1,2017-04-27 01:41:50,1 

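So far, I have indexed the df by ts_B using `pd.to_datetime` and `set_index`, and I use a variable `last_end_point` to keep track of the index where the previous segment ended.
But I am not getting the correct output.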

Any help would be appreciated.
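As a rough illustration of the `set_index` route mentioned above: once `ts_B` is the index, a label slice with `.loc` on a sorted `DatetimeIndex` includes both endpoints, so a window can be cut directly by timestamps. The `sample` frame here is made up; note also that rows like `01:36:60` in the input would probably need cleaning first, since 60 is not a valid seconds value for `pd.to_datetime`.

```python
import pandas as pd

# Made-up sample; ts_B is built as datetime64 directly instead of parsed from CSV.
sample = pd.DataFrame({
    "ts_B": pd.date_range("2017-04-27 01:35:30", periods=5,
                          freq=pd.Timedelta(seconds=10)),
    "value": [0, 3, 8, 0, 2],
})
indexed = sample.set_index("ts_B")

# Timestamp of the first non-zero value.
start = indexed.index[indexed["value"] > 0][0]

# Label slicing on a sorted DatetimeIndex keeps both the start and end rows.
window = indexed.loc[start : start + pd.Timedelta(seconds=20)]
print(window)
```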

So, you want to split your `df` by decreasing time intervals?

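Yes, kind of. More specifically, I want to split it by duration and starting point: the first segment starts from the beginning, and the second starts from the index of the previous segment's last row.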

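Sorry, I meant the previous segment's last row + 1. But it should avoid starting a segment at a row where df.value = 0, and always start at the first non-zero value.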
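The chaining rule from these comments (each segment starts at the first non-zero value after the previous segment's last row) might be sketched as a loop over a list of durations. `split_segments` and the sample values are hypothetical, not something from the question.

```python
import pandas as pd

def split_segments(df, durations, threshold=0):
    """Return one sub-frame per duration, each starting at the next value above `threshold`."""
    segments = []
    remaining = df.sort_values("ts_B")
    for duration in durations:
        above = remaining[remaining["value"] > threshold]
        if above.empty:
            break  # nothing left to start a segment from
        start_time = above["ts_B"].iloc[0]
        end_time = start_time + pd.Timedelta(duration)
        in_window = (remaining["ts_B"] >= start_time) & (remaining["ts_B"] <= end_time)
        segments.append(remaining[in_window])
        # The next search begins strictly after the previous segment's last row.
        remaining = remaining[remaining["ts_B"] > end_time]
    return segments

# Made-up sample on a 10-second grid.
sample = pd.DataFrame({
    "ts_B": pd.date_range("2017-04-27 01:35:30", periods=9,
                          freq=pd.Timedelta(seconds=10)),
    "value": [0, 1, 4, 0, 0, 0, 5, 2, 0],
})
segments = split_segments(sample, ["20s", "10s"])
for seg in segments:
    print(seg, end="\n\n")
```

Leading zeros between segments are skipped, while zeros inside a segment's window are kept, which seems to match the two expected outputs in the question.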

Answer

This is the answer I worked out:

import datetime

import pandas as pd

df = pd.read_csv("filename.csv")
df['ts_B'] = pd.to_datetime(df['ts_B'])

def find_the_energenies_segment(key_mapped, duration, energenie_df, threshold):
    # key_mapped is accepted but not used inside this function.
    # Index labels of all rows whose value exceeds the threshold.
    non_zero_indexs = energenie_df[energenie_df["value"] > threshold].index

    first_index = non_zero_indexs[0] if len(non_zero_indexs) > 0 else None

    # Compare with None explicitly: a first index of 0 is valid but falsy.
    if first_index is None:
        return {"sub_df": None,
                "start_index": None,
                "end_index": None,
                "duration": duration}

    # duration is expected as an "HH:MM:SS" string.
    start_time = energenie_df.loc[first_index, "ts_B"]
    hours, minutes, seconds = duration.split(":")
    end_time = start_time + datetime.timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))

    # Last row at or before end_time. Unlike taking the first row after
    # end_time and subtracting 1, this also works when the data ends early.
    in_window = (energenie_df["ts_B"] >= start_time) & (energenie_df["ts_B"] <= end_time)
    last_index = energenie_df[in_window].index[-1]

    return {"sub_df": energenie_df.loc[first_index:last_index],
            "start_index": first_index,
            "end_index": last_index,
            "duration": duration}


out = find_the_energenies_segment("id1", "00:03:00", df, 0)
print(out)
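As a side note, the manual `duration.split(":")` step could probably be replaced by `pd.to_timedelta`, which parses `"HH:MM:SS"` strings directly. The sketch below assumes that "1.40 minutes" in the question means 1 minute 40 seconds, which is what the second expected output (01:40:10 to 01:41:50) suggests.

```python
import pandas as pd

# "HH:MM:SS" strings parse straight into Timedelta objects.
three_minutes = pd.to_timedelta("00:03:00")

# Assumption: "1.40 minutes" is read as 1 minute 40 seconds, not 1.4 minutes.
one_forty = pd.to_timedelta("00:01:40")

# First non-zero timestamp of each segment in the question's data.
first_end = pd.Timestamp("2017-04-27 01:35:50") + three_minutes
second_end = pd.Timestamp("2017-04-27 01:40:10") + one_forty
print(first_end, second_end)
```

Both end times line up with the last rows of the two expected outputs, which suggests the window is meant to include its end timestamp.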