To get the DataFrame you defined above, we can do this:
df = pd.DataFrame({'ID': ['001', '001', '002'], 'time': ['00:00:00', '00:15:00', '00:05:00'], 'place': [1, 3, 2]}).set_index(['ID', 'time'])
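To be able to work with just the times and apply the same operations to every ID value, let's unstack 'ID' so that the IDs move into the columns as a level of a column MultiIndex: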
In [91]: df = df.unstack(0)
In [92]: df
Out[92]:
         place
ID         001  002
time
00:00:00   1.0  NaN
00:05:00   NaN  2.0
00:15:00   3.0  NaN
Now, let's convert the index (which is just 'time' now) to a DatetimeIndex:
In [93]: df.index = pd.to_datetime(df.index)
In [94]: df
Out[94]:
                    place
ID                    001  002
time
2017-06-06 00:00:00   1.0  NaN
2017-06-06 00:05:00   NaN  2.0
2017-06-06 00:15:00   3.0  NaN
This adds today's date, but we can remove it later.
Next, let's create another DatetimeIndex that covers today's date in 5-minute increments:
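If you'd rather not depend on whichever date the parser attaches to a time-only string (I haven't checked that this is the same across pandas versions), one option is to prepend a fixed date yourself. A minimal sketch, where `ANCHOR` is an arbitrary date chosen for illustration and is stripped again later anyway:

```python
import pandas as pd

# Hypothetical anchor date; any fixed date works because it is removed later.
ANCHOR = "2017-06-06"

# Time-only strings, like the 'time' index after the unstack step.
time_strings = pd.Index(["00:00:00", "00:05:00", "00:15:00"], name="time")

# Prepend the date and parse with an explicit format so nothing is guessed.
stamps = pd.to_datetime(ANCHOR + " " + time_strings, format="%Y-%m-%d %H:%M:%S")
```

The 5-minute grid would then need to be built on the same date, e.g. `pd.date_range(ANCHOR + " 00:00:00", ANCHOR + " 23:55:00", freq="5min")`, so that the labels line up when reindexing.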
In [95]: times = pd.date_range("00:00:00", "23:55:00", freq="5min")
In [96]: times
Out[96]:
DatetimeIndex(['2017-06-06 00:00:00', '2017-06-06 00:05:00',
               '2017-06-06 00:10:00', '2017-06-06 00:15:00',
               '2017-06-06 00:20:00', '2017-06-06 00:25:00',
               '2017-06-06 00:30:00', '2017-06-06 00:35:00',
               '2017-06-06 00:40:00', '2017-06-06 00:45:00',
               ...
               '2017-06-06 23:10:00', '2017-06-06 23:15:00',
               '2017-06-06 23:20:00', '2017-06-06 23:25:00',
               '2017-06-06 23:30:00', '2017-06-06 23:35:00',
               '2017-06-06 23:40:00', '2017-06-06 23:45:00',
               '2017-06-06 23:50:00', '2017-06-06 23:55:00'],
              dtype='datetime64[ns]', length=288, freq='5T')
Let's reindex our df against this new DatetimeIndex:
In [97]: df = df.reindex(times)
In [98]: df
Out[98]:
                    place
ID                    001  002
2017-06-06 00:00:00   1.0  NaN
2017-06-06 00:05:00   NaN  2.0
2017-06-06 00:10:00   NaN  NaN
2017-06-06 00:15:00   3.0  NaN
2017-06-06 00:20:00   NaN  NaN
...
Now we just need to forward-fill, so that every time slot carries the last non-NaN place:
In [99]: df = df.ffill()
In [100]: df
Out[100]:
                    place
ID                    001  002
2017-06-06 00:00:00   1.0  NaN
2017-06-06 00:05:00   1.0  2.0
2017-06-06 00:10:00   1.0  2.0
2017-06-06 00:15:00   3.0  2.0
2017-06-06 00:20:00   3.0  2.0
2017-06-06 00:25:00   3.0  2.0
2017-06-06 00:30:00   3.0  2.0
...
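One detail worth checking: `reindex` also accepts `method="ffill"`, but as far as I understand it, that only fills the rows newly created by the reindex by copying the previous existing row, and it leaves NaNs that were already in the frame untouched. That is why a separate `ffill()` call is used here. A small sketch on a shortened grid (the pinned date and the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

# Wide frame like the one above, with an arbitrary pinned date on the index.
idx = pd.to_datetime(
    ["2017-06-06 00:00:00", "2017-06-06 00:05:00", "2017-06-06 00:15:00"]
)
wide = pd.DataFrame(
    {"001": [1.0, np.nan, 3.0], "002": [np.nan, 2.0, np.nan]}, index=idx
)

# Shortened 5-minute grid (00:00 through 00:20) to keep the result small.
grid = pd.date_range("2017-06-06 00:00:00", "2017-06-06 00:20:00", freq="5min")

# Reindex first, then forward-fill each column independently.
two_step = wide.reindex(grid).ffill()

# Reindex with method="ffill": copies the previous existing row, NaNs included.
one_step = wide.reindex(grid, method="ffill")
```

With this data, `two_step` has 1.0 for '001' at 00:05 and 00:10, while `one_step` still has NaN there, because the 00:05 row already existed with a NaN in that column.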
From here, we need to get rid of the date:
In [101]: df.index = df.index.strftime('%H:%M:%S')
In [102]: df
Out[102]:
         place
ID         001  002
00:00:00   1.0  NaN
00:05:00   1.0  2.0
00:10:00   1.0  2.0
00:15:00   3.0  2.0
00:20:00   3.0  2.0
00:25:00   3.0  2.0
...
Along the way we lost the name of our 'time' index, so let's put it back:
df.index = df.index.set_names('time')
Finally, put 'ID' back into the index:
In [103]: df.stack(1).swaplevel(0, 1)
Out[103]:
              place
ID  time
001 00:00:00    1.0
    00:05:00    1.0
002 00:05:00    2.0
001 00:10:00    1.0
002 00:10:00    2.0
001 00:15:00    3.0
...
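For convenience, the steps above can be gathered into one function. This is only a sketch under a few assumptions of mine: the function name `fill_five_minute_grid` and the `anchor` parameter are made up, the date is pinned explicitly instead of relying on today's date, and I added `dropna()` and `sort_index()` at the end so the result does not depend on how `stack` treats NaN rows and comes out ordered by ID.

```python
import pandas as pd


def fill_five_minute_grid(df, anchor="2017-06-06"):
    """Forward-fill 'place' per ID on a 5-minute grid covering one day.

    `df` is indexed by ('ID', 'time') with 'time' as 'HH:MM:SS' strings.
    `anchor` is an arbitrary date used only to build a DatetimeIndex;
    it is stripped again before returning.
    """
    # Move the IDs into the columns so the index is just 'time'.
    wide = df.unstack(0)

    # Attach the anchor date and parse with an explicit format.
    wide.index = pd.to_datetime(
        anchor + " " + wide.index, format="%Y-%m-%d %H:%M:%S"
    )

    # Full-day grid in 5-minute steps on the same date.
    times = pd.date_range(anchor + " 00:00:00", anchor + " 23:55:00", freq="5min")

    # Expand to the grid, then carry the last known place forward per ID.
    wide = wide.reindex(times).ffill()

    # Drop the date again and restore the index name.
    wide.index = pd.Index(wide.index.strftime("%H:%M:%S"), name="time")

    # Move 'ID' back into the index as the outer level.
    long = wide.stack(1).swaplevel(0, 1)

    # Remove slots before an ID's first observation and order by (ID, time).
    return long.dropna().sort_index()
```

Calling it on the `df` built at the top of this answer should give one row per ID and 5-minute slot, starting at each ID's first recorded time.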