2017-08-24 131 views
4

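Python pandas time series: interpolating datetime data

From this question I know how to interpolate a time series when the timestamps are given. I would like to know how to interpolate the timestamps when the values are given, for instance to get estimates for the NaT entries in the example below.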

interval   datetime    
0.782296 2012-11-19 12:40:10 
0.795469     NaT 
0.821426 2012-11-19 12:35:10 
0.834957     NaT 
0.864383 2012-11-19 12:30:10 
0.906240 2012-11-19 12:25:10 

P.S. I tried using df['datetime'].interpolate() directly, but it failed.

Answers

1

This seems to work. The code could probably be cleaned up a bit, but you get the gist.

from datetime import datetime 
import pandas as pd 
import time 

#Create data 
df = pd.DataFrame({'interval': [0.782296, 0.795469, 0.821426, 0.834957,
                                0.864383, 0.906240],
                   'datetime': [datetime(2012, 11, 19, 12, 40, 10), pd.NaT,
                                datetime(2012, 11, 19, 12, 35, 10), pd.NaT,
                                datetime(2012, 11, 19, 12, 30, 10),
                                datetime(2012, 11, 19, 12, 25, 10)]})


#Cast date to seconds since the epoch (NaT becomes NaN)
#Note: time.mktime interprets the naive datetime as local time
df['seconds'] = [time.mktime(t.timetuple()) if pd.notna(t) else float('nan') for t in df['datetime']]

#Set the interval as the index, as interpolation uses the index
df.set_index('interval', inplace=True)
#Use method='values' to interpolate on the index values and not on equally spaced positions
df['interpolated'] = df['seconds'].interpolate(method='values')
#Cast the interpolated seconds back to datetime
#fromtimestamp converts to local time, matching time.mktime above
#(utcfromtimestamp would shift the result by the local UTC offset)
df['datetime2'] = [datetime.fromtimestamp(t) for t in df['interpolated']]

#Clean up
df.reset_index(inplace=True)
df = df[['interval', 'datetime2']]

>>> df
Out[25]:
   interval                  datetime2
0  0.782296 2012-11-19 12:40:10.000000
1  0.795469 2012-11-19 12:38:29.005878
2  0.821426 2012-11-19 12:35:10.000000
3  0.834957 2012-11-19 12:33:35.503178
4  0.864383 2012-11-19 12:30:10.000000
5  0.906240 2012-11-19 12:25:10.000000

Hope this is what you wanted.
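For comparison, here is a minimal sketch of the same idea that stays inside pandas and NumPy and never goes through local time. It is an alternative, not the code from the answer above: the names `ref`, `offset`, `known`, `est` and the column `datetime_est` are made up for illustration, and it assumes the `interval` column is sorted in increasing order, which `np.interp` requires.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "interval": [0.782296, 0.795469, 0.821426, 0.834957, 0.864383, 0.906240],
    "datetime": pd.to_datetime([
        "2012-11-19 12:40:10", None,
        "2012-11-19 12:35:10", None,
        "2012-11-19 12:30:10", "2012-11-19 12:25:10",
    ]),
})

# Express every timestamp as float seconds relative to a reference time.
# NaT rows become NaN, and no timezone conversion is involved.
ref = df["datetime"].min()
offset = (df["datetime"] - ref).dt.total_seconds()

# Linear interpolation of the offsets against the interval values.
# Assumption: "interval" is increasing, as np.interp expects.
x = df["interval"].to_numpy()
known = offset.notna().to_numpy()
est = np.interp(x, x[known], offset.to_numpy()[known])

# Convert the interpolated offsets back to timestamps.
df["datetime_est"] = ref + pd.to_timedelta(est, unit="s")
print(df[["interval", "datetime_est"]])
```

Working with offsets from a reference timestamp keeps the floats small, so little precision is lost, and the known rows come back unchanged.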

+0

Thanks for your answer. I was thinking about converting the datetime to float. – natsuapo
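A small sketch of that datetime-to-float round trip using only the standard library. It assumes the naive datetimes can be treated as UTC for the purpose of the conversion; the variable names are illustrative. Pinning both directions to UTC makes the result independent of the machine's local timezone.

```python
from datetime import datetime, timezone

t = datetime(2012, 11, 19, 12, 40, 10)

# Naive datetime -> float seconds since the epoch, treating it as UTC
seconds = t.replace(tzinfo=timezone.utc).timestamp()

# Float seconds -> naive datetime, again via UTC so the round trip is exact
back = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)

print(seconds, back)
```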

+0

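No problem. I edited the answer because it wasn't really correct the first time: I had left out the 'values' argument in the interpolation function. – mortysporty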
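To see why the 'values' argument matters, here is a small sketch on made-up toy data (not the data from the question). With the default method, pandas treats the points as equally spaced; with method='values' it uses the numeric index values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: unevenly spaced index, one missing value
s = pd.Series([0.0, np.nan, 10.0], index=[0.0, 1.0, 10.0])

# Default 'linear': ignores the index, so the gap is filled with the midpoint
by_position = s.interpolate()

# 'values': weights by the index, so the fill sits close to the left neighbour
by_index = s.interpolate(method="values")

print(by_position.tolist())
print(by_index.tolist())
```

In the question's data the intervals are unevenly spaced, so leaving out 'values' would place each estimated timestamp exactly halfway between its neighbours.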