Preserve datetime index

Suppose I have the following DataFrame (a time series in which the first column is a DatetimeIndex):

                           atn  file
datetime
2012-10-08 14:00:00  23.007462     1
2012-10-08 14:30:00  27.045666     1
2012-10-08 15:00:00  31.483825     1
2012-10-08 15:30:00  37.540651     2
2012-10-08 16:00:00  43.564573     2
2012-10-08 16:00:00  48.589852     2
2012-10-08 16:00:00  55.289452     2

My goal is to extract, for each number in the last column "file", the row where it first appears, so as to obtain this table:

                 datetime        atn
file
1     2012-10-08 14:00:00  23.007462
2     2012-10-08 15:30:00  37.540651

My approach was to group by "file" and then aggregate with "first":

dt.groupby(by="file").aggregate("first")

But the problem with this is that the index is not used as a column in the grouping. I solved this by first turning the index into a column:

dt2 = dt.reset_index() 
dt2.groupby(by="file").aggregate("first")
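To make the point about the index concrete, here is a minimal sketch run against a recent pandas version rather than the 0.9.0 release used in this thread. It assumes the sample data is held in a string and read with `io.StringIO` plus `pandas.read_csv` instead of the clipboard; `DATA`, `direct` and `skipinitialspace=True` are illustrative additions, not part of the original question.

```python
import io

import pandas as pd

# Sample rows from the question, held in a string instead of the clipboard.
DATA = """\
2012-10-08 14:00:00, 23.007462, 1
2012-10-08 14:30:00, 27.045666, 1
2012-10-08 15:00:00, 31.483825, 1
2012-10-08 15:30:00, 37.540651, 2
2012-10-08 16:00:00, 43.564573, 2
2012-10-08 16:00:00, 48.589852, 2
2012-10-08 16:00:00, 55.289452, 2
"""

dt = pd.read_csv(io.StringIO(DATA), sep=",", skipinitialspace=True,
                 parse_dates=True, index_col=0,
                 names=["datetime", "atn", "file"])

# Grouping directly: the index is not one of the aggregated columns,
# so only "atn" is left in the result.
direct = dt.groupby(by="file").aggregate("first")
print(direct.columns.tolist())  # ['atn']

# reset_index() turns the "datetime" index into an ordinary column,
# which then takes part in the aggregation just like "atn".
dt2 = dt.reset_index()
print(dt2.columns.tolist())  # ['datetime', 'atn', 'file']
```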

But now the problem is that the datetime column is no longer a date, but a float:

          datetime        atn
file
1     1.349705e+18  23.007462
2     1.349710e+18  37.540651

Is there a way to convert the floats back to datetimes?
Or a way to preserve the datetimes in the groupby/aggregate operation?
Or a better way to achieve this final table?
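One possible alternative, sketched for a recent pandas version and not taken from the thread: skip the aggregation and select the first row of each `file` value with `Series.duplicated`. Since this is plain row selection, the rows keep their DatetimeIndex, and the target table is obtained by moving the index into a column. The data string and the names `is_first`, `first_rows` and `result` are illustrative. Note that `first()` takes the first non-null value per column, whereas this takes whole first rows; the two agree here because the sample has no missing values.

```python
import io

import pandas as pd

DATA = """\
2012-10-08 14:00:00, 23.007462, 1
2012-10-08 14:30:00, 27.045666, 1
2012-10-08 15:00:00, 31.483825, 1
2012-10-08 15:30:00, 37.540651, 2
2012-10-08 16:00:00, 43.564573, 2
2012-10-08 16:00:00, 48.589852, 2
2012-10-08 16:00:00, 55.289452, 2
"""

dt = pd.read_csv(io.StringIO(DATA), sep=",", skipinitialspace=True,
                 parse_dates=True, index_col=0,
                 names=["datetime", "atn", "file"])

# True for the first row of each "file" value, False for the repeats.
is_first = ~dt["file"].duplicated()

# Select with the underlying boolean array so no index alignment is
# attempted (the index contains duplicate timestamps).
first_rows = dt.loc[is_first.to_numpy()]

# Reshape into the target table: "file" as index, "datetime" as a column.
result = first_rows.reset_index().set_index("file")
print(result)
```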

The example DataFrame can be created as follows.

Copy this (to the clipboard):

2012-10-08 14:00:00, 23.007462,  1 
2012-10-08 14:30:00, 27.045666,  1 
2012-10-08 15:00:00, 31.483825,  1 
2012-10-08 15:30:00, 37.540651,  2 
2012-10-08 16:00:00, 43.564573,  2 
2012-10-08 16:00:00, 48.589852,  2 
2012-10-08 16:00:00, 55.289452,  2

Then:

import pandas

dt = pandas.read_clipboard(sep=",", parse_dates=True, index_col=0,
                           names=["datetime", "atn", "file"])
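If the clipboard is not available (for example on a headless machine), the same DataFrame can be built from a string. This is a sketch assuming a recent pandas: `io.StringIO` stands in for the clipboard, `pandas.read_csv` takes the same keyword arguments, and `skipinitialspace=True` is an addition here that strips the blank after each comma.

```python
import io

import pandas as pd

# The sample rows from above, kept in a string instead of the clipboard.
DATA = """\
2012-10-08 14:00:00, 23.007462, 1
2012-10-08 14:30:00, 27.045666, 1
2012-10-08 15:00:00, 31.483825, 1
2012-10-08 15:30:00, 37.540651, 2
2012-10-08 16:00:00, 43.564573, 2
2012-10-08 16:00:00, 48.589852, 2
2012-10-08 16:00:00, 55.289452, 2
"""

dt = pd.read_csv(
    io.StringIO(DATA),
    sep=",",
    skipinitialspace=True,  # drop the blank that follows each comma
    parse_dates=True,       # parse the index column (index_col=0) as dates
    index_col=0,
    names=["datetime", "atn", "file"],  # the data has no header row
)
print(dt)
```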


2012-11-13 joris

Which version of pandas are you using? Following your procedure, I get 'dt2' with the datetime properly preserved. –

Maybe also relevant is my numpy version (for the datetime64-related parts): >>> pandas.__version__ '0.9.0' >>> np.__version__ '1.6.1' – joris

OK. 'parse_dates' seems to be the problem, @joris. See the answer below. –

I think this is a bug in pandas: the dtype is changed to float after the groupby.

dt3 = dt2.groupby(by="file").aggregate("first") 
dt3.dtypes

gives me:

datetime    float64
atn         float64

To change the dtype back to datetime64 you can do:

dt3['datetime'] = pd.Series(dt3['datetime'], dtype='datetime64[ns]')
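On a recent pandas version, a possible alternative (not the answerer's method) is to cast the floats to `int64` and let `pd.to_datetime` interpret them with `unit="ns"`. This sketch assumes the floats are nanoseconds since the Unix epoch, as values around 1.35e+18 suggest; the two values below are typed in by hand to mimic `dt3['datetime']`, and `floats` and `restored` are illustrative names.

```python
import pandas as pd

# Hand-typed stand-ins for dt3["datetime"]: nanoseconds since the epoch
# stored as float64 (2012-10-08 14:00:00 and 2012-10-08 15:30:00).
floats = pd.Series([1349704800e9, 1349710200e9],
                   index=pd.Index([1, 2], name="file"), name="datetime")

# Cast back to integers first (both values are exactly representable as
# float64), then interpret them as nanosecond timestamps.
restored = pd.to_datetime(floats.astype("int64"), unit="ns")
print(restored)
```

Casting to integers first avoids relying on how a given pandas version treats float input to a datetime conversion.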

I've created a GitHub issue for this.


2012-11-13 14:02:05

Master looks good: https://github.com/pydata/pandas/issues/2238#issuecomment-10327256 –

Thanks! As you pointed out, changing it back to datetime64 is a good workaround for now. – joris

This looks like a bug, but for the moment, not specifying parse_dates=True gives me the expected result.

My IPython results, without parse_dates=True:

In [29]: dt2 = pd.read_clipboard(sep=",", index_col=0,
                                 names=["datetime", "atn", "file"])

In [30]: dt2
Out[30]:
                           atn  file
datetime
2012-10-08 14:00:00  23.007462     1
2012-10-08 14:30:00  27.045666     1
2012-10-08 15:00:00  31.483825     1
2012-10-08 15:30:00  37.540651     2
2012-10-08 16:00:00  43.564573     2
2012-10-08 16:00:00  48.589852     2
2012-10-08 16:00:00  55.289452     2

In [31]: dt2.reset_index().groupby(by="file").aggregate("first")
Out[31]:
                 datetime        atn
file
1     2012-10-08 14:00:00  23.007462
2     2012-10-08 15:30:00  37.540651

In [32]:

My IPython results, with parse_dates=True:

In [33]: dt = pd.read_clipboard(sep=",", parse_dates=True, index_col=0,
                                names=["datetime", "atn", "file"])

In [34]: dt.reset_index().groupby(by="file").aggregate("first")
Out[34]:
          datetime        atn
file
1     1.349705e+18  23.007462
2     1.349710e+18  37.540651

Checking the dtypes explicitly:

In [40]: new_dt = dt.reset_index().groupby(by="file").aggregate("first")

In [41]: new_dt
Out[41]:
          datetime        atn
file
1     1.349705e+18  23.007462
2     1.349710e+18  37.540651

In [42]: new_dt.dtypes
Out[42]:
datetime    float64
atn         float64

In [43]: new_dt2 = dt2.reset_index().groupby(by="file").aggregate("first")

In [44]: new_dt2.dtypes
Out[44]:
datetime     object
atn         float64


2012-11-13 14:46:32

Not specifying 'parse_dates=True' results in an index of dtype object, which holds strings. In that case there is no DatetimeIndex! –

Thanks for the answer, but I need it to remain a datetime for my further analysis. – joris
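A possible middle ground, sketched for a recent pandas version and not proposed in the thread: read the data without `parse_dates=True` (so the index holds plain strings, as noted above), do the `groupby`, and only then convert the `datetime` column with `pd.to_datetime`, so it is a proper datetime again for further analysis. The data string and the names `raw` and `first` are illustrative.

```python
import io

import pandas as pd

DATA = """\
2012-10-08 14:00:00, 23.007462, 1
2012-10-08 14:30:00, 27.045666, 1
2012-10-08 15:00:00, 31.483825, 1
2012-10-08 15:30:00, 37.540651, 2
2012-10-08 16:00:00, 43.564573, 2
2012-10-08 16:00:00, 48.589852, 2
2012-10-08 16:00:00, 55.289452, 2
"""

# Without parse_dates the index keeps the raw strings (no DatetimeIndex).
raw = pd.read_csv(io.StringIO(DATA), sep=",", skipinitialspace=True,
                  index_col=0, names=["datetime", "atn", "file"])

# Aggregate while the timestamps are still strings.
first = raw.reset_index().groupby(by="file").aggregate("first")

# Convert the string column into real datetimes after the aggregation.
first["datetime"] = pd.to_datetime(first["datetime"])
print(first.dtypes)
```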

I believe this is fixed and will be in the 0.9.1 release.
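As a quick check on a current pandas version (well past the 0.9.x releases discussed here), the original approach of `reset_index()` followed by `groupby(...).aggregate("first")` should keep the `datetime` column as a datetime dtype instead of turning it into floats. This sketch reads the sample data from a string; `DATA` and `result` are illustrative names.

```python
import io

import pandas as pd

DATA = """\
2012-10-08 14:00:00, 23.007462, 1
2012-10-08 14:30:00, 27.045666, 1
2012-10-08 15:00:00, 31.483825, 1
2012-10-08 15:30:00, 37.540651, 2
2012-10-08 16:00:00, 43.564573, 2
2012-10-08 16:00:00, 48.589852, 2
2012-10-08 16:00:00, 55.289452, 2
"""

dt = pd.read_csv(io.StringIO(DATA), sep=",", skipinitialspace=True,
                 parse_dates=True, index_col=0,
                 names=["datetime", "atn", "file"])

# The approach from the question, unchanged.
result = dt.reset_index().groupby(by="file").aggregate("first")

# Expect a datetime64 dtype for "datetime" here, not float64.
print(result.dtypes)
```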


2012-11-14 00:11:06
