2016-07-11 97 views

pandas: min/max of the previous group

I have the following dataset in pandas:

                     Value
2005-08-03 23:15:00   10.5
2005-08-03 23:30:00   10.0
2005-08-03 23:45:00   10.0
2005-08-04 00:00:00   10.5
2005-08-04 00:15:00   10.5
2005-08-04 00:30:00   11.0
2005-08-04 00:45:00   10.5
2005-08-04 01:00:00   11.0
...
2005-08-04 23:15:00   14.0
2005-08-04 23:30:00   13.5
2005-08-04 23:45:00   13.0
2005-08-05 00:00:00   13.5
2005-08-05 00:15:00   14.0
2005-08-05 00:30:00   14.0
2005-08-05 00:45:00   14.5

First, I want to group the data by date and store each group's maximum value in a new column. I did that with the following code:

df['ValueMaxInGroup'] = df.groupby(pd.TimeGrouper('D'))['Value'].transform(max) 

Now I want to create another column that stores the maximum of the previous group, so the desired DataFrame would look like this:

                     Value  ValueMaxInGroup  ValueMaxInPrevGroup
2005-08-03 23:15:00   10.5             10.5                  NaN
2005-08-03 23:30:00   10.0             10.5                  NaN
2005-08-03 23:45:00   10.0             10.5                  NaN
2005-08-04 00:00:00   10.5             14.0                 10.5
2005-08-04 00:15:00   10.5             14.0                 10.5
2005-08-04 00:30:00   11.0             14.0                 10.5
2005-08-04 00:45:00   10.5             14.0                 10.5
2005-08-04 01:00:00   11.0             14.0                 10.5
...
2005-08-04 23:15:00   14.0             14.0                 10.5
2005-08-04 23:30:00   13.5             14.0                 10.5
2005-08-04 23:45:00   13.0             14.0                 10.5
2005-08-05 00:00:00   13.5             14.5                 14.0
2005-08-05 00:15:00   14.0             14.5                 14.0
2005-08-05 00:30:00   14.0             14.5                 14.0
2005-08-05 00:45:00   14.5             14.5                 14.0

To simply get the value from the previous row, I use

df['ValueInPrevRow'] = df.shift(1)['Value'] 

Is there a way to get the min/max/f(x) of another group, such as the previous one? I assumed this would work:

df['ValueMaxInPrevGroup'] = df.groupby(pd.TimeGrouper('D')).shift(1)['Value'].transform(max) 

but it did not work.

Thanks

Answer


You can get the desired result by combining groupby/agg, shift, and merge:

import numpy as np 
import pandas as pd 
df = pd.DataFrame({'Value': [10.5, 10.0, 10.0, 10.5, 10.5, 11.0, 10.5, 11.0, 14.0, 13.5, 13.0, 13.5, 14.0, 14.0, 14.5]}, index=['2005-08-03 23:15:00', '2005-08-03 23:30:00', '2005-08-03 23:45:00', '2005-08-04 00:00:00', '2005-08-04 00:15:00', '2005-08-04 00:30:00', '2005-08-04 00:45:00', '2005-08-04 01:00:00', '2005-08-04 23:15:00', '2005-08-04 23:30:00', '2005-08-04 23:45:00', '2005-08-05 00:00:00', '2005-08-05 00:15:00', '2005-08-05 00:30:00', '2005-08-05 00:45:00']) 
df.index = pd.DatetimeIndex(df.index) 

# This is equivalent to 
# df['group'] = pd.to_datetime(df.index.date) 
# when freq='D', but the version below works with any freq string, not just `'D'`. 
grouped = df.groupby(pd.TimeGrouper('D')) 
labels, uniqs, ngroups = grouped.grouper.group_info 
df['group'] = grouped.grouper.binlabels[labels] 

result = grouped[['Value']].agg(max) 
result = result.rename(columns={'Value':'Max'}) 
result['PreviouMax'] = result['Max'].shift(1) 

df = pd.merge(df, result, left_on=['group'], right_index=True) 
print(df) 

which yields

                     Value       group   Max  PreviouMax
2005-08-03 23:15:00   10.5  2005-08-03  10.5         NaN
2005-08-03 23:30:00   10.0  2005-08-03  10.5         NaN
2005-08-03 23:45:00   10.0  2005-08-03  10.5         NaN
2005-08-04 00:00:00   10.5  2005-08-04  14.0        10.5
2005-08-04 00:15:00   10.5  2005-08-04  14.0        10.5
2005-08-04 00:30:00   11.0  2005-08-04  14.0        10.5
2005-08-04 00:45:00   10.5  2005-08-04  14.0        10.5
2005-08-04 01:00:00   11.0  2005-08-04  14.0        10.5
2005-08-04 23:15:00   14.0  2005-08-04  14.0        10.5
2005-08-04 23:30:00   13.5  2005-08-04  14.0        10.5
2005-08-04 23:45:00   13.0  2005-08-04  14.0        10.5
2005-08-05 00:00:00   13.5  2005-08-05  14.5        14.0
2005-08-05 00:15:00   14.0  2005-08-05  14.5        14.0
2005-08-05 00:30:00   14.0  2005-08-05  14.5        14.0
2005-08-05 00:45:00   14.5  2005-08-05  14.5        14.0
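
Note that `pd.TimeGrouper` and the `grouper.group_info` / `grouper.binlabels` internals used above belong to the pandas of that time and, as far as I know, are not available in recent releases. Below is a minimal sketch of the same groupby/agg, shift, merge steps for a recent pandas. The use of `pd.Grouper(freq='D')` and `df.index.floor('D')` is my substitution, not part of the original answer, and `floor` only accepts fixed frequencies such as days, so it is not a drop-in replacement for every freq string.

```python
import pandas as pd

# Same sample data as in the answer above.
idx = pd.to_datetime([
    '2005-08-03 23:15:00', '2005-08-03 23:30:00', '2005-08-03 23:45:00',
    '2005-08-04 00:00:00', '2005-08-04 00:15:00', '2005-08-04 00:30:00',
    '2005-08-04 00:45:00', '2005-08-04 01:00:00', '2005-08-04 23:15:00',
    '2005-08-04 23:30:00', '2005-08-04 23:45:00', '2005-08-05 00:00:00',
    '2005-08-05 00:15:00', '2005-08-05 00:30:00', '2005-08-05 00:45:00',
])
values = [10.5, 10.0, 10.0, 10.5, 10.5, 11.0, 10.5, 11.0,
          14.0, 13.5, 13.0, 13.5, 14.0, 14.0, 14.5]
df = pd.DataFrame({'Value': values}, index=idx)

# Assumption: label each row with the start of its daily bin
# (stands in for grouped.grouper.binlabels[labels]).
df['group'] = df.index.floor('D')

# One row per daily bin, then shift to line up the previous bin's maximum.
result = df.groupby(pd.Grouper(freq='D'))[['Value']].agg('max')
result = result.rename(columns={'Value': 'Max'})
result['PreviouMax'] = result['Max'].shift(1)

# Broadcast the per-day values back onto the original rows.
merged = pd.merge(df, result, left_on='group', right_index=True)
print(merged)
```

The column names `Max` and `PreviouMax` are kept as in the answer so the printed frame can be compared with the output shown above.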

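The main idea here is to use groupby/agg instead of groupby/transform. agg returns one row per group rather than one row per original row, so shifting by one row moves each value to the next group: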

result = grouped[['Value']].agg(max) 
result = result.rename(columns={'Value':'Max'}) 
result['PreviouMax'] = result['Max'].shift(1) 
#              Max  PreviouMax
# group
# 2005-08-03  10.5         NaN
# 2005-08-04  14.0        10.5
# 2005-08-05  14.5        14.0

The desired DataFrame can then be expressed as the result of merging df with result on the group date.
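
If the helper `group` column and the merge are not wanted, the per-day values can also be broadcast back with `reindex`. This is only a sketch of an alternative under my own assumptions, not the method from the answer: it groups by `df.index.floor('D')`, so only days that actually occur in the data form groups, and "previous group" therefore means the previous non-empty day. The column names follow the ones requested in the question.

```python
import pandas as pd

idx = pd.to_datetime([
    '2005-08-03 23:15:00', '2005-08-03 23:30:00', '2005-08-03 23:45:00',
    '2005-08-04 00:00:00', '2005-08-04 00:15:00', '2005-08-04 00:30:00',
    '2005-08-04 00:45:00', '2005-08-04 01:00:00', '2005-08-04 23:15:00',
    '2005-08-04 23:30:00', '2005-08-04 23:45:00', '2005-08-05 00:00:00',
    '2005-08-05 00:15:00', '2005-08-05 00:30:00', '2005-08-05 00:45:00',
])
values = [10.5, 10.0, 10.0, 10.5, 10.5, 11.0, 10.5, 11.0,
          14.0, 13.5, 13.0, 13.5, 14.0, 14.0, 14.5]
df = pd.DataFrame({'Value': values}, index=idx)

day = df.index.floor('D')                    # daily label for every row
daily_max = df.groupby(day)['Value'].max()   # one value per day present in the data
prev_max = daily_max.shift(1)                # previous day's maximum, NaN for the first day

# reindex repeats each per-day value once per row of that day;
# to_numpy() drops the daily index so the assignment is positional.
df['ValueMaxInGroup'] = daily_max.reindex(day).to_numpy()
df['ValueMaxInPrevGroup'] = prev_max.reindex(day).to_numpy()
print(df)
```

Replacing `.max()` with `.min()` or `.agg(f)` for some function `f` should give the min or f(x) of the previous group in the same way.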