How to regroup a DataFrame and accumulate a column's values?

I have a CSV file that looks like this:

date                     price  volume
2017-10-17 01:00:11.031  51.91       1
2017-10-17 01:00:11.828  51.91       1
2017-10-17 01:00:12.640  51.91       1
2017-10-17 01:00:13.140  51.90      -9
2017-10-17 01:00:15.328  51.90      -5
2017-10-17 01:00:16.531  51.90       1
2017-10-17 01:00:16.531  51.89      -2
2017-10-17 01:00:19.937  51.90       1
2017-10-17 01:00:24.546  51.90       1
2017-10-17 01:00:25.250  51.90       1
2017-10-17 01:00:32.843  51.89      -9
2017-10-17 01:00:42.859  51.89      -5
2017-10-17 01:00:43.453  51.89      -1
2017-10-17 01:00:43.546  51.90       1
2017-10-17 01:00:45.953  51.90       7
...

I want to build a DataFrame that shows how much volume accumulated at each price level during every 5-minute interval.

For example, if the highest and lowest prices between 2017-10-17 00:00 and 2017-10-17 00:05 were 51.21 and 51.11 respectively, the result would be:

datetime          price  pos_volume  neg_volume
2017-10-17 00:00  51.21           3           4
                  51.20          21          23
                  51.19          44          21
                  51.18          31          33
                  ...
                  51.14          14          21
                  51.13          30          29
                  51.12           2           3
                  51.11           5           1

There are two columns so that positive and negative volume are kept apart.

I think I could do this with a lot of conditional loops, but I would like to know whether there is a simpler, more Pythonic way. Thanks for reading!

Asked 2017-10-20 by maynull

Have you seen 'df.resample'? –

@cᴏʟᴅsᴘᴇᴇᴅ Oh, thank you! I'll look it up. – maynull

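You can use np.where to separate the positive and negative values, then build a pivot table whose index is a pd.Grouper with a 5-minute frequency together with price, and use an aggfunc such as count (which ignores NaN values).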

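import numpy as np
import pandas as pd

# Split volume by sign; rows of the other sign become NaN
df['pos_vol'] = np.where(df['volume'] > 0, df['volume'], np.nan)
df['neg_vol'] = np.where(df['volume'] < 0, df['volume'], np.nan)

# Count non-NaN entries per 5-minute bin and price ('date' must be a datetime column)
ndf = df.pivot_table(values=['pos_vol', 'neg_vol'], index=[pd.Grouper(key='date', freq='5min'), 'price'], aggfunc='count')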

Output:

 
                           neg_vol  pos_vol
date                price
2017-10-17 01:00:00 51.89        4        0
                    51.90        2        6
                    51.91        0        3
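Note that count reports how many rows fall on each side, whereas the question asks for accumulated volume. If that means summed volume (my reading, not confirmed by the asker), the same grouping can be aggregated with sum instead. Below is a minimal self-contained sketch that uses only the 15 sample rows shown in the question, retyped as comma-separated text. It uses groupby rather than pivot_table, and the names raw, keys, counts and totals are mine, not from the answer.

```python
import io

import numpy as np
import pandas as pd

# The 15 sample rows visible in the question (rows elided by "..." are not included).
raw = """date,price,volume
2017-10-17 01:00:11.031,51.91,1
2017-10-17 01:00:11.828,51.91,1
2017-10-17 01:00:12.640,51.91,1
2017-10-17 01:00:13.140,51.90,-9
2017-10-17 01:00:15.328,51.90,-5
2017-10-17 01:00:16.531,51.90,1
2017-10-17 01:00:16.531,51.89,-2
2017-10-17 01:00:19.937,51.90,1
2017-10-17 01:00:24.546,51.90,1
2017-10-17 01:00:25.250,51.90,1
2017-10-17 01:00:32.843,51.89,-9
2017-10-17 01:00:42.859,51.89,-5
2017-10-17 01:00:43.453,51.89,-1
2017-10-17 01:00:43.546,51.90,1
2017-10-17 01:00:45.953,51.90,7
"""

# parse_dates gives the datetime column that pd.Grouper needs.
df = pd.read_csv(io.StringIO(raw), parse_dates=["date"])

# Split volume by sign; rows of the other sign become NaN.
df["pos_vol"] = np.where(df["volume"] > 0, df["volume"], np.nan)
df["neg_vol"] = np.where(df["volume"] < 0, df["volume"], np.nan)

# Group by 5-minute bin and by price level.
keys = [pd.Grouper(key="date", freq="5min"), "price"]

# Number of rows on each side (matches aggfunc='count').
counts = df.groupby(keys)[["neg_vol", "pos_vol"]].count()

# Summed volume on each side; NaN is skipped, so an empty side sums to 0.
totals = df.groupby(keys)[["neg_vol", "pos_vol"]].sum()

print(counts)
print(totals)
```

The neg_vol totals come out negative here; if magnitudes are wanted, as the example table in the question suggests, totals["neg_vol"].abs() would flip the sign.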

To sort the index, you can use ndf = ndf.sort_index(level=1, ascending=False)

Output:

 
                           neg_vol  pos_vol
date                price
2017-10-17 01:00:00 51.91        0        3
                    51.90        2        6
                    51.89        4        0
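With only one 5-minute bin, sorting on the price level alone is enough. Once the result spans several bins, sorting on level 1 orders rows by price first, so rows from different bins can interleave. A sketch of one way to keep the bins in time order with prices descending inside each bin follows. The second bin and all of its numbers are made up purely for illustration, and the name ordered is mine.

```python
import pandas as pd

# Hypothetical two-bin result with the same index layout as ndf.
# The 01:05 rows are invented filler, not data from the question.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-10-17 01:00", "2017-10-17 01:05"]), [51.89, 51.90, 51.91]],
    names=["date", "price"],
)
ndf = pd.DataFrame(
    {"neg_vol": [4, 2, 0, 1, 0, 3], "pos_vol": [0, 6, 3, 2, 5, 1]},
    index=index,
)

# Time bins ascending, prices descending inside each bin.
ordered = ndf.sort_index(level=["date", "price"], ascending=[True, False])
print(ordered)
```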

Answered 2017-10-20 04:33:58 by Dark

Nice use of 'pd.Grouper'. –

@cᴏʟᴅsᴘᴇᴇᴅ Thank you. I answered a similar question a week ago and had that in mind. – Dark

@Bharath shetty Thank you so much for your help! :) – maynull
