2017-03-18 21 views
1

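Pandas: rolling count if within a range

In my DataFrame, I want to create a '5D_Peak' column as a rolling maximum, and then a second column holding a rolling count of the historical values that are close to that peak. Is there an easy way to simplify this calculation or, ideally, to vectorize it?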

Here is my code, which works but does it in a plain and rather convoluted way:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,4],[4,5,2],[3,5,8],[1,8,6],[5,2,8],[1,4,10],[3,5,9],[1,4,7],[1,4,6]], columns=list('ABC'))

# Rolling 5-row maximum of column C
df['5D_Peak'] = df['C'].rolling(window=5, center=False).max()

# For each row from index 5 on, count how many of the previous 5 values of C
# lie strictly within +/-2 of that row's 5D_Peak
for i in range(5, len(df)):
    val = 0
    for j in range(i - 5, i):
        if df.loc[i, '5D_Peak'] - 2 < df.loc[j, 'C'] < df.loc[i, '5D_Peak'] + 2:
            val += 1
    df.loc[i, '5D_Close_to_Peak_Count'] = val

This is my desired output:

   A  B   C  5D_Peak  5D_Close_to_Peak_Count
0  1  2   4      NaN                     NaN
1  4  5   2      NaN                     NaN
2  3  5   8      NaN                     NaN
3  1  8   6      NaN                     NaN
4  5  2   8      8.0                     NaN
5  1  4  10     10.0                     0.0
6  3  5   9     10.0                     1.0
7  1  4   7     10.0                     2.0
8  1  4   6     10.0                     2.0

Answers

1

I believe this is what you want. You can set the following two values:

# the window within which to search for "close-to-peak" values
lkp_rng = 5

# how close is close?
closeness_measure = 2

# function to count the number of "close-to-peak" values in the window:
# every entry at or above (window max - closeness_measure) counts as close
fc = lambda x: np.count_nonzero(x >= x.max() - closeness_measure)

# apply fc to the column you choose (raw=True passes each window as a NumPy array)
df['5D_Close_to_Peak_Count'] = df['C'].rolling(window=lkp_rng, center=False).apply(fc, raw=True)
df.head(10)

   A  B   C  5D_Peak  5D_Close_to_Peak_Count
0  1  2   4      NaN                     NaN
1  4  5   2      NaN                     NaN
2  3  5   8      NaN                     NaN
3  1  8   6      NaN                     NaN
4  5  2   8      8.0                     3.0
5  1  4  10     10.0                     3.0
6  3  5   9     10.0                     4.0
7  1  4   7     10.0                     3.0
8  1  4   6     10.0                     3.0

I guess that is what you mean by "historical data".
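The rolling apply above counts values inside the current window, including the current row, and uses an inclusive threshold, so it does not reproduce the question's desired output exactly. As a cross-check, here is a minimal vectorized sketch that follows the question's loop instead: for each row it looks at the five previous rows and counts values strictly within 2 of that row's 5D_Peak. The names `window`, `tolerance`, `prev` and `close` are my own placeholders, `sliding_window_view` assumes NumPy 1.20 or later, and the frame is assumed to have more rows than `window`. This is one possible approach, not the method from any other answer in this thread.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame(
    [[1, 2, 4], [4, 5, 2], [3, 5, 8], [1, 8, 6], [5, 2, 8],
     [1, 4, 10], [3, 5, 9], [1, 4, 7], [1, 4, 6]],
    columns=list("ABC"),
)

window = 5     # look-back length, as in the question (assumed name)
tolerance = 2  # "close" means strictly within +/- 2 of the peak (assumed name)

df["5D_Peak"] = df["C"].rolling(window=window).max()

c = df["C"].to_numpy(dtype=float)
peak = df["5D_Peak"].to_numpy()

# Row k of `prev` holds C[k] .. C[k + window - 1],
# i.e. the `window` rows that precede row k + window.
prev = sliding_window_view(c, window)[:-1]

# Compare each block of previous values with the peak of the row it belongs to.
close = np.abs(prev - peak[window:, None]) < tolerance

# Rows without a full set of previous values stay NaN, as in the loop version.
counts = np.full(len(df), np.nan)
counts[window:] = close.sum(axis=1)
df["5D_Close_to_Peak_Count"] = counts

print(df)
```

The comparison runs on a 2-D view of column C rather than in nested Python loops, so the work per row is done inside NumPy. Whether that is actually faster on a given data set is something to measure rather than assume.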


Thanks. This solves my problem too. But I guess the vectorized approach JohnE suggested is faster? – thunderlion


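If you are using an IPython notebook, just insert `%%prun` at the top of the cell that runs the code. It prints a long listing with a one-line summary at the top. For my code I got "826 function calls (820 primitive calls) in 0.003 seconds". If you insert `%%timeit` instead and run the cell, it reports "1000 loops, best of 3: 745 µs per loop". You can check the other code the same way. – user2738815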


@JohnE Didn't you test the speed of your code? – user2738815
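Outside a notebook, the same kind of timing can be done with the standard-library `timeit` module. The sketch below is only an illustration: `rolling_close_count` is a hypothetical wrapper around the rolling-apply idea from the answer, and the run counts are arbitrary. Timings depend on the machine and the pandas version, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"C": [4, 2, 8, 6, 8, 10, 9, 7, 6]})


def rolling_close_count(series, window=5, closeness=2):
    """Count, per rolling window, the values within `closeness` of the window max."""
    return series.rolling(window=window).apply(
        lambda x: np.count_nonzero(x >= x.max() - closeness), raw=True
    )


# Three runs of 100 calls each; report the best run as time per call.
runs = timeit.repeat(lambda: rolling_close_count(df["C"]), number=100, repeat=3)
best_per_call = min(runs) / 100
print(f"best of 3: {best_per_call * 1e6:.0f} µs per call")
```

To compare approaches, wrap each one in its own function and time them on the same data. A nine-row frame mostly measures fixed overhead, so a larger frame gives a more meaningful comparison.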