2017-03-17 94 views
2

How can I make a rolling version of the following MAD (mean absolute deviation) function in NumPy?

from numpy import mean, absolute 

def mad(data, axis=None): 
    return mean(absolute(data - mean(data, axis)), axis) 

This code is taken from an answer to this question.

Currently, I convert the NumPy array to pandas, apply the function, and then convert the result back to NumPy:

pandasDataFrame.rolling(window=90).apply(mad) 

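But this is inefficient on larger DataFrames. How can I get a rolling window for the same function in NumPy, without loops, that gives the same result?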
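To make "the same result" concrete, here is a minimal sketch of the loop-based version I would like to avoid. The name `rolling_mad_loop` is just a hypothetical helper for illustration; it assumes a 1D input and fills the first `window - 1` positions with NaN, the way the rolling pandas call does.

```python
import numpy as np

def mad(data, axis=None):
    # Same definition as in the question
    return np.mean(np.absolute(data - np.mean(data, axis)), axis)

def rolling_mad_loop(a, window):
    # Hypothetical reference helper: one explicit Python loop per window.
    a = np.asarray(a, dtype=float)
    out = np.full(a.shape[0], np.nan)  # positions without a full window stay NaN
    for end in range(window, a.shape[0] + 1):
        out[end - 1] = mad(a[end - window:end])
    return out

print(rolling_mad_loop([1, 2, 4, 8, 16], 3))
```

Any vectorized replacement should match this output on the positions that have a full window.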

+0

Not that inefficient? – kmario23

+0

Well, you know, in my head I meant something else :) Thanks – RaduS

Answers

3

Here's a vectorized NumPy approach -

import numpy as np

# From this post : http://stackoverflow.com/a/40085052/3293881 
def strided_app(a, L, S): # Window len = L, Stride len/stepsize = S 
    nrows = ((a.size-L)//S)+1 
    n = a.strides[0] 
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n)) 

# From this post : http://stackoverflow.com/a/14314054/3293881 by @Jaime 
def moving_average(a, n=3) : 
    ret = np.cumsum(a, dtype=float) 
    ret[n:] = ret[n:] - ret[:-n] 
    return ret[n - 1:]/n 

def mad_numpy(a, W): 
    a2D = strided_app(a,W,1) 
    return np.absolute(a2D - moving_average(a,W)[:,None]).mean(1) 
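To see what each piece contributes, here is a small walk-through on a five-element array with a window of 3. The two helpers are repeated only so the snippet runs on its own; the input values are made up for illustration.

```python
import numpy as np

def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

def moving_average(a, n=3):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

a = np.array([1, 2, 4, 8, 16])

windows = strided_app(a, 3, 1)   # one row per window, as a view on a (no copy)
means = moving_average(a, 3)     # one mean per window, from a single cumsum
mads = np.absolute(windows - means[:, None]).mean(1)  # broadcast each mean across its row

print(windows)
print(means)
print(mads)
```

Note that `as_strided` returns a view that shares memory with `a`, so the windows should be treated as read-only, and the subtraction still materializes a full `(nrows, L)` temporary array.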

Runtime test -

In [617]: data = np.random.randint(0,9,(10000)) 
    ...: df = pd.DataFrame(data) 
    ...: 

In [618]: pandas_out = pd.rolling_apply(df,90,mad).values.ravel() 
In [619]: numpy_out = mad_numpy(data,90) 

In [620]: np.allclose(pandas_out[89:], numpy_out) # Nans part clipped 
Out[620]: True 

In [621]: %timeit pd.rolling_apply(df,90,mad) 
10 loops, best of 3: 111 ms per loop 

In [622]: %timeit mad_numpy(data,90) 
100 loops, best of 3: 3.4 ms per loop 

In [623]: 111/3.4 
Out[623]: 32.64705882352941 

That's a huge 32x+ speedup over the loopy pandas solution!
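As a possible variant, assuming NumPy 1.20 or newer is available, `np.lib.stride_tricks.sliding_window_view` can build the same windows without computing strides by hand. The sketch below is not the code that was timed above; `rolling_mad` is a hypothetical name, and it pads the front with NaN so the output has the same length as the input.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_mad(a, window):
    # Hypothetical helper: rolling mean absolute deviation of a 1D array.
    a = np.asarray(a, dtype=float)
    # Read-only view of shape (len(a) - window + 1, window)
    win = sliding_window_view(a, window)
    # Absolute deviation of every element from its own window's mean
    dev = np.abs(win - win.mean(axis=1, keepdims=True))
    out = np.full(a.shape[0], np.nan)  # first window - 1 entries have no full window
    out[window - 1:] = dev.mean(axis=1)
    return out

print(rolling_mad([1, 2, 4, 8, 16], 3))
```

This computes the window means directly instead of via `cumsum`, which is simpler but does more arithmetic. Both versions allocate a temporary of shape `(len(a) - window + 1, window)`, so memory grows with the window size.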
