Starting approach
We can create sliding windows along the first axis, multiply them by the wtd values with a tensor product, and sum-reduce along the window axis. The implementation would look something like this -
import numpy as np

# Get all wtd values in an array
wtds = np.exp(-(np.arange(length) - m)**2/dss)
# Get the sliding windows for input array along first axis
pnp_array3D = strided_axis0(pnp_array,len(wtds))
# Initialize o/p array
out = np.zeros(pnp_array.shape)
# Get sum-reductions for the windows which don't need wrapping over
out[length:] = np.tensordot(pnp_array3D,wtds,axes=((1),(0)))[:-1]
# Last element of the output needed wrapping. So, do it separately.
out[length-1] = wtds.dot(pnp_array[np.r_[-1,range(length-1)]])
# Finally perform the divisions
out /= wtds.sum()
The function strided_axis0 to get those sliding windows is taken from here.
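In case the linked implementation is not available, here is a minimal sketch of such a windowing helper, assuming the usual NumPy stride-tricks pattern (the name and signature are inferred from how it is called above, not shown in the original answer):

```python
import numpy as np

def strided_axis0(a, L):
    # Return a 3D *view* of shape (m - L + 1, L, n): all overlapping
    # windows of length L along axis 0 of the 2D array a, with no copy.
    m, n = a.shape
    s0, s1 = a.strides
    nrows = m - L + 1
    return np.lib.stride_tricks.as_strided(
        a, shape=(nrows, L, n), strides=(s0, s0, s1))

a = np.arange(12.0).reshape(6, 2)
windows = strided_axis0(a, 3)
print(windows.shape)  # (4, 3, 2)
```

Since this returns a view into the original buffer, it should only be read from, which is all the tensordot reduction above needs.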
Boosting it with 1D convolution
Those multiplications with the wtds values and their subsequent sum-reductions are essentially a convolution along that first axis. So, we can use scipy.ndimage.convolve1d along axis=0. This would be much faster and memory-efficient as well, because we wouldn't be creating those huge sliding windows.
The implementation would be -
from scipy.ndimage import convolve1d as conv
avgs = conv(pnp_array, weights=wtds/wtds.sum(),axis=0, mode='wrap')
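For reference, here is how convolve1d places an even-length kernel (a small sketch using the example from the scipy documentation; the answer's mode='wrap' only changes how the boundary is handled, not the alignment):

```python
import numpy as np
from scipy.ndimage import convolve1d

# Even-length kernel example adapted from the scipy docs
a = np.array([2., 8., 0., 4., 1., 9., 9., 0.])
out = convolve1d(a, weights=np.array([1., 3.]))
print(out)  # [14. 24.  4. 13. 12. 36. 27.  0.]
# i.e. out[i] == 3*a[i] + 1*a[i+1]: convolution applies the
# reversed kernel, anchored one step ahead of the current element.
```

Knowing this anchoring is what lets the output rows be matched up against the sliding-window result via a fixed index shift.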
Thus, out[length-1:], i.e. the non-zero rows, would be the same as avgs[:-length+1].
There could be some precision difference if we use really small kernel numbers from wtds. So, keep that in mind when using this convolution approach.
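One way to quantify that difference for a given parameter set (a sketch, not part of the original answer) is to compare the convolution output row-by-row against the brute-force weighted sum and print the maximum deviation; how small it is depends on the kernel values:

```python
import numpy as np
from scipy.ndimage import convolve1d as conv

np.random.seed(0)
pnp_array = np.random.rand(200, 4)
length, sigma, offset = 6, 0.3, 0.5
m = np.floor(offset * (length - 1))
dss = 2 * (length / sigma)**2

wtds = np.exp(-(np.arange(length) - m)**2 / dss)

# Brute-force reference (same logic as the original loop; the first
# computed row wraps around via negative indexing)
ref = np.zeros_like(pnp_array)
for l in range(length - 1, len(pnp_array)):
    for i in range(length):
        ref[l] += pnp_array[l - length + i] * wtds[i]
ref /= wtds.sum()

avgs = conv(pnp_array, weights=wtds / wtds.sum(), axis=0, mode='wrap')

# Maximum absolute deviation between the two approaches
diff = np.abs(ref[length - 1:] - avgs[:-length + 1]).max()
print(diff)
```

If the printed deviation is not acceptable for a given kernel, the sliding-window approach can be used instead, as it follows the brute-force definition exactly.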
Runtime test
Approaches -
def original_app(pnp_array, length, m, dss):
    alma = np.zeros(pnp_array.shape)
    wtd_sum = np.zeros(pnp_array.shape)
    for l in range(len(pnp_array)):
        if l >= asize:  # asize = length - 1, from the setup below
            for i in range(length):
                im = i - m
                wtd = np.exp(-(im * im)/dss)
                alma[l] += pnp_array[l - length + i] * wtd
                wtd_sum[l] += wtd
            alma[l] = alma[l]/wtd_sum[l]
    return alma
def vectorized_app1(pnp_array, length, m, dss):
    wtds = np.exp(-(np.arange(length) - m)**2/dss)
    pnp_array3D = strided_axis0(pnp_array, len(wtds))
    out = np.zeros(pnp_array.shape)
    out[length:] = np.tensordot(pnp_array3D, wtds, axes=((1),(0)))[:-1]
    out[length-1] = wtds.dot(pnp_array[np.r_[-1, range(length-1)]])
    out /= wtds.sum()
    return out
def vectorized_app2(pnp_array, length, m, dss):
    wtds = np.exp(-(np.arange(length) - m)**2/dss)
    return conv(pnp_array, weights=wtds/wtds.sum(), axis=0, mode='wrap')
Timings -
In [470]: np.random.seed(0)
...: m,n = 1000,100
...: pnp_array = np.random.rand(m,n)
...:
...: length = 6
...: sigma = 0.3
...: offset = 0.5
...:
...: asize = length - 1
...: m = np.floor(offset * asize)
...: s = length/sigma
...: dss = 2 * s * s
...:
In [471]: %timeit original_app(pnp_array, length, m, dss)
...: %timeit vectorized_app1(pnp_array, length, m, dss)
...: %timeit vectorized_app2(pnp_array, length, m, dss)
...:
10 loops, best of 3: 36.1 ms per loop
1000 loops, best of 3: 1.84 ms per loop
1000 loops, best of 3: 684 µs per loop
In [472]: np.random.seed(0)
...: m,n = 10000,1000 # rest same as previous one
In [473]: %timeit original_app(pnp_array, length, m, dss)
...: %timeit vectorized_app1(pnp_array, length, m, dss)
...: %timeit vectorized_app2(pnp_array, length, m, dss)
...:
1 loop, best of 3: 503 ms per loop
1 loop, best of 3: 222 ms per loop
10 loops, best of 3: 106 ms per loop
I checked your second approach with the 1D convolution. Something looks off with it, and I cannot get the exact result. My example: pnp_array = np.array([3924.00752506, 5774.30335369, 5519.40734814, 4931.71344059]), offset = 0.85, sigma = 6, length = 3, m = 1.7, dss = 0.5. The expected result should be [0, 0, 5594.17030842, 5115.59420056], but the second approach returns [0, 0, 5693.3358598, 5333.61073335], so the cumulative error is -317.182084168. Is this because of the small kernel numbers you mentioned? – Prokhozhii
@Prokhozhii For that set, what are the values of 'wtds' from 'wtds = np.exp(-(np.arange(length) - m)**2/dss)'? – Divakar
They are correct in the second approach: array([0.00308872, 0.3753111, 0.83527021]) – Prokhozhii