NumPy: running length of consecutive non-zero values

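I'm looking for a fast vectorized function that returns the running count of consecutive non-zero values. The count should start over at 0 whenever a zero is encountered. The result should have the same shape as the input array.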

Given an array like this:

x = np.array([2.3, 1.2, 4.1 , 0.0, 0.0, 5.3, 0, 1.2, 3.1])

the function should return this:

array([1, 2, 3, 0, 0, 1, 0, 1, 2])

Source

2015-04-26 steve

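This post lists a vectorized approach, which basically consists of two steps:

1. Initialize a vector of zeros with the same size as the input vector x, and set ones at the positions where x is non-zero.
2. Next, in that vector, put minus the run length of each "island" right after that island's end/stop position. The intention is to apply cumsum later on, which then yields sequential numbers within the "islands" and zeros elsewhere.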
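To make the two steps concrete, here is a small hand-worked sketch for the example array from the question. The names step1 and step2 are only illustrative, and step 2 is filled in by hand here rather than computed:

```python
import numpy as np

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])

# Step 1: ones where x is non-zero, zeros elsewhere
step1 = (x != 0).astype(int)
print(step1.tolist())           # [1, 1, 1, 0, 0, 1, 0, 1, 1]

# Step 2 (by hand): right after each island, store minus its length
step2 = step1.copy()
step2[3] = -3   # island x[0:3] has length 3
step2[6] = -1   # island x[5:6] has length 1
# The last island runs to the end of x, so there is no slot after it to fill
print(step2.tolist())           # [1, 1, 1, -3, 0, 1, -1, 1, 1]

# The cumulative sum counts up inside each island and is pulled back to 0 after it
print(step2.cumsum().tolist())  # [1, 2, 3, 0, 0, 1, 0, 1, 2]
```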

Here's the implementation -

import numpy as np 

#Append zeros at the start and end of input array, x 
xa = np.hstack([[0],x,[0]]) 

# Get an array of ones and zeros, with ones for nonzeros of x and zeros elsewhere 
xa1 =(xa!=0)+0 

# Find consecutive differences on xa1 
xadf = np.diff(xa1) 

# Find start and stop+1 indices and thus the lengths of "islands" of non-zeros 
starts = np.where(xadf==1)[0] 
stops_p1 = np.where(xadf==-1)[0] 
lens = stops_p1 - starts 

# Mark indices where "minus ones" are to be put for applying cumsum 
put_m1 = stops_p1[stops_p1 < x.size]

# Setup vector with ones for nonzero x's, "minus lens" at stops +1 & zeros elsewhere 
vec = xa1[1:-1] # Note: this will change xa1, but it's okay as not needed anymore 
vec[put_m1] = -lens[0:put_m1.size] 

# Perform cumsum to get the desired output 
out = vec.cumsum()
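
The timings further down call sumrunlen_vectorized(x), but the post never shows that function. A plausible wrapper around the steps above might look like the following. The function body is an assumption on my part, not the answerer's exact code, and it copies the slice so that no intermediate array is modified in place:

```python
import numpy as np

def sumrunlen_vectorized(x):
    """Running count of consecutive non-zeros (sketch of the steps above)."""
    x = np.asarray(x)
    # Pad with a zero on each side so every island has a start and a stop
    xa1 = (np.hstack([[0], x, [0]]) != 0).astype(int)
    xadf = np.diff(xa1)
    starts = np.where(xadf == 1)[0]
    stops_p1 = np.where(xadf == -1)[0]
    lens = stops_p1 - starts
    # An island that ends at the last element has no slot after it to reset
    put_m1 = stops_p1[stops_p1 < x.size]
    vec = xa1[1:-1].copy()
    vec[put_m1] = -lens[:put_m1.size]
    return vec.cumsum()

x = np.array([0.0, 2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0.0, 1.2, 3.1, 0.0])
print(sumrunlen_vectorized(x).tolist())  # [0, 1, 2, 3, 0, 0, 1, 0, 1, 2, 0]
```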

Sample run -

In [116]: x 
Out[116]: array([ 0. , 2.3, 1.2, 4.1, 0. , 0. , 5.3, 0. , 1.2, 3.1, 0. ]) 

In [117]: out 
Out[117]: array([0, 1, 2, 3, 0, 0, 1, 0, 1, 2, 0], dtype=int32)

Runtime tests -

Here are some runtime tests comparing the proposed approach against the other, itertools.groupby based approach -

In [21]: N = 1000000 
    ...: x = np.random.rand(1,N) 
    ...: x[x>0.5] = 0.0 
    ...: x = x.ravel() 
    ...: 

In [19]: %timeit sumrunlen_vectorized(x) 
10 loops, best of 3: 19.9 ms per loop 

In [20]: %timeit sumrunlen_loopy(x) 
1 loops, best of 3: 2.86 s per loop

Source

2015-04-26 04:16:44 Divakar

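I was thinking along these lines, but couldn't figure out how to compute the negative run lengths at the transition points. Thanks for the solution. Since vec1.cumsum() equals (x != 0) + 0, it can be shortened by removing vec1. I'm new here. Not sure if it's OK to edit someone else's answer. – steve

@steve Ah, nice observation there! Let me edit it, as I need to edit the text explaining the steps too. – Divakar

A bit more... non_zero = (x != 0) + 0; xa = np.hstack([[0], non_zero, [0]]); xadf = np.diff(xa); and then later vec2 = non_zero – steve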
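
For readers following the comments, steve's shortening could be spelled out roughly as below. This is a sketch: the function name sumrunlen_short is made up for illustration, and the variable names follow his comment. The idea is to compute the 0/1 mask once and reuse it both for padding and as the vector that receives the negative run lengths:

```python
import numpy as np

def sumrunlen_short(x):
    x = np.asarray(x)
    # 0/1 mask of non-zeros; "+ 0" turns the boolean array into integers
    non_zero = (x != 0) + 0
    xa = np.hstack([[0], non_zero, [0]])
    xadf = np.diff(xa)
    starts = np.where(xadf == 1)[0]
    stops_p1 = np.where(xadf == -1)[0]
    lens = stops_p1 - starts
    put_m1 = stops_p1[stops_p1 < x.size]
    # non_zero is a fresh array, so writing into it does not touch x
    vec2 = non_zero
    vec2[put_m1] = -lens[:put_m1.size]
    return vec2.cumsum()

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])
print(sumrunlen_short(x).tolist())  # [1, 2, 3, 0, 0, 1, 0, 1, 2]
```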

You can use itertools.groupby and np.hstack:

>>> import numpy as np 
>>> x = np.array([2.3, 1.2, 4.1 , 0.0, 0.0, 5.3, 0, 1.2, 3.1]) 
>>> from itertools import groupby 

>>> np.hstack([[i if j!=0 else j for i,j in enumerate(g,1)] for _,g in groupby(x,key=lambda x: x!=0)]) 
array([ 1., 2., 3., 0., 0., 1., 0., 1., 2.])

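We can group the array into runs of zero and non-zero elements, then use a list comprehension with enumerate to replace each non-zero run with its 1-based positions (zeros stay as they are), and finally flatten the resulting list with np.hstack.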
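
If the one-liner is hard to read, it can be unpacked into an explicit loop. This is only a sketch of the same groupby idea: the name runlen_groupby is invented here, and unlike the one-liner it returns integers rather than floats:

```python
from itertools import groupby

import numpy as np

def runlen_groupby(x):
    pieces = []
    # groupby yields one (key, run) pair per maximal run of equal keys
    for is_nonzero, run in groupby(x, key=lambda v: v != 0):
        n = len(list(run))
        if is_nonzero:
            pieces.append(np.arange(1, n + 1))     # 1, 2, ..., n
        else:
            pieces.append(np.zeros(n, dtype=int))  # zeros stay zero
    if not pieces:  # np.hstack rejects an empty list
        return np.zeros(0, dtype=int)
    return np.hstack(pieces)

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])
print(runlen_groupby(x).tolist())  # [1, 2, 3, 0, 0, 1, 0, 1, 2]
```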

Source

2015-04-26 02:55:16 Kasramvd
