How to iterate over a list of slices?


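I can't find a solution to a performance problem. How do I iterate over a list of slices?

I have a 1-D array and I want to compute sums over sliding index windows. Here is some example code: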

import numpy as np 
input = np.linspace(1, 100, 100) 
list_of_indices = [[0, 10], [5, 15], [45, 50]] #just an example 
output = np.array([input[idx[0]: idx[1]].sum() for idx in list_of_indices])

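Computing the output array this way is extremely slow compared with numpy's vectorized built-in functions. In real life my list_of_indices contains tens of thousands of [lower bound, upper bound] pairs, and this loop is definitely the bottleneck of a high-performance python script.

How can I deal with this using numpy's internal functionality: masks, a clever np.einsum, or something else? Since I work in the HPC field, I'm also concerned about memory consumption.

Does anyone have an answer to this problem that respects the performance requirements?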

Source

2014-12-06 Louis Gallin

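Hello, and welcome to SO! Please avoid adding information unrelated to the problem in the question. Also, please consider reading the [help] that explains how to ask a good question. In particular, this question seems better suited to the CodeReview site than to SO. – BartoszKP 2014-12-06 12:59:06

It is very easy to misuse numpy and bloat your script's memory footprint with huge arrays. At least the way it is written now, I find the question relevant to SO. – 2014-12-06 18:17:03

Even if technically about code speedup, most numpy 'vectorization' questions are answered on SO. – hpaulj 2014-12-07 08:29:38

Here is a simple approach to try: keep the same solution structure you already have, which presumably works. Just make the storage creation and the indexing more efficient. If you are summing many elements of input for most of the index pairs, the sums ought to take more time than the for loop itself. For example: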

# Put all the indices in a nice efficient structure: 
idxx = np.hstack((np.array(list_of_indices, dtype=np.uint16), 
    np.arange(len(list_of_indices), dtype=np.uint16)[:,None])) 
# Allocate appropriate data type to the precision and range you need, 
# Do it in one go to be time-efficient 
output = np.zeros(len(list_of_indices), dtype=np.float32) 
for idx0, idx1, idxo in idxx: 
    output[idxo] = input[idx0:idx1].sum()

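If len(list_of_indices) > 2**16, use uint32 rather than uint16. The same applies if any index bound is larger than 65535, since uint16 cannot hold it.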
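The approach above can be wrapped in a small function. This is only a sketch of the idea: the name window_sums_loop, the out_dtype parameter and the use of np.min_scalar_type to pick the index type are illustrative assumptions, not part of the original answer. It assumes index_pairs is non-empty.

```python
import numpy as np

def window_sums_loop(values, index_pairs, out_dtype=np.float64):
    """Sum values[lo:hi] for each (lo, hi) pair with a plain Python loop."""
    pairs = np.asarray(index_pairs)
    # Smallest unsigned integer type able to hold every index bound
    # (hypothetical helper logic; assumes non-negative bounds).
    idx_dtype = np.min_scalar_type(int(pairs.max()))
    pairs = pairs.astype(idx_dtype)
    # Allocate the whole output once instead of growing a list.
    out = np.zeros(len(pairs), dtype=out_dtype)
    for k, (lo, hi) in enumerate(pairs):
        out[k] = values[lo:hi].sum()
    return out

values = np.linspace(1, 100, 100)
list_of_indices = [[0, 10], [5, 15], [45, 50]]
result = window_sums_loop(values, list_of_indices)
```

The loop still runs once per window, so this mainly trims allocation and indexing overhead rather than removing the Python-level iteration.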

Source

2014-12-06 18:05:43

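If:

input is about the same length as output, or shorter
the output values have similar magnitudes

...you could create a cumsum of your input values. Then the summations turn into subtractions.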

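# Leading zero so that cs[k] == input[:k].sum(), hence input[a:b].sum() == cs[b] - cs[a]
cs = np.zeros(len(input) + 1, dtype=np.float32) # or float64 if you need it
cs[1:] = np.cumsum(input, dtype=cs.dtype)
loi = np.array(list_of_indices, dtype=np.uint16)
output = cs[loi[:,1]] - cs[loi[:,0]]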
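To check that a cumsum-based version reproduces the slice sums from the question, one possible sketch follows. The names prefix, pairs, expected and vectorized are illustrative. Note the leading zero in prefix: with it, prefix[k] equals the sum of the first k elements, so windows starting at index 0 come out right.

```python
import numpy as np

values = np.linspace(1, 100, 100)
list_of_indices = [[0, 10], [5, 15], [45, 50]]

# Reference result: the direct slice sums from the question.
expected = np.array([values[lo:hi].sum() for lo, hi in list_of_indices])

# prefix[k] holds values[:k].sum(); prefix[0] is 0 by construction.
prefix = np.zeros(len(values) + 1, dtype=np.float64)
prefix[1:] = np.cumsum(values)

# One fancy-indexing lookup per bound, then a single vectorized subtraction.
pairs = np.asarray(list_of_indices)
vectorized = prefix[pairs[:, 1]] - prefix[pairs[:, 0]]

assert np.allclose(vectorized, expected)
```

The extra memory is one array of len(values) + 1 elements, independent of how many windows are requested.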

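The danger here is loss of precision, which will occur if input has runs of large and tiny values. In that case cumsum may not be accurate enough for you.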
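A small sketch of that precision issue, using made-up data (one large value followed by a few small ones) and a hypothetical helper window_sums_cumsum. With a float32 running sum, the small values can be absorbed by the large one before the subtraction ever happens.

```python
import numpy as np

def window_sums_cumsum(values, index_pairs, dtype):
    """Window sums via a zero-padded running sum accumulated in `dtype`."""
    prefix = np.zeros(len(values) + 1, dtype=dtype)
    prefix[1:] = np.cumsum(values, dtype=dtype)
    pairs = np.asarray(index_pairs)
    return prefix[pairs[:, 1]] - prefix[pairs[:, 0]]

# Illustrative data only: a large value followed by tiny ones.
values = np.array([1e8, 1.0, 1.0, 1.0])
window = [[1, 4]]  # covers only the three small values

exact = values[1:4].sum()
lossy = window_sums_cumsum(values, window, np.float32)
precise = window_sums_cumsum(values, window, np.float64)

# Absolute error of each variant against the direct slice sum.
err32 = abs(float(lossy[0]) - exact)
err64 = abs(float(precise[0]) - exact)
print(err32, err64)
```

Accumulating in float64 costs twice the memory of float32 for the running sum, which is the trade-off to weigh against the accuracy needed.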

Source

2014-12-06 18:26:12
