3D distance vectorization

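I need help vectorizing this code. Right now, with N = 100, it takes a minute or so to run, and I would like to speed it up. I have vectorized a double loop like this before, but never a triple (3D) loop, and I am having difficulties.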

import numpy as np
N = 100
n = 12
r = np.sqrt(2)

x = np.arange(-N, N+1)
y = np.arange(-N, N+1)
z = np.arange(-N, N+1)

C = 0

for i in x:
    for j in y:
        for k in z:
            # Keep points with an even coordinate sum, excluding the origin.
            if (i+j+k) % 2 == 0 and (i*i+j*j+k*k != 0):
                p = np.sqrt(i*i+j*j+k*k)
                p = p/r
                q = (1/p)**n
                C += q

print('\n')
print(C)

Source

2014-02-17 NightHallow

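"I am having difficulties" is not helpful - are you getting errors (provide the traceback)? Unexpected output (provide the input, expected output, actual output)? – jonrsharpe

@jonrsharpe: Did you read the question? The code is correct but slow, so the OP wants to push some of the work into the fast numpy library rather than executing the loops in slow Python. –

Yes, the code is correct; I just want it vectorized. I want to learn how to make it as efficient as possible. – NightHallow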

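The meshgrid/where/indexing solution is already very fast. I made it about 65% faster. That is not much, but I will explain it step by step anyway:

For me, the easiest way to approach this problem was to treat all 3D vectors in the grid as columns of one large 2D array of shape 3 x M. meshgrid is the right tool for creating all the combinations (note that a 3D meshgrid requires numpy version >= 1.7), and vstack + reshape bring the data into the desired form. For example: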

>>> np.vstack(np.meshgrid(*[np.arange(0, 2)]*3)).reshape(3,-1) 
array([[0, 0, 1, 1, 0, 0, 1, 1], 
     [0, 0, 0, 0, 1, 1, 1, 1], 
     [0, 1, 0, 1, 0, 1, 0, 1]])

Each column is a 3D vector. Each of these eight vectors represents one corner of a 1x1x1 cube (a 3D grid with step size 1 and length 1 in all dimensions).
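As a small sketch of the same construction for an arbitrary grid half-width N, the helper below wraps the meshgrid + vstack + reshape steps and checks that the result has M = (2N+1)^3 columns. The name `grid_column_vectors` is made up for this illustration and is not part of numpy.

```python
import numpy as np


def grid_column_vectors(N):
    """Return all integer grid points in [-N, N]^3 as columns of a 3 x M array.

    Hypothetical helper for illustration; it only wraps the
    meshgrid + vstack + reshape steps described above.
    """
    axis = np.arange(-N, N + 1)
    x, y, z = np.meshgrid(axis, axis, axis)
    # Stacking gives shape (3*(2N+1), 2N+1, 2N+1); the reshape turns that
    # into one row per coordinate and one column per grid point.
    return np.vstack((x, y, z)).reshape(3, -1)


vectors = grid_column_vectors(1)
print(vectors.shape)  # (3, 27)
```

The order of the columns depends on meshgrid's indexing convention, but the sum over all grid points does not, so the ordering is irrelevant for this problem.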

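Let's call this array vectors (it contains all 3D vectors representing all points in the grid). Then, prepare a bool mask for selecting those vectors that fulfil your mod-2 criterion: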

mod2bool = np.sum(vectors, axis=0) % 2 == 0

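np.sum(vectors, axis=0) creates a 1 x M array containing the sum of the elements of each column vector. Hence, mod2bool is a 1 x M array with one bool value per column vector. Now use this bool mask: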

vectorsubset = vectors[:,mod2bool]

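This selects all rows (:) and uses boolean indexing to filter the columns; both are fast operations in numpy. Calculate the lengths of the remaining vectors, using the native numpy approach: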

lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))

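This is quite fast. However, scipy.stats.ss and bottleneck.ss can perform the squared-sum calculation even faster than this. Transform the lengths using your instructions:

with np.errstate(divide='ignore'):
    p = (r/lengths)**n

This involves dividing a finite number by zero (at the origin, where the length is 0), resulting in an Inf in the output array. This is entirely fine. We use numpy's errstate context manager to make sure that these divisions by zero do not throw an exception or a runtime warning.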

Now sum up the finite elements (ignoring the Infs) and return the sum:

return np.sum(p[np.isfinite(p)])
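Put together, the steps above can be sketched as one small function. The name `lattice_sum` and its default arguments are assumptions made for this illustration; the body only strings together the snippets from the walkthrough.

```python
import numpy as np


def lattice_sum(N, n=12, r=np.sqrt(2)):
    """Sum (r/|v|)**n over grid points v in [-N, N]^3 with even coordinate sum.

    Sketch assembled from the steps above; the function name is made up.
    """
    axis = np.arange(-N, N + 1)
    vectors = np.vstack(np.meshgrid(axis, axis, axis)).reshape(3, -1)

    # Keep only the columns whose coordinates add up to an even number.
    mod2bool = np.sum(vectors, axis=0) % 2 == 0
    vectorsubset = vectors[:, mod2bool]

    # Euclidean length of every remaining column vector.
    lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))

    # The origin has length 0, so r/0 gives Inf there; silence the warning
    # and drop the non-finite entry when summing.
    with np.errstate(divide='ignore'):
        p = (r / lengths)**n
    return np.sum(p[np.isfinite(p)])


print(lattice_sum(1))
```

For N = 1 only the twelve points at distance sqrt(2) survive the filter, so the result should come out as 12 up to floating-point rounding.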

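I have implemented this method twice below: once exactly as just explained, and once using bottleneck's ss and nansum functions. I have also added your method for comparison, as well as a modified version of your method that skips the np.where((x*x+y*y+z*z)!=0) indexing, creates Infs instead, and finally sums up the isfinite way.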

import sys 
import numpy as np 
import bottleneck as bn 

N = 100 
n = 12 
r = np.sqrt(2) 


x,y,z = np.meshgrid(*[np.arange(-N, N+1)]*3) 
gridvectors = np.vstack((x,y,z)).reshape(3, -1) 


def measure_time(func):
    import time
    def modified_func(*args, **kwargs):
        t0 = time.time()
        result = func(*args, **kwargs)
        duration = time.time() - t0
        print("%s duration: %.3f s" % (func.__name__, duration))
        return result
    return modified_func


@measure_time 
def method_columnvecs(vectors): 
    mod2bool = np.sum(vectors, axis=0) % 2 == 0 
    vectorsubset = vectors[:,mod2bool] 
    lengths = np.sqrt(np.sum(vectorsubset**2, axis=0)) 
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return np.sum(p[np.isfinite(p)])


@measure_time 
def method_columnvecs_opt(vectors): 
    # On my system, bn.nansum is even slightly faster than np.sum. 
    mod2bool = bn.nansum(vectors, axis=0) % 2 == 0 
    # Use ss from bottleneck or scipy.stats (axis=0 is default). 
    lengths = np.sqrt(bn.ss(vectors[:,mod2bool])) 
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return bn.nansum(p[np.isfinite(p)])


@measure_time 
def method_original(x,y,z): 
    ind = np.where((x+y+z)%2==0) 
    x = x[ind] 
    y = y[ind] 
    z = z[ind] 
    ind = np.where((x*x+y*y+z*z)!=0) 
    x = x[ind] 
    y = y[ind] 
    z = z[ind] 
    p=np.sqrt(x*x+y*y+z*z)/r 
    return np.sum((1/p)**n) 


@measure_time 
def method_original_finitesum(x,y,z): 
    ind = np.where((x+y+z)%2==0) 
    x = x[ind] 
    y = y[ind] 
    z = z[ind] 
    lengths = np.sqrt(x*x+y*y+z*z) 
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return np.sum(p[np.isfinite(p)])


print(method_columnvecs(gridvectors))
print(method_columnvecs_opt(gridvectors))
print(method_original(x,y,z))
print(method_original_finitesum(x,y,z))

Here is the output:

$ python test.py 
method_columnvecs duration: 1.295 s 
12.1318801965 
method_columnvecs_opt duration: 1.162 s 
12.1318801965 
method_original duration: 1.936 s 
12.1318801965 
method_original_finitesum duration: 1.714 s 
12.1318801965

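All methods yield the same result. Your method gets a bit faster when doing the isfinite-style sum. My method is faster still, but I would say this is an exercise of academic nature rather than an important improvement :-)

I have one question left: you said that for N = 3 the calculation should produce 12, yet even your own version does not do that. All of the methods above yield 12.1317530867 for N = 3. Is this expected?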

Source

2014-02-19 14:54:28

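I think I meant that N = 1 should give 12; I only used it as a test case to check that I had things right. Thank you very much for your input and help. I learned a lot of things that I had not even thought of. Thanks! – NightHallow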
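The N = 1 test case can be checked with a few lines of plain Python. The sketch below is only an illustration (the variable names are made up): it lists the grid points that pass the two conditions from the question and confirms that there are twelve of them, all at squared distance 2, so each contributes (r/sqrt(2))**n = 1 to the sum.

```python
from itertools import product

# Enumerate the 27 grid points for N = 1 and keep those that pass the two
# conditions from the question: even coordinate sum, and not the origin.
kept = [
    (i, j, k)
    for i, j, k in product((-1, 0, 1), repeat=3)
    if (i + j + k) % 2 == 0 and (i * i + j * j + k * k) != 0
]

# Squared distance of every surviving point from the origin.
squared_distances = {i * i + j * j + k * k for i, j, k in kept}

print(len(kept))          # 12
print(squared_distances)  # {2}
```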

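Thanks to @Bill, I was able to get this working. It is very fast now. It could probably be done better, especially by using two masks to get rid of the two conditions that I originally had in the for loop.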

from __future__ import division
import numpy as np

N = 100
n = 12
r = np.sqrt(2)

x, y, z = np.meshgrid(*[np.arange(-N, N+1)]*3)

# Keep only the points whose coordinates add up to an even number.
ind = np.where((x+y+z)%2==0)
x = x[ind]
y = y[ind]
z = z[ind]
# Drop the origin to avoid dividing by zero.
ind = np.where((x*x+y*y+z*z)!=0)
x = x[ind]
y = y[ind]
z = z[ind]

p=np.sqrt(x*x+y*y+z*z)/r

ans = (1/p)**n
ans = np.sum(ans)
print('ans')
print(ans)
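One way the "two masks" idea might look is sketched below: both conditions are written as boolean arrays and combined with `&`, so the grid is indexed only once. This is an illustration, not a benchmarked improvement; the function name `lattice_sum_masked` is made up.

```python
import numpy as np


def lattice_sum_masked(N, n=12, r=np.sqrt(2)):
    """Same sum as the code above, with both conditions folded into one mask.

    Illustrative sketch; the function name is made up here.
    """
    axis = np.arange(-N, N + 1)
    x, y, z = np.meshgrid(axis, axis, axis)

    # Squared distance from the origin, computed once and reused.
    d2 = x * x + y * y + z * z

    # Mask 1: coordinate sum is even.  Mask 2: point is not the origin.
    keep = ((x + y + z) % 2 == 0) & (d2 != 0)

    p = np.sqrt(d2[keep]) / r
    return np.sum((1 / p) ** n)


print(lattice_sum_masked(1))
```

Because the origin is removed by the mask before the division, no errstate handling is needed in this variant.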

Source

2014-02-17 19:07:42 NightHallow

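I am curious where the conditions '(x + y + z) % 2 == 0' and '(x*x + y*y + z*z) != 0' come from. Any physical meaning? Is the latter just a filter to work around the zero? numpy can handle that. –

Just FYI, 'distances = distance.pdist(vectors, 'sqeuclidean')' might significantly speed up your 'np.sqrt(x*x + y*y + z*z)', so you may want to do that before filtering. –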

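If I put it before the filtering, then how would I filter? And yes, these conditions come from the physical meaning of the atomic lattice I am looking at. It is solid state physics, and these are the conditions I use for the lattice points: the lattice is set up to have an atom only where the coordinates add up to an even number, and none at the origin, because that is the reference atom. – NightHallow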
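To make the lattice condition a little more concrete, the sketch below groups the occupied sites of a small grid (N = 2, chosen arbitrarily for illustration) by their squared distance from the reference atom. The variable names are assumptions made here; only the two conditions come from the thread.

```python
import numpy as np

N = 2
axis = np.arange(-N, N + 1)
x, y, z = np.meshgrid(axis, axis, axis)

# Lattice condition from the comment: an atom sits only where the
# coordinates add up to an even number, and the origin is left out
# because it holds the reference atom.
d2 = x * x + y * y + z * z
occupied = ((x + y + z) % 2 == 0) & (d2 != 0)

# Group the occupied sites by squared distance from the reference atom.
shell_d2, shell_counts = np.unique(d2[occupied], return_counts=True)

print(shell_d2[0], shell_counts[0])  # 2 12
```

The closest shell holds twelve atoms at squared distance 2, which is why r = sqrt(2) normalizes the nearest-neighbour terms to 1 and why the N = 1 test case gives exactly 12.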
