2013-11-23 54 views
3

我需要此功能进行优化,因为我试图让我的OpenGL模拟运行速度更快。我想使用Parakeet,但我无法完全理解我需要如何修改下面的代码才能这样做。你能看到我应该做什么吗?使用Parakeet优化Python函数

def distanceMatrix(self,x,y,z): 
    " ""Computes distances between all particles and places the result in a matrix such that the ij th matrix entry corresponds to the distance between particle i and j"" " 
    xtemp = tile(x,(self.N,1)) 
    dx = xtemp - xtemp.T 
    ytemp = tile(y,(self.N,1)) 
    dy = ytemp - ytemp.T 
    ztemp = tile(z,(self.N,1)) 
    dz = ztemp - ztemp.T 

    # Particles 'feel' each other across the periodic boundaries 
    if self.periodicX: 
     dx[dx>self.L/2]=dx[dx > self.L/2]-self.L 
     dx[dx<-self.L/2]=dx[dx < -self.L/2]+self.L 
    if self.periodicY: 
     dy[dy>self.L/2]=dy[dy>self.L/2]-self.L 
     dy[dy<-self.L/2]=dy[dy<-self.L/2]+self.L 
    if self.periodicZ: 
     dz[dz>self.L/2]=dz[dz>self.L/2]-self.L 
     dz[dz<-self.L/2]=dz[dz<-self.L/2]+self.L 

    # Total Distances 
    d = sqrt(dx**2+dy**2+dz**2) 

    # Mark zero entries with negative 1 to avoid divergences 
    d[d==0] = -1 

    return d, dx, dy, dz 

从我可以告诉,鹦鹉应该能够使用上面的功能没有改变 - 它仅使用NumPy的和数学。但是,我总是从鹦鹉JIT包装调用函数时出现以下错误:

AssertionError: Unsupported function: <bound method Particles.distanceMatrix of <particles.Particles instance at 0x04CD8E90>> 
+1

你也可以做的一种可能的优化方法是用一个局部变量替换'self.L/2'的12个用法。这可能会做点什么。 :-)(虽然这对鹦鹉问题没有帮助......抱歉) –

+1

我猜布尔索引不支持,但我不确定。我知道分配像xtemp和dx L/2这样的中间数组是浪费的。尽管你的向量化代码可能是常规numpy中的快速代码,但使用JIT编译器,最好使用钝的for-loops! – 2013-11-24 14:49:55

回答

4

鹦鹉还很年轻,其NumPy的支持是不完整的,并且你的代码倒是几个特点,还不能正常工作。

1)你正在包装一个方法,而Parakeet到目前为止只知道如何处理函数。常用的解决方法是创建一个@jit包装的帮助函数,并将您的方法调用到所有必需的成员数据中。方法不起作用的原因是将有意义的类型分配给“自我”是不平凡的。这不是不可能的,但棘手的是,方法不会进入长尾鹦鹉,直到下垂果实被采摘为止。说起低挂的水果......

2)布尔索引。尚未实施,但将在下一个版本中发布。

3)np.tile:也行不通,也可能会在下一个版本。如果您想查看哪些内置函数和NumPy库函数可以工作,请查看Parakeet的mappings模块。

我重写你的代码是一个小友好给鹦鹉:

@jit 
def parakeet_dist(x, y, z, L, periodicX, periodicY, periodicZ): 
    # perform all-pairs computations more explicitly 
    # instead of tile + broadcasting 
    def periodic_diff(x1, x2, periodic): 
    diff = x1 - x2 
    if periodic: 
     if diff > (L/2): diff -= L 
     if diff < (-L/2): diff += L 
    return diff 
    dx = np.array([[periodic_diff(x1, x2, periodicX) for x1 in x] for x2 in x]) 
    dy = np.array([[periodic_diff(y1, y2, periodicY) for y1 in y] for y2 in y]) 
    dz = np.array([[periodic_diff(z1, z2, periodicZ) for z1 in z] for z2 in z]) 
    d= np.sqrt(dx**2 + dy**2 + dz**2) 

    # since we can't yet use boolean indexing for masking out zero distances 
    # have to fall back on explicit loops instead 
    for i in xrange(len(x)): 
    for j in xrange(len(x)): 
     if d[i,j] == 0: d[i,j] = -1 
    return d, dx, dy, dz 

在我的机器本只运行〜3倍的速度比NumPy的为N = 2000(0.39s为NumPy的对比0.14s的鹦鹉) 。如果我重写遍历数组更明确地使用循环则表现上升到6倍〜比NumPy的更快(鹦鹉运行在〜0.06S):

@jit 
def loopy_dist(x, y, z, L, periodicX, periodicY, periodicZ): 
    N = len(x) 
    dx = np.zeros((N,N)) 
    dy = np.zeros((N,N)) 
    dz = np.zeros((N,N)) 
    d = np.zeros((N,N)) 

    def periodic_diff(x1, x2, periodic): 
    diff = x1 - x2 
    if periodic: 
     if diff > (L/2): diff -= L 
     if diff < (-L/2): diff += L 
    return diff 

    for i in xrange(N): 
    for j in xrange(N): 
     dx[i,j] = periodic_diff(x[j], x[i], periodicX) 
     dy[i,j] = periodic_diff(y[j], y[i], periodicY) 
     dz[i,j] = periodic_diff(z[j], z[i], periodicZ) 
     d[i,j] = dx[i,j] ** 2 + dy[i,j] ** 2 + dz[i,j] ** 2 
     if d[i,j] == 0: d[i,j] = -1 
     else: d[i,j] = np.sqrt(d[i,j]) 
    return d, dx, dy, dz 

随着一点点的创意重写你也可以得到上面的代码在Numba运行,但它只比NumPy(0.25秒)快1.5倍。编译时间是Parakeet w/comprehensions:1秒,Parakeet w /循环:0.5秒,Numba w /循环:0.9秒。

希望接下来的几个版本能够更多地使用NumPy库函数,但现在理解或循环往往是要走的路。

+1

谢谢!尽管你也在reddit上回应了这个问题,但我恢复了这个帐户以提高你的答案。迟到总比不到好。 – bogdan