2017-05-13 126 views

numpy: "vectorizing" a for loop that runs slower than expected. Given a matrix A with shape (n, k) and a vector s of size n, I want to compute a matrix G with shape (k, k) as follows:

G += s[i] * A[i].T * A[i], for all i in {0, ..., n-1}

I tried implementing it using a for loop (Method 1) and in a vectorized way (Method 2), but the for-loop implementation is faster for large values of k (especially when k > 500).

The code is written as follows:

import numpy as np 
k = 200 
n = 50000 
A = np.random.randint(0, 1000, (n,k)) # generates random data for the matrix A (n,k) 
G1 = np.zeros((k,k)) # initialize G1 as a (k,k) matrix 
s = np.random.randint(0, 1000, n) * 1.0 # initialize a random vector of size n 

# METHOD 1 
for i in range(n): 
    G1 += s[i] * np.dot(np.array([A[i]]).T, np.array([A[i]])) 

# METHOD 2 
G2 = np.dot(A[:,np.newaxis].T, s[:,np.newaxis]*A) 
G2 = np.squeeze(G2) # reduces dimension from (k,1,k) to (k,k) 

The matrices G1 and G2 are identical (both equal the matrix G); the only difference is how they are computed. Is there a smarter, more efficient way to compute G?

Finally, these are the timings I got for random sizes of k and n:

Test #: 1 
k,n: (866, 45761) 
Method1: 337.457569838s 
Method2: 386.290487051s 
-------------------- 
Test #: 2 
k,n: (690, 48011) 
Method1: 152.999140978s 
Method2: 226.080267191s 
-------------------- 
Test #: 3 
k,n: (390, 5317) 
Method1: 5.28722500801s 
Method2: 4.86999702454s 
-------------------- 
Test #: 4 
k,n: (222, 5009) 
Method1: 1.73456382751s 
Method2: 0.929286956787s 
-------------------- 
Test #: 5 
k,n: (915, 16561) 
Method1: 101.782826185s 
Method2: 159.167108059s 
-------------------- 
Test #: 6 
k,n: (130, 11283) 
Method1: 1.53138184547s 
Method2: 0.64450097084s 
-------------------- 
Test #: 7 
k,n: (57, 37863) 
Method1: 1.44776391983s 
Method2: 0.494270086288s 
-------------------- 
Test #: 8 
k,n: (110, 34599) 
Method1: 3.51851701736s 
Method2: 1.61688089371s 

Answer


Two further improved versions are:

(A.T*s).dot(A) 
(A.T).dot(A*s[:,None]) 
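Both one-liners fold the weights s into a single matrix product instead of summing n outer products. A quick sanity check (a small sketch, using `np.einsum` as an independent third formulation not in the original answer) confirms they compute the same weighted sum of outer products:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 8
A = rng.integers(0, 10, (n, k)).astype(float)
s = rng.integers(0, 10, n).astype(float)

# The two improved versions from above.
G3 = (A.T * s).dot(A)            # scale the columns of A.T by s, then one matmul
G4 = (A.T).dot(A * s[:, None])   # scale the rows of A by s, then one matmul

# Independent reference: sum_i s[i] * outer(A[i], A[i]) via einsum.
G_ref = np.einsum('ni,n,nj->ij', A, s, A)

print(np.allclose(G3, G_ref), np.allclose(G4, G_ref))  # True True
```

Either way, the entire computation runs inside one BLAS-backed matrix multiplication, which is where the speedup over the explicit loop comes from.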

Problem(s) with method2

In method2, we are creating A[:,np.newaxis].T, which has shape (k,1,n), i.e. a 3D array. With 3D arrays, I think np.dot falls into some kind of loop and is not truly vectorized (the source code might reveal more information here).
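The shape bookkeeping can be seen directly. The snippet below (a small illustrative sketch, not from the original answer) shows how `A[:,np.newaxis].T` becomes 3D, and how `np.dot` then yields a (k, 1, k) result that has to be squeezed:

```python
import numpy as np

n, k = 6, 4
A = np.arange(n * k, dtype=float).reshape(n, k)
s = np.ones(n)

X = A[:, np.newaxis]        # shape (n, 1, k)
Xt = X.T                    # shape (k, 1, n): a 3D array, not a plain matrix
print(Xt.shape)             # (4, 1, 6)

# For N-D inputs, np.dot contracts the last axis of the first argument
# with the first axis of a 2D second argument, giving shape (k, 1, k).
G = np.dot(Xt, s[:, np.newaxis] * A)
print(G.shape)              # (4, 1, 4)
print(np.squeeze(G).shape)  # (4, 4)
```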

For such 3D tensor multiplications, it is better to use the tensor equivalent: np.tensordot. Thus, an improved version of method2 becomes:

G2 = np.tensordot(A[:,np.newaxis].T, s[:,np.newaxis]*A, axes=((2),(0))) 
G2 = np.squeeze(G2) 

Since with np.tensordot we are sum-reducing just one axis from each of those inputs, we don't really need tensordot here; plain np.dot on the squeezed-in versions would suffice. That leads us back to the improved versions listed at the start.
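That equivalence can be verified directly: contracting a single axis with `np.tensordot` gives the same result as a plain `np.dot` on the 2D operands (again a small sketch to illustrate, not part of the original answer):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 5
A = rng.standard_normal((n, k))
s = rng.standard_normal(n)

# tensordot contracting one axis of each input, then squeezing out the
# singleton dimension...
G_td = np.squeeze(np.tensordot(A[:, np.newaxis].T, s[:, np.newaxis] * A,
                               axes=((2,), (0,))))

# ...matches a plain dot on the already-2D operands.
G_dot = A.T.dot(s[:, None] * A)

print(np.allclose(G_td, G_dot))  # True
```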

Runtime tests

Approaches -

def method1(A, s): 
    n, k = A.shape # derive sizes from A rather than relying on globals
    G1 = np.zeros((k,k)) # initialize G1 as a (k,k) matrix 
    for i in range(n): 
        G1 += s[i] * np.dot(np.array([A[i]]).T, np.array([A[i]])) 
    return G1 

def method2(A, s): 
    G2 = np.dot(A[:,np.newaxis].T, s[:,np.newaxis]*A) 
    G2 = np.squeeze(G2) # reduces dimension from (k,1,k) to (k,k) 
    return G2 

def method3(A, s): 
    return (A.T*s).dot(A) 

def method4(A, s): 
    return (A.T).dot(A*s[:,None]) 

def method2_improved(A, s): 
    G2 = np.tensordot(A[:,np.newaxis].T, s[:,np.newaxis]*A, axes=((2),(0))) 
    G2 = np.squeeze(G2) 
    return G2 

Timings and verification -

In [56]: k = 200 
    ...: n = 5000 
    ...: A = np.random.randint(0, 1000, (n,k)) 
    ...: s = np.random.randint(0, 1000, n) * 1.0 
    ...: 

In [72]: print(np.allclose(method1(A, s), method2(A, s))) 
    ...: print(np.allclose(method1(A, s), method3(A, s))) 
    ...: print(np.allclose(method1(A, s), method4(A, s))) 
    ...: print(np.allclose(method1(A, s), method2_improved(A, s))) 
    ...: 
True 
True 
True 
True 

In [73]: %timeit method1(A, s) 
    ...: %timeit method2(A, s) 
    ...: %timeit method3(A, s) 
    ...: %timeit method4(A, s) 
    ...: %timeit method2_improved(A, s) 
    ...: 
1 loops, best of 3: 1.12 s per loop 
1 loops, best of 3: 693 ms per loop 
100 loops, best of 3: 8.12 ms per loop 
100 loops, best of 3: 8.17 ms per loop 
100 loops, best of 3: 8.28 ms per loop