2012-09-13 301 views

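Calculating the covariance matrix: difference between numpy.cov and numpy.dot?

I am working with a 3D numpy array on which I will eventually perform PCA. I first flatten the 3D array to 2D so that I can compute the covariance (and then the eigenvalues and eigenvectors).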

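I get different results from numpy.cov and numpy.dot when computing the covariance matrix. If my 2D array is (5, 9), I want to end up with a 5x5 (i.e. NxN) covariance matrix. That is what I get with numpy.dot. With numpy.cov, I end up with a 9x9 covariance matrix. That is not the shape I need, but honestly I don't know which one is correct. I have seen both methods used to compute the covariance in the examples I have studied.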

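If I carry the numpy.dot and numpy.cov results through numpy.linalg.eig, I obviously get different answers (all printed in the sample output below). So at this point I am confused about which approach is correct, or where I might be going wrong.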

Here is the test code with its output. Thanks for your help.

import numpy as np 

a = np.random.random(((5,3,3))); # example of what real input will look like 

# create 2D flattened version of 3D input array 
d1,d2,d3 = a.shape 
b = np.zeros([d1,d2*d3]) 
for i in range(len(a)): 
    b[i] = a[i].flatten() 

print "shape of 3D array: ", a.shape 
print "shape of flattened 2D array: ", b.shape, "\n" 
print "flattened 2D array:\n", b, "\n" 

# mean-center the flattened array 
b -= np.mean(b, axis=0) 

# calculate the covariance matrix of the flattened array 
covar1 = np.cov(b, rowvar=0) # this makes a 9x9 array 
covar2 = np.dot(b, b.T)  # this makes a 5x5 array 

print "covariance via numpy.cov:\n", covar1, "\n" 
print "covariance via numpy.dot:\n", covar2, "\n" 

# calculate eigenvalues and eigenvectors 
eval1, evec1 = np.linalg.eig(covar1) 
eval2, evec2 = np.linalg.eig(covar2) 

print "eigenvalues via numpy.cov covariance matrix:\n", eval1, "\n" 
print "eigenvectors via numpy.cov covariance matrix:\n", evec1, "\n" 
print "eigenvalues via numpy.dot covariance matrix:\n", eval2, "\n" 
print "eigenvectors via numpy.dot covariance matrix:\n", evec2, "\n" 


======= Output ======= 

shape of 3D array: (5, 3, 3) 
shape of flattened 2D array: (5, 9) 

flattened 2D array: 
[[ 0.94964127 0.71015973 0.80994774 0.49727821 0.38270025 0.89136202 
    0.19876615 0.72461047 0.43646456] 
[ 0.00502329 0.70593521 0.44001479 0.97576486 0.37261107 0.6318449 
    0.86301405 0.21820704 0.91507706] 
[ 0.75411747 0.98462782 0.65109776 0.1083943 0.12867679 0.63172813 
    0.85803498 0.89507165 0.62291308] 
[ 0.88589874 0.02797773 0.6421045 0.17255432 0.5713524 0.28589519 
    0.55888288 0.7961657 0.4453764 ] 
[ 0.85774793 0.19511453 0.92167001 0.27340606 0.41849435 0.98349776 
    0.19354437 0.2974041 0.52064868]] 

covariance via numpy.cov(): 
[[ 0.15180806 -0.04977355 0.05733885 -0.11340765 0.00840097 0.01461576 
    -0.08596712 0.07512366 -0.07509614] 
[-0.04977355 0.15853367 -0.02337953 0.0357429 -0.05604085 0.02600021 
    0.06158462 0.0229808 0.03506849] 
[ 0.05733885 -0.02337953 0.0335786 -0.03485899 0.00294469 0.03209583 
    -0.05378417 0.00490397 -0.02751816] 
[-0.11340765 0.0357429 -0.03485899 0.12340238 0.0052609 0.0144986 
    0.02494029 -0.07492008 0.05109007] 
[ 0.00840097 -0.05604085 0.00294469 0.0052609 0.02529647 -0.01263607 
    -0.02327657 -0.01136774 -0.01037048] 
[ 0.01461576 0.02600021 0.03209583 0.0144986 -0.01263607 0.07415853 
    -0.05387152 -0.0345835 -0.00342481] 
[-0.08596712 0.06158462 -0.05378417 0.02494029 -0.02327657 -0.05387152 
    0.11053971 0.00903926 0.04727671] 
[ 0.07512366 0.0229808 0.00490397 -0.07492008 -0.01136774 -0.0345835 
    0.00903926 0.09436665 -0.03526195] 
[-0.07509614 0.03506849 -0.02751816 0.05109007 -0.01037048 -0.00342481 
    0.04727671 -0.03526195 0.03900974]] 

covariance via numpy.dot(): 
[[ 0.3211555 -0.34304471 -0.01453859 -0.1071505 0.14357829] 
[-0.34304471 1.24506647 -0.11174019 -0.43907983 -0.35120174] 
[-0.01453859 -0.11174019 0.57018674 -0.10412646 -0.3397815 ] 
[-0.1071505 -0.43907983 -0.10412646 0.60465919 0.0456976 ] 
[ 0.14357829 -0.35120174 -0.3397815 0.0456976 0.50170735]] 

eigenvalues via numpy.cov covariance matrix: 
[ 3.34339027e-01 +0.00000000e+00j 1.98268985e-01 +0.00000000e+00j 
    5.71434551e-02 +0.00000000e+00j 1.13399310e-01 +0.00000000e+00j 
    3.38418299e-18 +1.46714498e-17j 3.38418299e-18 -1.46714498e-17j 
    1.20944017e-18 +0.00000000e+00j -8.89005842e-18 +0.00000000e+00j 
    -6.59244508e-18 +0.00000000e+00j] 

eigenvectors via numpy.cov covariance matrix: 
[[-0.33898927+0.j   0.01567746+0.j   -0.32410513+0.j 
    0.01868249+0.j   0.03901578-0.09858459j 0.03901578+0.09858459j 
    -0.17596347+0.j   0.08294235+0.j   0.04883282+0.j  ] 
[ 0.03740184+0.j   -0.01106985+0.j   0.11199662+0.j 
    -0.36257285+0.j   0.66513867+0.j   0.66513867+0.j 
    0.34810753+0.j   -0.05174886+0.j   -0.21147240+0.j  ] 
[ 0.42193056+0.j   0.10153367+0.j   -0.52774125+0.j 
    -0.57292678+0.j   -0.02584078-0.15425679j -0.02584078+0.15425679j 
    -0.02594397+0.j   -0.23132722+0.j   -0.33824532+0.j  ] 
[-0.08723679+0.j   -0.17700647+0.j   -0.04490487+0.j 
    0.14531440+0.j   -0.08669754+0.21485879j -0.08669754-0.21485879j 
    -0.73208352+0.j   0.04474123+0.j   -0.09159437+0.j  ] 
[-0.26991334+0.j   0.39182156+0.j   0.18023454+0.j 
    -0.14727224+0.j   -0.21261400+0.1100362j -0.21261400-0.1100362j 
    0.15211635+0.j   0.54168898+0.j   -0.36386803+0.j  ] 
[-0.39361702+0.j   0.48389127+0.j   0.12668909+0.j 
    0.07739853+0.j   0.31569702-0.34166187j 0.31569702+0.34166187j 
    0.11287735+0.j   -0.74889136+0.j   -0.42472067+0.j  ] 
[-0.29962418+0.j   -0.01577641+0.j   0.35742257+0.j 
    -0.68969822+0.j   -0.28182091+0.13998238j -0.28182091-0.13998238j 
    -0.40124817+0.j   0.06419507+0.j   0.47506061+0.j  ] 
[-0.57032501+0.j   -0.60505095+0.j   -0.30688172+0.j 
    -0.11823642+0.j   0.07618472-0.0915626j 0.07618472+0.0915626j 
    0.32272841+0.j   -0.10872383+0.j   -0.25867852+0.j  ] 
[-0.23498699+0.j   0.45164240+0.j   -0.57569388+0.j 
    0.03856674+0.j   -0.07478874+0.27512969j -0.07478874-0.27512969j 
    -0.10101603+0.j   0.25440413+0.j   0.47403650+0.j  ]] 

eigenvalues via numpy.dot covariance matrix: 
[ 1.33735611e+00 7.93075942e-01 2.08276008e-16 4.53597239e-01 
    2.28573820e-01] 

eigenvectors via numpy.dot covariance matrix: 
[[ 0.1223889 -0.87441162 -0.4472136 -0.13172011 0.05545353] 
[-0.54658696 0.08157704 -0.4472136 0.61361759 0.34360056] 
[ 0.70163289 0.24699239 -0.4472136 0.41717057 -0.26958257] 
[-0.41754523 0.17603863 -0.4472136 -0.33135976 -0.69632398] 
[ 0.1401104 0.36980356 -0.4472136 -0.56770828 0.56685246]] 

Answer


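np.dot is just the matrix product of two matrices; it is not a covariance. Why are you using rowvar=0? If you just call np.cov(b), it gives a matrix of the right size.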
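
To make the shape difference concrete, here is a minimal sketch. The fixed seed and the names `cov_rows`, `cov_cols`, `centered` and `gram` are illustrative and not from the question. By default np.cov treats each row as a variable, so np.cov(b) is 5x5, while rowvar=0 treats each column as a variable and gives 9x9. A plain np.dot(b, b.T) only reproduces np.cov(b) if each row is first centered on its own mean and the product is divided by the number of observations minus one.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed, chosen only for reproducibility
a = rng.random_sample((5, 3, 3))
b = a.reshape(a.shape[0], -1)    # same flattening as the loop in the question: (5, 9)

cov_rows = np.cov(b)                  # rows are variables    -> (5, 5)
cov_cols = np.cov(b, rowvar=False)    # columns are variables -> (9, 9)

# Manual equivalent of np.cov(b): center each row, then divide by n_obs - 1.
centered = b - b.mean(axis=1, keepdims=True)
gram = np.dot(centered, centered.T) / (b.shape[1] - 1)

print(cov_rows.shape, cov_cols.shape)
print(np.allclose(gram, cov_rows))
```

Note that the question centers with axis=0 (per column), which matches the rowvar=0 convention rather than the default one.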


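Aaargh... I have been staring at the code for too long... you're right, of course. The rowvar=0 was a leftover from when I was trying to handle the 3D array. Thanks. RE: using np.dot to compute the covariance, one example is here: http://www.janeriksolem.net/2009/01/pca-for-images-using-python.html – vulture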
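
On the np.dot point: the two results in the question are more closely related than they look. With b centered along axis 0 as in the question, np.dot(b, b.T) is the 5x5 Gram matrix and np.cov(b, rowvar=0) is np.dot(b.T, b) / (5 - 1). The two share their non-zero eigenvalues up to that factor of 4 (compare 1.337 with 0.334 in the output above). PCA code for images sometimes exploits this to diagonalize the smaller matrix. The following is a minimal sketch, assuming that this is the reason np.dot appears in the linked article. The variable names are illustrative, and np.linalg.eigh is used instead of eig because both matrices are symmetric, which keeps the results real.

```python
import numpy as np

rng = np.random.RandomState(0)       # fixed seed, for reproducibility only
b = rng.random_sample((5, 9))        # 5 observations, 9 features
b = b - b.mean(axis=0)               # center each feature, as in the question
n = b.shape[0]

small = np.dot(b, b.T)               # (5, 5) Gram matrix
large = np.cov(b, rowvar=False)      # (9, 9), equals np.dot(b.T, b) / (n - 1)

# eigh returns real eigenvalues in ascending order for symmetric input.
w_small, v_small = np.linalg.eigh(small)
w_large, v_large = np.linalg.eigh(large)

# The n - 1 largest eigenvalues agree after rescaling; the rest are ~0.
print(np.allclose(w_small[1:] / (n - 1), w_large[-(n - 1):]))

# Map the eigenvectors of the small matrix back to feature space.
pcs = np.dot(b.T, v_small[:, 1:])
pcs /= np.linalg.norm(pcs, axis=0)   # normalize each column to unit length
```

Up to sign, the columns of `pcs` match the leading eigenvectors of the 9x9 covariance matrix, so either route gives the same principal directions.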