
I am trying to implement PCA, and it works fine for the intermediate results such as eigenvalues and eigenvectors. However, when I try to project the data (3-dimensional) onto the 2D principal component space, the result is wrong. I have spent a lot of time comparing my code against other implementations, e.g.: Python PCA - projection into lower dimensional space

http://sebastianraschka.com/Articles/2014_pca_step_by_step.html

However, after a long time there has been no progress and I cannot find the error. Since the intermediate results are correct, I assume the problem is a simple coding mistake. Thanks in advance to everyone who actually reads this question, and thanks to those who provide helpful comments/answers.

My code is as follows:

import numpy as np

class PCA():
    def __init__(self, X):
        # center the data
        X = X - X.mean(axis=0)
        # calculate the covariance matrix of X, where data points are represented as rows
        C = np.cov(X, rowvar=False)
        # get eigenvectors and eigenvalues
        d, u = np.linalg.eigh(C)
        # sort both eigenvectors and eigenvalues in descending order of eigenvalue;
        # np.linalg.eigh returns them in ascending order, so both are reversed
        self.U = np.asarray(u).T[::-1]
        self.D = d[::-1]

**problem starts here**

    def project(self, X, m):
        # use the top m eigenvectors (highest eigenvalues) as the transformation matrix
        Z = np.dot(X, np.asmatrix(self.U[:m]).T)
        return Z

The result of my code is:

my result:
([[ 0.03463706, -2.65447128],
  [-1.52656731,  0.20025725],
  [-3.82672364,  0.88865609],
  [ 2.22969475,  0.05126909],
  [-1.56296316, -2.22932369],
  [ 1.59059825,  0.63988429],
  [ 0.62786254, -0.61449831],
  [ 0.59657118,  0.51004927]])

correct result - e.g. as produced by sklearn.PCA:
([[ 0.26424835, -2.25344912], 
[-1.29695602, 0.60127941], 
[-3.59711235, 1.28967825], 
[ 2.45930604, 0.45229125], 
[-1.33335186, -1.82830153], 
[ 1.82020954, 1.04090645], 
[ 0.85747383, -0.21347615], 
[ 0.82618248, 0.91107143]]) 

The input is defined as follows: 
X = np.array([ 
[-2.133268233289599,0.903819474847349,2.217823388231679,-0.444779660856219,-0.661480010318842,-0.163814281248453,-0.608167714051449, 0.949391996219125], 
[-1.273486742804804,-1.270450725314960,-2.873297536940942, 1.819616794091556,-2.617784834189455, 1.706200163080549,0.196983250752276,0.501491995499840], 
[-0.935406638147949,0.298594472836292,1.520579082270122,-1.390457671168661,-1.180253547776717,-0.194988736923602,-0.645052874385757,-1.400566775105519]]).T 
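
For reference, the "correct result" above can be reproduced with scikit-learn. A minimal sketch (note that the sign of each principal component is mathematically arbitrary, so individual columns may come out negated between implementations):

from sklearn.decomposition import PCA as SklearnPCA

# fit_transform centers X internally and projects onto the top 2 components
Z_ref = SklearnPCA(n_components=2).fit_transform(X)
print(Z_ref)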

Answer


You need to subtract the mean to center the data before projecting it onto the new basis:

mu = X.mean(0)                    # column means of the data
C = np.cov(X - mu, rowvar=False)  # covariance of the centered data
d, u = np.linalg.eigh(C)
U = u.T[::-1]                     # eigenvectors as rows, in descending eigenvalue order
Z = np.dot(X - mu, U[:2].T)       # center X again before projecting

print(Z) 
# [[ 0.26424835 -2.25344912] 
# [-1.29695602 0.60127941] 
# [-3.59711235 1.28967825] 
# [ 2.45930604 0.45229125] 
# [-1.33335186 -1.82830153] 
# [ 1.82020954 1.04090645] 
# [ 0.85747383 -0.21347615] 
# [ 0.82618248 0.91107143]]
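
For completeness, one way to fold this fix back into the original class is to store the training mean in __init__ and reuse it in project. A sketch along those lines (one possible design, not the only one):

import numpy as np

class PCA():
    def __init__(self, X):
        # remember the mean so project() can center data the same way
        self.mean = X.mean(axis=0)
        # covariance of the centered data (data points in rows)
        C = np.cov(X - self.mean, rowvar=False)
        d, u = np.linalg.eigh(C)
        # eigh returns eigenvalues in ascending order; reverse for descending
        self.U = u.T[::-1]
        self.D = d[::-1]

    def project(self, X, m):
        # center with the stored mean before projecting onto the top m components
        return np.dot(X - self.mean, self.U[:m].T)

Keeping the mean on the instance ensures that project applies exactly the same centering that was used when the components were computed.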