2016-07-22

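Sklearn spectral clustering error: infs or NaNs in the matrix

I am using the spectral clustering library, and the similarity matrix is its main argument. My matrix looks like this: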

[[ 1.00000000e+00 8.47085137e-01 8.49644498e-01 8.49746438e-01 
2.96473454e-01 8.50540412e-01 8.49462072e-01 8.50839475e-01 
8.45951343e-01 5.76448265e-01 8.48265736e-01 8.43378943e-01 
3.75348067e-01 1.17626480e-01 2.50357519e-01 8.50495202e-01 
9.97541755e-01 8.49835674e-01 8.48770171e-01 8.45869271e-01 
-5.97205241e-02] 
[ 8.47085137e-01 1.00000000e+00 9.98547894e-01 9.98803332e-01 
2.22305018e-01 9.98755219e-01 9.98502380e-01 9.98402601e-01 
9.98778885e-01 5.66416311e-01 9.98639207e-01 9.98452172e-01 
-6.10479042e-02 2.46741344e-02 -4.14116930e-03 9.98357419e-01 
8.48955204e-01 9.98525354e-01 9.98900440e-01 9.98426618e-01 
-6.51839614e-02] 
[ 8.49644498e-01 9.98547894e-01 1.00000000e+00 9.98764222e-01 
1.59017501e-01 9.98777492e-01 9.98797005e-01 9.98756310e-01 
9.98785822e-01 5.71955127e-01 9.98834038e-01 9.98652820e-01 
-5.95467715e-02 1.98107829e-02 -3.88527970e-03 9.98810942e-01 
8.51337460e-01 9.98882675e-01 9.98815975e-01 9.98789494e-01 
-6.69662309e-02] 
[ 8.49746438e-01 9.98803332e-01 9.98764222e-01 1.00000000e+00 
4.73518047e-01 9.98684853e-01 9.98839959e-01 9.99029920e-01 
9.98804479e-01 5.67855583e-01 9.98759386e-01 9.98796277e-01 
-6.07517782e-02 1.71388383e-02 -3.20996100e-03 9.98669121e-01 
8.51600753e-01 9.98681806e-01 9.99072484e-01 9.98702177e-01 
-6.29855810e-02] 
[ 3.52784328e-01 2.41076867e-01 2.01621082e-01 4.11538647e-01 
9.92999574e-01 2.09351787e-01 2.12464918e-01 1.84566399e-01 
2.82162287e-01 8.88835155e-01 1.90613041e-01 2.12150578e-01 
2.92104260e-01 6.25221827e-02 8.70607365e-01 2.88645877e-01 
3.09283827e-01 2.81253950e-01 1.80307149e-01 2.49082955e-01 
5.46192492e-02] 
... 
[ -5.97205241e-02 -6.51839614e-02 -6.69662309e-02 -6.29855810e-02 
7.86918277e-02 -6.49002943e-02 -6.12003747e-02 -6.34500592e-02 
-6.75593439e-02 7.23869691e-02 -6.20686862e-02 -5.94039824e-02 
-1.00101778e-01 -1.14667128e-01 5.57606897e-02 -6.32884559e-02 
-5.33734526e-02 -5.90822523e-02 -6.17068052e-02 -5.76615359e-02 
1.00000000e+00]] 

My code follows the documentation examples:

from sklearn.cluster import SpectralClustering

cl = SpectralClustering(n_clusters=4, affinity='precomputed')
y = cl.fit_predict(matrix)

But I get the following error:

/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/utils/validation.py:629: UserWarning: Array is not symmetric, and will be converted to symmetric by average with its transpose. 
    warnings.warn("Array is not symmetric, and will be converted " 

/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/utils/graph.py:172: RuntimeWarning: invalid value encountered in sqrt 
    w = np.sqrt(w) 

Traceback (most recent call last): 

File "/home/mahmood/PycharmProjects/sentence2vec/graphClustering.py", line 23, in <module> 
    y = cl.fit_predict(matrix) 

File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/base.py", line 371, in fit_predict 
    self.fit(X) 

File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/cluster/spectral.py", line 454, in fit 
    assign_labels=self.assign_labels) 

File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/cluster/spectral.py", line 258, in spectral_clustering 
    eigen_tol=eigen_tol, drop_first=False) 

File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/manifold/spectral_embedding_.py", line 254, in spectral_embedding 
    tol=eigen_tol) 

File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1545, in eigsh 
    symmetric=True, tol=tol) 

File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1033, in get_OPinv_matvec 
    return LuInv(A).matvec 

File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/interface.py", line 142, in __new__ 
    obj.__init__(*args, **kwargs) 

File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 922, in __init__ 
    self.M_lu = lu_factor(M) 

File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_lu.py", line 58, in lu_factor 
    a1 = asarray_chkfinite(a) 

File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1022, in asarray_chkfinite 

"array must not contain infs or NaNs") 
ValueError: array must not contain infs or NaNs 

The first warning is understandable, because the matrix is not symmetric, but there are no infs or NaNs in the matrix.

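How do you check whether the matrix contains NaN/Inf values? – ahajib

I printed the matrix in the question, and there seem to be no NaN/Inf values. –

That is only part of your matrix, so it is not conclusive. You have to check every element to be sure. – ahajib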
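One way to check every element, as the comment suggests, is to let NumPy scan the whole array instead of reading printed output. The sketch below is only illustrative: the helper name `summarize_affinity` and the small example matrix are assumptions, not part of the original question, and the function assumes a square matrix.

```python
import numpy as np


def summarize_affinity(matrix):
    """Report properties of a square matrix that spectral clustering is sensitive to."""
    a = np.asarray(matrix, dtype=float)
    return {
        "has_nan": bool(np.isnan(a).any()),         # any NaN entry
        "has_inf": bool(np.isinf(a).any()),         # any +inf or -inf entry
        "has_negative": bool((a < 0).any()),        # any negative similarity
        "is_symmetric": bool(np.allclose(a, a.T)),  # equal to its transpose
    }


# Tiny illustrative matrix (not the asker's data).
example = np.array([[1.0, 0.8, -0.06],
                    [0.8, 1.0, -0.07],
                    [-0.06, -0.07, 1.0]])
report = summarize_affinity(example)
print(report)
```

A matrix can pass the NaN/Inf checks and still contain negative entries, which is why the sketch reports them separately.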

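Answer

The NaN values appear because your matrix is not a similarity matrix: the data contains negative similarities! Taking the sqrt of those values gives NaN, hence the error.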
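The effect can be reproduced in isolation. The sketch below assumes that the `w = np.sqrt(w)` line in the traceback is applied to per-node sums of similarities (the node degrees), as in a normalized graph Laplacian. It does not reproduce scikit-learn's internals, and the small matrix is made up for illustration.

```python
import numpy as np

# Illustrative affinity matrix (not the asker's data): the last node has
# only negative similarities to the other nodes.
affinity = np.array([[1.0, 0.9, -0.06],
                     [0.9, 1.0, -0.07],
                     [-0.06, -0.07, 1.0]])

# Degree of each node: the sum of its off-diagonal similarities.
off_diagonal = affinity - np.diag(np.diag(affinity))
degree = off_diagonal.sum(axis=0)

# A normalized Laplacian scales by sqrt(degree); a negative degree yields NaN.
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning for the demo
    sqrt_degree = np.sqrt(degree)

print(degree)
print(sqrt_degree)
```

The input contains no NaN at all; the NaN is created during the computation, which would be consistent with the asker finding none in the printed matrix.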

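The warnings are not just there for fun: matrix factorization techniques have a number of requirements that must hold for them to work and return meaningful results.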

Fix your negative similarities first, then try again.
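The answer does not say how to fix the negative similarities, so the following is only a sketch of two common options. It assumes the values are cosine-like similarities in [-1, 1]; the helper name `to_valid_affinity` and its `mode` parameter are hypothetical.

```python
import numpy as np


def to_valid_affinity(similarity, mode="clip"):
    """Return a symmetric, non-negative copy of a square similarity matrix.

    mode="clip":  negative similarities are set to 0.
    mode="shift": values are mapped from [-1, 1] to [0, 1] (assumes that range).
    """
    s = np.asarray(similarity, dtype=float)
    s = (s + s.T) / 2.0            # symmetrize by averaging with the transpose
    if mode == "clip":
        s = np.clip(s, 0.0, None)  # drop negative similarities
    elif mode == "shift":
        s = (s + 1.0) / 2.0        # rescale; only valid for inputs in [-1, 1]
    else:
        raise ValueError("mode must be 'clip' or 'shift'")
    return s


# Illustrative, slightly asymmetric input (not the asker's data).
raw = np.array([[1.0, 0.9, -0.06],
                [0.9, 1.0, -0.07],
                [-0.08, -0.07, 1.0]])
cleaned = to_valid_affinity(raw)
shifted = to_valid_affinity(raw, mode="shift")
print(cleaned)
print(shifted)
```

The result can then be passed to `SpectralClustering(n_clusters=4, affinity='precomputed').fit_predict(...)` as in the question. Note that clipping can leave a node with no positive similarity to any other node, which may cause problems of its own; shifting avoids that but changes every value.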

I did what you said, and the first warning disappeared, but the second warning and the error remain! –

You cannot have negative values either. 'sqrt(-1) = nan' –