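Spectral clustering on a sparse dataset
I am applying spectral clustering (sklearn.cluster.SpectralClustering) to a dataset with quite a few features that are fairly sparse. When running spectral clustering in Python, I get the following warning: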
UserWarning: Graph is not fully connected, spectral embedding may not work as expected. warnings.warn("Graph is not fully connected, spectral embedding"
This is usually followed by an error like this:
`
File "****.py", line 120, in perform_clustering_spectral_clustering
predicted_clusters = cluster.SpectralClustering(n_clusters=n).fit_predict(features)
File "****\sklearn\base.py", line 349, in fit_predict
self.fit(X)
File "****\sklearn\cluster\spectral.py", line 450, in fit
assign_labels=self.assign_labels)
File "****\sklearn\cluster\spectral.py", line 256, in spectral_clustering
eigen_tol=eigen_tol, drop_first=False)
File "****\sklearn\manifold\spectral_embedding_.py", line 297, in spectral_embedding
largest=False, maxiter=2000)
File "****\scipy\sparse\linalg\eigen\lobpcg\lobpcg.py", line 462, in lobpcg
activeBlockVectorBP, retInvR=True)
File "****\scipy\sparse\linalg\eigen\lobpcg\lobpcg.py", line 112, in _b_orthonormalize
gramVBV = cholesky(gramVBV)
File "****\scipy\linalg\decomp_cholesky.py", line 81, in cholesky
check_finite=check_finite)
File "****\scipy\linalg\decomp_cholesky.py", line 30, in _cholesky
raise LinAlgError("%d-th leading minor not positive definite" % info)
numpy.linalg.linalg.LinAlgError: 9-th leading minor not positive definite
numpy.linalg.linalg.LinAlgError: the leading minor of order 12 of 'b' is not positive definite. The factorization of 'b' could not be completed and no eigenvalues or eigenvectors were computed.`
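However, this warning/error does not always occur with the same settings (that is, its behavior is not very consistent, which makes it hard to test). It occurs for different values of n_clusters, but more often for n = 2 and for n > 7 (at least in my brief experience; as mentioned, it does not behave consistently).
How should I deal with this warning and the related error? Does it depend on the number of features? What if I add more?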
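To see what the warning is referring to, one option is to count the connected components of the affinity graph directly. The sketch below is only an illustration under assumptions: it supposes the default `affinity='rbf'` with `gamma=1.0` is in use, and the helper name `count_affinity_components` and the toy data are made up for this example. With an RBF kernel, an affinity can underflow to exactly zero when two samples are far apart, which is one way the graph can end up disconnected.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.metrics.pairwise import rbf_kernel


def count_affinity_components(features, gamma=1.0):
    """Count connected components of an RBF affinity graph.

    Hypothetical helper: gamma=1.0 is assumed to match the
    SpectralClustering default, it is not taken from the post.
    """
    # Pairwise affinities exp(-gamma * ||x - y||^2).
    affinity = rbf_kernel(features, gamma=gamma)
    # Converting to CSR keeps only non-zero entries, so affinities
    # that underflowed to 0.0 are treated as missing edges.
    graph = csr_matrix(affinity)
    n_components, labels = connected_components(graph, directed=False)
    return n_components, labels


# Toy data (made up): two groups that are very far apart.
toy_features = np.array([[0.0, 0.0], [0.0, 1.0],
                         [100.0, 100.0], [100.0, 101.0]])
n_components, component_labels = count_affinity_components(toy_features)
print(n_components, component_labels)
```

If this reports more than one component for the real features, the graph is disconnected in the sense the warning describes. In that case the graph Laplacian has a zero eigenvalue repeated once per component, which may be related to the trouble in the eigensolver, though that link is a guess rather than something the traceback proves.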
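I assume you are using 'sklearn.cluster.SpectralClustering'? You really need to mention that in the question. Also, please show the full traceback of the error and the warning, not just the last line. –
Is your sparse similarity matrix *positive definite*? –
I have edited the post with the requested information. The matrix is probably not positive definite (since that is what the error says). The question is how to deal with that. – Guido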
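For the positive-definiteness question raised in the comments, a quick check is to attempt a Cholesky factorization, which is the same kind of operation that fails in the traceback. This is a minimal sketch, not part of the original post: the helper name `is_positive_definite` and the example matrices are hypothetical, and it expects a dense array (a sparse matrix would need `.toarray()` first, which is only practical for small inputs). Note that in the traceback the failing Cholesky call is applied to an internal Gram matrix inside `lobpcg`, so a result for the similarity matrix itself is only indirect evidence.

```python
import numpy as np


def is_positive_definite(matrix):
    """Return True if a dense matrix is symmetric positive definite.

    Hypothetical helper for illustration only.
    """
    matrix = np.asarray(matrix, dtype=float)
    # Cholesky only makes sense for symmetric matrices.
    if matrix.shape[0] != matrix.shape[1] or not np.allclose(matrix, matrix.T):
        return False
    try:
        # Raises LinAlgError when the matrix is not positive definite.
        np.linalg.cholesky(matrix)
    except np.linalg.LinAlgError:
        return False
    return True


# Made-up examples: the identity is positive definite,
# [[1, 2], [2, 1]] has eigenvalues 3 and -1 and is not.
pd_result = is_positive_definite(np.eye(3))
indefinite_result = is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]]))
print(pd_result, indefinite_result)
```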