2016-03-03

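ValueError: array must not contain infs or NaNs (biclustering)

I am trying to run biclustering, but it fails with an error saying the array contains infs or NaNs, even though I scanned the data with pd.isnull(DataFile).sum().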

import pandas as pd 
import numpy as np 
from matplotlib import pyplot as plt 
from sklearn.datasets import samples_generator as sg 
from sklearn.cluster.bicluster import SpectralCoclustering 
from sklearn.metrics import consensus_score 
DataFile=pd.read_csv("DatafilledProp.csv",sep='\t') 


DataFile.drop(DataFile.columns[[0, 1]], axis=1, inplace=True) 
plt.matshow(DataFile.as_matrix(), cmap=plt.cm.Blues) 
plt.title("Original TransMapping") 
data, row_idx, col_idx = sg._shuffle(DataFile.as_matrix(), random_state=0) 
plt.matshow(data, cmap=plt.cm.Blues) 
plt.title("Shuffled dataset") 
plt.show() 
Features=DataFile.values 
model = SpectralCoclustering(n_clusters=10, random_state=0) 
model.fit(Features) 

This is the error I get:

  File "C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\2.1\visualstudio_py_util.py", line 106, in exec_file
    exec_code(code, file, global_variables)
  File "C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\2.1\visualstudio_py_util.py", line 82, in exec_code
    exec(code_obj, global_variables)
  File "D:\ClusteringDemo\DataPreparation.py\DataPreparation.py\Model.py", line 19, in <module>
    model.fit(Features)
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\cluster\bicluster\spectral.py", line 126, in fit
    self._fit(X)
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\cluster\bicluster\spectral.py", line 275, in _fit
    u, v = self._svd(normalized_data, n_sv, n_discard=1)
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\cluster\bicluster\spectral.py", line 139, in _svd
    **kwargs)
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\utils\extmath.py", line 299, in randomized_svd
    Q = randomized_range_finder(M, n_random, n_iter, random_state)
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\utils\extmath.py", line 226, in randomized_range_finder
    Q, R = linalg.qr(Y, mode='economic')
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\scipy\linalg\decomp_qr.py", line 127, in qr
    a1 = numpy.asarray_chkfinite(a)
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\numpy\lib\function_base.py", line 613, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
Press any key to continue .

Answer


pd.isnull(DataFile).sum() only checks for NaN values, as this example shows:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [np.nan, 6]])

df
Out[12]:
     0  1
0    1  2
1    3  4
2  NaN  6

pd.isnull(df).sum()
Out[13]:
0    1
1    0
dtype: int64

However, it does not check for inf, which the error message says is also a possibility.

df3 = pd.DataFrame([[1,2],[3,4],[np.inf,6]]) 

pd.isnull(df3).sum() 
Out[23]: 
0    0
1    0
dtype: int64 

So I suspect the error is caused by inf rather than NaN. np.isinf does detect it:

import numpy as np 

np.isinf(df3).sum() 
Out[25]: 
0    1
1    0
dtype: int64
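
If it helps, here is a minimal sketch of how both cases could be checked in one pass and cleaned up before calling model.fit. The small DataFrame is a hypothetical stand-in for DataFile, since the real data is not shown, and dropping the affected rows is only one possible cleanup (filling them in may suit your data better).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for DataFile; the real file is not available here.
df = pd.DataFrame({"a": [1.0, np.inf, 3.0], "b": [4.0, 5.0, np.nan]})

# np.isfinite is False for NaN, +inf and -inf, so one check covers both.
values = df.to_numpy(dtype=float)
bad_per_column = (~np.isfinite(values)).sum(axis=0)
print(bad_per_column)

# One possible cleanup: turn +/-inf into NaN, then drop rows containing NaN.
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()

# Confirm that nothing non-finite is left before fitting a model.
all_finite = bool(np.isfinite(cleaned.to_numpy(dtype=float)).all())
print(all_finite)
```

If the counts are all zero for your own data and the error still appears, the non-finite values are probably being produced later, inside the fit itself, rather than coming from the input file.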