2017-06-21 45 views
2

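ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I am working on the Loan Prediction practice problem and trying to fill in the missing values in the data. I got the data from here. To work through the problem, I am following this tutorial.

You can find the entire code I used (file name model.py) and the data on GitHub.

The data frame looks like this:

After executing the last line (which corresponds to line 122 of model.py), I get: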

/home/user/.local/lib/python2.7/site-packages/numpy/lib/arraysetops.py:216: FutureWarning: numpy not_equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change. 
    flag = np.concatenate(([True], aux[1:] != aux[:-1])) 
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20. 
    "This module will be removed in 0.20.", DeprecationWarning) 
Traceback (most recent call last): 
    File "model.py", line 123, in <module> 
    classification_model(model, df,predictor_var,outcome_var) 
    File "model.py", line 89, in classification_model 
    model.fit(data[predictors],data[outcome]) 
    File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1173, in fit 
    order="C") 
    File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 521, in check_X_y 
    ensure_min_features, warn_on_dtype, estimator) 
    File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 407, in check_array 
    _assert_all_finite(array) 
    File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 58, in _assert_all_finite 
    " or a value too large for %r." % X.dtype) 
ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). 

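I am getting this error because of the missing values. How do I fill in these missing values?

The missing values in Self_Employed and LoanAmount are already filled. How do I fill the rest? Thank you for your help.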
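
Before filling anything, it can help to list which columns still contain NaN. The following is a minimal sketch on a small made-up frame; the column names follow the question, but the values are invented for illustration and do not come from the actual loan data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; it only stands in for the loan data from the question.
df = pd.DataFrame({
    'Gender': ['Male', np.nan, 'Female'],
    'Married': ['Yes', 'No', np.nan],
    'LoanAmount': [128.0, 66.0, 120.0],
})

# Count the missing values in each column.
missing = df.isnull().sum()

# Keep only the columns that still need filling.
print(missing[missing > 0])
```

Any column listed here would still trigger the ValueError if it were passed to model.fit.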

Do you want to replace NaN with a scalar value? Then use df['colname'].fillna(val, inplace=True) – jezrael

@jezrael What about object-type columns such as Gender and Married? – Aniruddh

I think the best approach is to fill them with a placeholder value: df['Gender'].fillna('no data', inplace=True) and df['Married'].fillna('no data', inplace=True) – jezrael

Answers

1

You can use fillna:

df['Gender'].fillna('no data',inplace=True) 
df['Married'].fillna('no data',inplace=True) 
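
In newer pandas releases, calling fillna with inplace=True on a selected column can raise a warning or leave df unchanged, so assigning the result back is a safer variant of the same idea. A minimal sketch, assuming a made-up sample frame:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({'Gender': ['m', 'f', np.nan],
                   'Married': [np.nan, 'yes', 'no']})

# Assign the filled column back instead of relying on inplace=True.
df['Gender'] = df['Gender'].fillna('no data')
df['Married'] = df['Married'].fillna('no data')
print(df)
```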

Or, if you need to fill several columns with the same value:

cols = ['Gender','Married'] 
df[cols] = df[cols].fillna('no data') 

If each column needs its own replacement value, you can pass a dict that maps column names to fill values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Gender':['m','f',np.nan],
                   'Married':[np.nan,'yes','no'],
                   'credit history':[1.,np.nan,0]})
print (df)
  Gender Married  credit history
0      m     NaN             1.0
1      f     yes             NaN
2    NaN      no             0.0

# map each column name to the value that replaces its NaN
d = {'Gender':'no data', 'Married':'no data', 'credit history':0}
df = df.fillna(d)
print (df)
    Gender  Married  credit history
0        m  no data             1.0
1        f      yes             0.0
2  no data       no             0.0

df['Gender'].fillna('no data', inplace=True) only replaces the missing Gender values, right? It does not affect the rest of the values? – Aniruddh

Yes, exactly. You are right. – jezrael

One more noob question: when filling numeric columns such as credit_history and dependents, should the fill values be in quotes? – Aniruddh
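
On the last question: a quoted value is a string, so a numeric column is normally filled with an unquoted number, which keeps it numeric. A minimal sketch, assuming made-up credit_history values:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with one missing value.
s = pd.Series([1.0, np.nan, 0.0], name='credit_history')

# Unquoted fill value: the column keeps its numeric dtype.
filled = s.fillna(0)
print(filled.dtype)
```

It is worth checking the column's dtype first (for example with df.dtypes). If a column is stored as object (strings), a quoted fill value is the matching choice.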
