Unable to drop NaN rows from a DataFrame

I am trying to clean a DataFrame that contains some NaN values.

I tried all the suggested methods, but I can't seem to get rid of the NaNs.

import pandas as pd

df = pd.read_csv('filename.tsv', delimiter='\t')
df = df[pd.notnull(df)]
df = df.dropna()

df[pd.isnull(df)]
# gives back records containing NaN (a lot of them)

What am I missing?

Edit: the rows it returns have NaN in every column.

Some more edits: when I try to look at the types

heads = df[df.isnull()].head() 
for idx, row in heads.iterrows(): 
    print idx, type(row.listener_id)

This returns

0 <type 'float'> 
1 <type 'float'> 
2 <type 'float'> 
3 <type 'float'> 
4 <type 'float'>

Source

2017-09-05 Fraz

Maybe the NaNs are strings; then you need `df.replace('NaN', np.nan)` – jezrael
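To illustrate that comment, here is a minimal sketch. It assumes made-up data in which a column holds the literal text 'NaN' rather than a float NaN; the `plays` column and the sample values are hypothetical, not taken from the asker's file.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the column holds the literal string 'NaN', not a float NaN.
df = pd.DataFrame({'listener_id': ['a1', 'NaN', 'b2'],
                   'plays': [3, 5, 7]})

# dropna() does not treat the string 'NaN' as missing, so no row is removed.
before = len(df.dropna())

# Convert the string into a real missing value first, then drop.
cleaned = df.replace('NaN', np.nan).dropna()
after = len(cleaned)

print(before, after)  # 3 2
```

Note that the `<type 'float'>` output in the question suggests the asker's values were real float NaNs, so this case may not apply there.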

Can you add a data sample? 3 or 4 rows? – jezrael

Or you need to define custom NA values in read_csv - [docs](http://pandas.pydata.org/pandas-docs/stable/io.html#na-values) – jezrael
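A sketch of what that could look like, assuming the file marks missing cells with a custom token. The '?' marker, the in-memory TSV text, and the `plays` column are all hypothetical stand-ins for the real file.

```python
import io
import pandas as pd

# Hypothetical TSV content; '?' stands in for whatever marker the real file uses.
raw = "listener_id\tplays\na1\t3\n?\t5\nb2\t?\n"

# na_values tells read_csv which extra strings should be parsed as missing.
df = pd.read_csv(io.StringIO(raw), delimiter='\t', na_values=['?'])

# Both '?' cells are now real missing values, so dropna() removes those rows.
cleaned = df.dropna()
print(cleaned)
```

The literal text 'NaN' is already among the strings read_csv treats as missing by default, so `na_values` only matters for other markers.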

I think you need boolean indexing:

df = df[~df.isnull().any(axis=1)]

But it is better to use only:

df = df.dropna()
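Why the row mask works while the question's `df[pd.notnull(df)]` does not can be sketched on a tiny made-up frame (the values here are illustrative only). Indexing with a boolean DataFrame of the same shape keeps every row and puts NaN wherever the mask is False, whereas indexing with a boolean Series, one value per row, actually removes rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 5.0, 4.0],
                   'B': [7.0, 8.0, np.nan]})

# Boolean DataFrame as mask: the shape is preserved, so the NaN rows stay.
same_shape = df[pd.notnull(df)]

# Boolean Series as mask (one value per row): rows with any NaN are removed.
filtered = df[df.notnull().all(axis=1)]

print(same_shape.shape)  # (3, 2)
print(filtered.shape)    # (1, 2)
```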

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[np.nan,5,4,5,5,np.nan],
                   'B':[7,8,9,4,2,np.nan],
                   'C':[1,3,5,7,1,np.nan],
                   'D':[5,3,6,9,2,np.nan]})

print (df)
     A    B    C    D
0  NaN  7.0  1.0  5.0
1  5.0  8.0  3.0  3.0
2  4.0  9.0  5.0  6.0
3  5.0  4.0  7.0  9.0
4  5.0  2.0  1.0  2.0
5  NaN  NaN  NaN  NaN

#get True for NaN
print (df.isnull())
       A      B      C      D
0   True  False  False  False
1  False  False  False  False
2  False  False  False  False
3  False  False  False  False
4  False  False  False  False
5   True   True   True   True

#check at least one True per row
print (df.isnull().any(axis=1))
0     True
1    False
2    False
3    False
4    False
5     True
dtype: bool

#boolean indexing with inverting `~` (need to select rows with NO NaN)
print (df[~df.isnull().any(axis=1)])
     A    B    C    D
1  5.0  8.0  3.0  3.0
2  4.0  9.0  5.0  6.0
3  5.0  4.0  7.0  9.0
4  5.0  2.0  1.0  2.0

#get True for not NaN
print (df.notnull())
       A      B      C      D
0  False   True   True   True
1   True   True   True   True
2   True   True   True   True
3   True   True   True   True
4   True   True   True   True
5  False  False  False  False

#get True if all values per row are True
print (df.notnull().all(axis=1))
0    False
1     True
2     True
3     True
4     True
5    False
dtype: bool

#boolean indexing
print (df[df.notnull().all(axis=1)])
     A    B    C    D
1  5.0  8.0  3.0  3.0
2  4.0  9.0  5.0  6.0
3  5.0  4.0  7.0  9.0
4  5.0  2.0  1.0  2.0

#simplest solution
print (df.dropna())
     A    B    C    D
1  5.0  8.0  3.0  3.0
2  4.0  9.0  5.0  6.0
3  5.0  4.0  7.0  9.0
4  5.0  2.0  1.0  2.0
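One possible caveat when checking the result, sketched here on a small made-up frame: masking with the boolean DataFrame from `isnull()` blanks out every cell where the mask is False, so on data that is already clean it can display nothing but NaN. Counting missing cells is likely a safer check. The variable names below are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 5.0, 4.0],
                   'B': [7.0, 8.0, np.nan]})
clean = df.dropna()

# Reliable checks: count missing cells per column, or ask whether any exist.
missing_per_column = clean.isnull().sum()
has_missing = clean.isnull().values.any()

# Misleading check: a boolean DataFrame mask replaces every cell where the
# mask is False with NaN, so on clean data the result is entirely NaN.
masked = clean[clean.isnull()]

print(has_missing)                   # False
print(masked.isnull().values.all())  # True
```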

Source

2017-09-05 07:07:47 jezrael

Yes.. that's it.. can you elaborate on this? – Fraz

Yes, I am creating a data sample. Give me some time – jezrael

In the end, the following two got rid of the NaNs: `df = df[~df.isnull().any(axis=1)];` `df = df.dropna();` `df[df.isnull()].head()` returns an empty DataFrame, so the NaN values are removed – Fraz
