2016-09-26 39 views

Query a pandas DataFrame and filter rows where a column is not NaN

I am new to Python and to using pandas.

I want to query a DataFrame and keep only the rows where one of the columns is not NaN.

I have tried:

a=dictionarydf.label.isnull() 

but a is just populated with True/False values. I also tried this:

dictionarydf.query(dictionarydf.label.isnull()) 

but that gave me an error, as I expected.
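For reference, query expects a string expression rather than a boolean Series, which is the likely reason the call above fails. A minimal sketch of a string-based version follows; the four-row frame is a made-up miniature of the sample data, and engine="python" is passed on the assumption that the method call should be evaluated by plain pandas rather than numexpr.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data (not the real frame).
dictionarydf = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "label": [np.nan, "travel", "travel", np.nan],
})

# query() takes the condition as a string; the column name is resolved
# inside the expression, so no boolean Series is passed in.
labelled = dictionarydf.query("label.notnull()", engine="python")

print(labelled)
```

The original row labels (1 and 2 here) are kept in the result.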

Sample data:

 reference_word   all_matching_words label review 
0   account    fees - account NaN  N 
1   account   mobile - account NaN  N 
2   account   monthly - account NaN  N 
3 administration delivery - administration NaN  N 
4 administration  fund - administration NaN  N 
5   advisor    fees - advisor NaN  N 
6   advisor   optimum - advisor NaN  N 
7   advisor    sub - advisor NaN  N 
8    aichi   delivery - aichi NaN  N 
9    aichi    pref - aichi NaN  N 
10   airport    biz - airport travel  N 
11   airport    cfo - airport travel  N 
12   airport   cfomtg - airport travel  N 
13   airport   meeting - airport travel  N 
14   airport   summit - airport travel  N 
15   airport    taxi - airport travel  N 
16   airport   train - airport travel  N 
17   airport   transfer - airport travel  N 
18   airport    trip - airport travel  N 
19    ais    admin - ais NaN  N 
20    ais    alpine - ais NaN  N 
21    ais     fund - ais NaN  N 
22  allegiance  custody - allegiance NaN  N 
23  allegiance   fees - allegiance NaN  N 
24   alpha    late - alpha NaN  N 
25   alpha    meal - alpha NaN  N 
26   alpha    taxi - alpha NaN  N 
27   alpine    admin - alpine NaN  N 
28   alpine    ais - alpine NaN  N 
29   alpine    fund - alpine NaN  N 

I want to filter the data down to the rows where label is not NaN.

Expected output:

 reference_word   all_matching_words label review 
0   airport    biz - airport travel  N 
1   airport    cfo - airport travel  N 
2   airport   cfomtg - airport travel  N 
3   airport   meeting - airport travel  N 
4   airport   summit - airport travel  N 
5   airport    taxi - airport travel  N 
6   airport   train - airport travel  N 
7   airport   transfer - airport travel  N 
8   airport    trip - airport travel  N 

Answer


You can use dropna:

df = df.dropna(subset=['label']) 

print (df) 
    reference_word all_matching_words label review 
10  airport  biz - airport travel  N 
11  airport  cfo - airport travel  N 
12  airport cfomtg - airport travel  N 
13  airport meeting - airport travel  N 
14  airport summit - airport travel  N 
15  airport  taxi - airport travel  N 
16  airport  train - airport travel  N 
17  airport transfer - airport travel  N 
18  airport  trip - airport travel  N 
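To try this without the full data set, here is a self-contained sketch. The four-row frame is a made-up miniature of the sample data, and the result is bound to a new name (labelled, chosen only for illustration) so the original frame can be inspected afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the sample data.
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "all_matching_words": ["fees - account", "biz - airport",
                           "cfo - airport", "admin - ais"],
    "label": [np.nan, "travel", "travel", np.nan],
    "review": ["N", "N", "N", "N"],
})

# subset limits the NaN check to the label column, so NaN values in
# other columns would not cause a row to be dropped.
labelled = df.dropna(subset=["label"])

print(labelled)
```

dropna returns a new DataFrame by default, so df itself still holds all four rows unless the result is assigned back to it.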

Another solution is boolean indexing with notnull:

df = df[df.label.notnull()] 

print (df) 
    reference_word all_matching_words label review 
10  airport  biz - airport travel  N 
11  airport  cfo - airport travel  N 
12  airport cfomtg - airport travel  N 
13  airport meeting - airport travel  N 
14  airport summit - airport travel  N 
15  airport  taxi - airport travel  N 
16  airport  train - airport travel  N 
17  airport transfer - airport travel  N 
18  airport  trip - airport travel  N 
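Note that the filtered rows keep their original index (10 to 18), while the expected output in the question is numbered from 0. A minimal sketch of renumbering after the filter, assuming that is wanted; the small frame and the names mask and labelled are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the sample data.
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "label": [np.nan, "travel", "travel", np.nan],
})

# Boolean mask: True where label holds a value, False where it is NaN.
mask = df["label"].notnull()

# Keep the True rows, then replace the old row labels with 0..n-1.
# drop=True discards the old index instead of adding it as a column.
labelled = df[mask].reset_index(drop=True)

print(labelled)
```

In recent pandas versions notna is available as an alias of notnull and behaves the same way.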

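Thanks for the quick answer :) @jezrael, this solved the problem. I chose boolean indexing because I don't want to drop rows, and I don't need to create a duplicate DataFrame either. Both solutions work perfectly. – Dileep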