2014-10-30 50 views
39

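I am trying to use a dplyr pipe to remove NAs from a subset, but my attempt is missing a step. I would like to learn how to write this with dplyr. (Question title: Removing NA in a dplyr pipe)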

> outcome.df%>% 
+ group_by(Hospital,State)%>% 
+ arrange(desc(HeartAttackDeath,na.rm=TRUE))%>% 
+ head() 
Source: local data frame [6 x 5] 
Groups: Hospital, State 
 
                           Hospital State HeartAttackDeath
1     ABBEVILLE AREA MEDICAL CENTER    SC               NA
2        ABBEVILLE GENERAL HOSPITAL    LA               NA
3      ABBOTT NORTHWESTERN HOSPITAL    MN             12.3
4   ABILENE REGIONAL MEDICAL CENTER    TX             17.2
5        ABINGTON MEMORIAL HOSPITAL    PA             14.3
6 ABRAHAM LINCOLN MEMORIAL HOSPITAL    IL               NA
Variables not shown: HeartFailureDeath (dbl), PneumoniaDeath 
    (dbl) 

I think you have the wrong library. Where is the data? – 2014-10-30 23:47:31


Why not `na.omit`? – isomorphismes 2015-09-01 16:57:31


See also http://stackoverflow.com/questions/22353633/filter-for-complete-cases-in-data-frame-using-dplyr-case-wise-deletion/37031161#37031161, which answers the same question. – 2016-05-04 17:00:59

Answers

75

I don't think desc takes an na.rm argument... I'm actually surprised it doesn't throw an error when you give it one. If you just want to remove all NAs, use na.omit:

outcome.df %>% 
    na.omit() %>% 
    group_by(Hospital, State) %>% 
    arrange(desc(HeartAttackDeath)) %>% 
    head() 

If you only want to remove NAs from the HeartAttackDeath column, filter with is.na:

outcome.df %>% 
    filter(!is.na(HeartAttackDeath)) %>% 
    group_by(Hospital, State) %>% 
    arrange(desc(HeartAttackDeath)) %>% 
    head() 

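As pointed out in the duplicate question, complete.cases can also be used, but it is a bit trickier to put in the chain because it takes a data frame as an argument and returns a logical vector (one value per row) rather than a data frame. So you could use it like this: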

outcome.df %>% 
    filter(complete.cases(.)) %>% 
    group_by(Hospital, State) %>% 
    arrange(desc(HeartAttackDeath)) %>% 
    head() 
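
To see how the three approaches above differ, here is a minimal sketch on a made-up data frame. The `toy` data and its values are hypothetical stand-ins for outcome.df, which was not posted. The point is that na.omit and complete.cases drop a row when any column is NA, while the is.na filter only looks at one column.

```r
library(dplyr)

# Hypothetical toy data standing in for outcome.df (values are invented)
toy <- data.frame(
  Hospital = c("A", "B", "C", "D"),
  State = c("SC", "LA", "MN", "TX"),
  HeartAttackDeath = c(NA, 14.3, 12.3, 17.2),
  PneumoniaDeath = c(10.1, NA, 11.5, 9.8),
  stringsAsFactors = FALSE
)

# Drops every row that has an NA in any column (rows A and B here)
all_cols <- toy %>% na.omit()

# Drops only rows where HeartAttackDeath is NA (row A here)
one_col <- toy %>% filter(!is.na(HeartAttackDeath))

# Keeps the same rows as na.omit(), selected through a logical vector
complete <- toy %>% filter(complete.cases(.))

nrow(all_cols)  # 2
nrow(one_col)   # 3
nrow(complete)  # 2
```

If other columns contain NAs you still want to keep, the is.na filter is probably the safer choice, since it leaves those rows in place.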

Thank you very much. I used na.omit for all columns and it worked. outcome.df is a subset of a large dataset. I am trying to rank the conditions from best to worst. – ITCoderWhiz 2014-11-01 12:23:48


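When I use na.omit this way, it throws the error `na.omit.default()`: argument "object" is missing, with no default, even when I feed it hflights. I get the same behavior with !is.na(hflights) as the second stage of the pipe... @ITCoderWhiz – d8aninja 2015-02-28 01:40:59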


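@D8Amonk Sounds like you have some function masking going on. From a fresh R session, `library(dplyr); library(hflights); x = hflights %>% na.omit()` works just fine. Maybe you have loaded a package that contains its own `na.omit` function? – Gregor 2015-02-28 02:07:56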
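
If masking is the suspected cause, as the last comment guesses, one way to check is to ask R where the na.omit in scope comes from, and to call the stats version explicitly. This is only a sketch: it uses the built-in airquality data as a stand-in for hflights, which is an assumption made so the example runs without extra packages.

```r
library(dplyr)

# Name of the environment that defines the na.omit currently in scope;
# "stats" is expected when no other package or object masks it.
environmentName(environment(na.omit))

# Qualifying the call with stats:: sidesteps any masking version.
# airquality is a stand-in dataset here, not the commenter's data.
cleaned <- airquality %>% stats::na.omit()
nrow(cleaned)
```

If the first call reports something other than "stats", another package or a user-defined object is shadowing the function.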