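COUNTIF equivalent in dplyr summarise

2016-05-22

I have a data frame listing the total number of students (Stu) in each group (ID), along with the number of them taking part in an activity (Sub):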

     ID   Stu   Sub
  (int) (int) (int)
1   101    80    NA
2   102   130    NA
3   103    10    NA
4   104   210    20
5   105   180    NA
6   106   150    NA

I want to know, for each group-size band (>400, >200, >100, >0), how many groups are taking part in an activity (Sub > 0) and how many are not (Sub is NA).

output <- structure(list(ID = c(101L, 102L, 103L, 104L, 105L, 106L), 
         Stu = c(80L, 130L, 10L, 210L, 180L, 150L), 
         Sub = c(NA,NA, NA, 20L, NA, NA)), 
        .Names = c("ID", "Stu", "Sub"), 
        class = c("tbl_df", "data.frame"), 
        row.names = c(NA, -6L)) 

library(dplyr)

temp <- output %>%
  mutate(Stu = ifelse(Stu >= 400, 400,
               ifelse(Stu >= 200, 200,
               ifelse(Stu >= 100, 100, 0)))) %>%
  group_by(Stu) %>%
  summarise(entries   = length(!is.na(Sub)),
            noentries = length(is.na(Sub)))

The result should be:

    Stu entries noentries
  (dbl)   (int)     (int)
1     0       0         2
2   100       0         3
3   200       1         0

but I get:

    Stu entries noentries
  (dbl)   (int)     (int)
1     0       2         2
2   100       3         3
3   200       1         1

How can I make the length function inside summarise behave like COUNTIF?

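Something is wrong in your last ifelse

Sorry, I missed the 0; it should work now – pluke

sum is the correct solution, as described below. To be clear, length returns the length of the vector it is given. In this case, regardless of the TRUE/FALSE values, length returns the number of items in each group. – Gopala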
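
To make the point in the comment above concrete, here is a minimal sketch in base R. The vector `sub` is made up for illustration and stands in for one group's Sub column; it shows that length ignores the TRUE/FALSE values while sum counts the TRUE ones.

```r
# Hypothetical vector standing in for one group's Sub column
sub <- c(NA, 20L, NA)

flags <- is.na(sub)          # logical vector, one element per row

n_all     <- length(flags)   # number of elements, whatever their values
n_missing <- sum(flags)      # TRUE counts as 1, FALSE as 0
n_present <- sum(!flags)     # count of non-missing values

print(c(all = n_all, missing = n_missing, present = n_present))
```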

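Answers

summarise needs a single value per group, and what you want is the number of TRUE values rather than the number of elements, so use sum instead of length: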

output %>% 
    mutate(Stu = ifelse(Stu >= 400, 400, 
         ifelse(Stu >= 200, 200, 
          ifelse(Stu >= 100, 100, 0 
          )))) %>% 
    group_by(Stu) %>% 
    summarise(entries = sum(!is.na(Sub)), 
      noentries = sum(is.na(Sub))) 

Source: local data frame [3 x 3]

    Stu entries noentries
  (dbl)   (int)     (int)
1     0       0         2
2   100       0         3
3   200       1         0

Ah yes, I forgot that is.na returns a logical vector, which can be summed – pluke
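
The same idea covers any COUNTIF-style condition: sum a logical test. The following is a small sketch in base R using the Stu and Sub values from the question; the helper name `count_if` is hypothetical, not part of dplyr or base R. Note that a comparison involving NA yields NA, so `na.rm = TRUE` is needed to get a count back.

```r
# Hypothetical helper: count the elements for which a logical condition is TRUE,
# dropping NA results of the comparison
count_if <- function(condition) sum(condition, na.rm = TRUE)

stu <- c(80L, 130L, 10L, 210L, 180L, 150L)  # Stu values from the question
sub <- c(NA, NA, NA, 20L, NA, NA)           # Sub values from the question

big_groups <- count_if(stu >= 100)  # groups with at least 100 students
with_entry <- count_if(sub > 0)     # NA > 0 is NA and is dropped by na.rm

print(c(big_groups = big_groups, with_entry = with_entry))
```

Inside summarise the helper is not required; `sum(Sub > 0, na.rm = TRUE)` does the same thing per group.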

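Another option is to group by both Stu and Sub, but to do that we first need to recode the values of Sub and Stu to match the groupings we want in the output. We also use cut, rather than nested ifelse, to set the break points for Stu: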

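library(dplyr)
library(reshape2)

output %>%
    # right = FALSE gives [0,100), [100,200), ... to match the >= tests in the question
    group_by(Sub = ifelse(is.na(Sub), "No Entries", "Entries"),
             Stu = cut(Stu, c(0, 100, 200, 400, Inf), labels = c(0, 100, 200, 400), right = FALSE)) %>%
    tally() %>%
    dcast(Stu ~ Sub, value.var = "n", fill = 0)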
  Stu Entries No Entries
1   0       0          2
2 100       0          3
3 200       1          0
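
One thing worth checking when swapping nested ifelse for cut: by default cut builds right-closed intervals (a, b], so a group of exactly 100 students falls into the 0 band and a group of 0 students becomes NA. Passing `right = FALSE` gives left-closed intervals [a, b), which matches the `>=` tests in the question. A sketch in base R, with made-up boundary values:

```r
breaks <- c(0, 100, 200, 400, Inf)
labels <- c(0, 100, 200, 400)
stu    <- c(0L, 99L, 100L, 200L, 400L)   # made-up boundary cases

default_bands <- cut(stu, breaks, labels = labels)                 # intervals (a, b]
left_closed   <- cut(stu, breaks, labels = labels, right = FALSE)  # intervals [a, b)

print(data.frame(stu, default_bands, left_closed))
```

On the six rows in the question both variants give the same bands, because none of the Stu values sits exactly on a break.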

Following the same idea as @eipi10, but getting straight to the point with count() instead of group_by() %>% tally(), and showing that tidyr::spread can mimic reshape2::dcast:

output %>%
    # right = FALSE gives [0,100), [100,200), ... to match the >= tests in the question
    count(Sub = ifelse(is.na(Sub), 'No Entries', 'Entries'),
          Stu = cut(Stu, c(0, 100, 200, 400, Inf), labels = c(0, 100, 200, 400), right = FALSE)) %>%
    tidyr::spread(Sub, n, fill = 0)
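
tidyr::spread has since been superseded by tidyr::pivot_wider. As a hedged sketch, assuming tidyr 1.0.0 or later and dplyr are installed, the same reshaping could look like this; the data frame is rebuilt from the question's values so the block runs on its own, and the name `wide` is just for illustration.

```r
library(dplyr)
library(tidyr)

# Data from the question, rebuilt as a plain data frame
output <- data.frame(ID  = 101:106,
                     Stu = c(80L, 130L, 10L, 210L, 180L, 150L),
                     Sub = c(NA, NA, NA, 20L, NA, NA))

wide <- output %>%
  # Band the group sizes with left-closed intervals and label the Sub status
  count(Stu = cut(Stu, c(0, 100, 200, 400, Inf),
                  labels = c(0, 100, 200, 400), right = FALSE),
        Sub = ifelse(is.na(Sub), "No Entries", "Entries")) %>%
  # One column per Sub status; bands with no rows for a status get 0
  pivot_wider(names_from = Sub, values_from = n, values_fill = 0L)

print(wide)
```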