2017-06-23 44 views
2

Count with conditions, by group: I have a data frame of values, sorted by groupID and date:

d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3), 
       date = c(1,2,3,4,5,6,7,8,9), 
       value = c(1,1,25,1,1,25,1,25,1)) 

> d1
groupID date value
      1    1     1
      1    2     1
      1    3    25
      1    4     1
      1    5     1
      3    6    25
      3    7     1
      3    8    25
      3    9     1

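I want to create two new columns:

  1. For each occurrence of 25, a count of the preceding values equal to 1, within each group.
  2. For each occurrence of 25, a count of the values equal to 1 that follow it, up to the next value equal to 25, within each group.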

Desired output:

groupID date value Prev1s After1s
      1    1     1
      1    2     1
      1    3    25      2       2
      1    4     1
      1    5     1
      3    6    25      0       1
      3    7     1
      3    8    25      1       1
      3    9     1

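I was able to do this in Excel by creating a counter and taking the previous value. I have tried to get the same result in R with sum and shift(), but to no avail.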

+0

Take a look at the `rle` function.

+0

By the way, the second `Prev1s` should be 2, not 0.

+0

No, it should be 0 (the counts are grouped by groupID).
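
As a rough illustration of the `rle` suggestion in the comments, the base R sketch below run-length encodes each group's values separately. The names `g1`, `g3`, `r1` and `r3` are made up for this example and do not appear in the thread.

```r
# Values of the two groups from the question (g1 and g3 are illustrative names)
g1 <- c(1, 1, 25, 1, 1)
g3 <- c(25, 1, 25, 1)

# rle() collapses consecutive equal values into runs
r1 <- rle(g1)
r1$lengths  # 2 1 2
r1$values   # 1 25 1

r3 <- rle(g3)
r3$lengths  # 1 1 1 1
r3$values   # 25 1 25 1
```

In group 3 the first 25 opens the group, so no run of 1s precedes it inside that group. That is why its `Prev1s` is 0 when the counting is done per groupID.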

Answers

1

You can do this with dplyr...

library(dplyr)

d1 <- d1 %>%
  # first set up grouping variables based on the runs before and after each 25
  mutate(PrevGp  = cumsum(lag(value == 25, default = TRUE)),
         AfterGp = cumsum(value == 25)) %>%
  # use these to count the 1s in each run; sum(value) - 25 equals that count
  # only because every value here is either 1 or 25
  group_by(groupID, PrevGp) %>% mutate(Prev1s = sum(value) - 25) %>%
  group_by(groupID, AfterGp) %>% mutate(After1s = sum(value) - 25) %>%
  ungroup() %>%
  # blank out (set to "") the counts on rows where value is not 25
  mutate(Prev1s  = replace(Prev1s, value != 25, ""),
         After1s = replace(After1s, value != 25, "")) %>%
  # and remove the grouping variables
  select(-c(PrevGp, AfterGp))

d1
# A tibble: 9 x 5
  groupID  date value Prev1s After1s
    <dbl> <dbl> <dbl> <chr>  <chr>
1       1     1     1
2       1     2     1
3       1     3    25 2      2
4       1     4     1
5       1     5     1
6       3     6    25 0      1
7       3     7     1
8       3     8    25 1      1
9       3     9     1
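
To see what the two grouping variables in this answer look like before they are dropped, here is a small base R sketch on the `value` column alone. The names `is25`, `after_gp` and `prev_gp` are illustrative, and `c(TRUE, head(is25, -1))` is used as a stand-in for dplyr's `lag()` with a `TRUE` default.

```r
value <- c(1, 1, 25, 1, 1, 25, 1, 25, 1)
is25  <- value == 25

# Each 25 starts a new "after" run: the 25 itself plus the 1s that follow it
after_gp <- cumsum(is25)
after_gp  # 0 0 1 1 1 2 2 3 3

# Shifting by one row makes each 25 close a "previous" run instead:
# the 1s before it plus the 25 itself
prev_gp <- cumsum(c(TRUE, head(is25, -1)))
prev_gp   # 1 1 1 2 2 2 3 3 4
```

Summing `value` within groupID and one of these indices, then subtracting 25, leaves the number of 1s in that run. This only holds while the column contains nothing but 1s and 25s.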
+1

Thanks! I have been trying to do this for over a week!

0

An alternative, using the `rle` function together with the data.table package:

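library(data.table)
setDT(d1)[, c('prev1s', 'after1s') := {
  p <- a <- rle(value)   # runs of identical values within the group
  i <- p$values == 25    # which runs are 25s
  # for each 25, take the length of the preceding / following run (0 at the edges)
  p$values[i] <- shift(p$lengths, fill = 0)[i]
  a$values[i] <- shift(a$lengths, type = 'lead', fill = 0)[i]
  # every row that is not a 25 gets NA
  p$values[!i] <- a$values[!i] <- NA
  list(inverse.rle(p), inverse.rle(a))
}, by = groupID][]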

This gives:

   groupID date value prev1s after1s
1:       1    1     1     NA      NA
2:       1    2     1     NA      NA
3:       1    3    25      2       2
4:       1    4     1     NA      NA
5:       1    5     1     NA      NA
6:       3    6    25      0       1
7:       3    7     1     NA      NA
8:       3    8    25      1       1
9:       3    9     1     NA      NA
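
For anyone who would rather avoid extra packages, the same run-length idea can be sketched in base R. This is an untested-in-production sketch, not part of either answer: `count_adjacent_ones()` and `res` are hypothetical names, and the code assumes, as in the question, that `value` only contains 1s and 25s.

```r
# Hypothetical helper: for each 25, the length of the run just before and
# just after it; NA on every other row
count_adjacent_ones <- function(value) {
  r <- rle(value)
  is25 <- r$values == 25
  n <- length(r$lengths)
  prev_len <- c(0L, r$lengths[-n])   # length of the preceding run, 0 at the start
  next_len <- c(r$lengths[-1], 0L)   # length of the following run, 0 at the end
  list(prev1s  = rep(ifelse(is25, prev_len, NA_integer_), r$lengths),
       after1s = rep(ifelse(is25, next_len, NA_integer_), r$lengths))
}

res <- data.frame(groupID = c(1, 1, 1, 1, 1, 3, 3, 3, 3),
                  date    = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                  value   = c(1, 1, 25, 1, 1, 25, 1, 25, 1))

# ave() applies the helper within each groupID and keeps the row order
res$prev1s  <- ave(res$value, res$groupID,
                   FUN = function(v) count_adjacent_ones(v)$prev1s)
res$after1s <- ave(res$value, res$groupID,
                   FUN = function(v) count_adjacent_ones(v)$after1s)
res
```

Like the data.table version, this counts the length of whatever run sits next to each 25, so it only yields a count of 1s as long as no other values appear in the column.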