Converting a for loop to an apply function when the input is a data frame, not a vector

I have data with 3 columns, roughly like this:

uid <- c(1,1,1,1,1,1,2,2,2) 
sale <- c(0,1,1,0,0,0,0,1,0) 
e <- as.data.frame(cbind(uid, sale)) 
e$uid <- as.factor(e$uid) 
e$sincesale <- NA

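For each unique ID I want to apply the same procedure: compute the number of days since the last sale.

I can easily come up with a for loop that does this. The problem is that I have millions of rows, so the loop takes a very long time to finish. I would like to use tapply over e$uid, but tapply only accepts a vector as input.

What approach (faster than a loop) could I use?

My for loop: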

for (i in 2:nrow(e)) {
  # work only within rows that share the same unique id (uid)
  if (e$uid[i] == e$uid[i-1]) {
    if (e$sale[i] == 1) {
      # on a sale day there is no "days since sale" value (NA in the output further down)
      e$sincesale[i] <- NA
    }
    if (e$sale[i] == 0) {
      # if the sale just ended, the number of days since the sale is 1
      if (e$sale[i-1] == 1) {
        e$sincesale[i] <- 1
      }
      # if the sale ended a few periods ago, add 1 to the previous value of "sincesale"
      if (e$sale[i-1] == 0) {
        e$sincesale[i] <- e$sincesale[i-1] + 1
      }
    }
  }
}

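UPD:

Well, honestly, I worked on this on my own through the morning and evening but could not come up with a solution to a new problem. I tried the suggested methods, but one small issue is that they start counting "sincesale" from the very first row (because sale == 0 holds for the first rows even when no sale has happened yet). The example below shows the input together with the result produced by the for loop ("sincesale") and by the suggested dplyr approach ("sincesale4"):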

uid <- c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4) 
sale <- c(0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,1,0,0,0,0) 
e <- as.data.frame(cbind(uid, sale)) 
e$uid <- as.factor(e$uid) 

   uid sale first sincesale sincesale4
1    1    0     1        NA          0
2    1    0     1        NA          1
3    1    1     0        NA          1
4    1    0     0         1          2
5    1    0     0         2          3
6    1    0     0         3          4
7    2    0     1        NA          0
8    2    1     1        NA          0
9    2    0     0         1          1
10   2    1     0        NA          1
11   3    0     1        NA          0
12   3    0     1        NA          1
13   3    0     0        NA          2
14   3    0     0        NA          3
15   3    0     0        NA          4
16   3    0     0        NA          5
17   3    1     0        NA          5
18   3    1     0        NA          5
19   3    0     0         1          6
20   4    0     1        NA          0
21   4    0     1        NA          1
22   4    0     0        NA          2

Source

2017-06-12 user3349993

Just 'e <- data.frame(uid, sale); e$uid <- as.factor(e$uid); e$sincesale <- NA' should sort it out, I believe. – thelatemail

Use ave to look within each uid group, and take the cumulative sum (cumsum) of the non-sale days:

e$sincesale2 <- ave(!e$sale, e$uid, FUN=cumsum)-1 

#  uid sale sincesale sincesale2
#1   1    0        NA          0
#2   1    1        NA          0
#3   1    1        NA          0
#4   1    0         1          1
#5   1    0         2          2
#6   1    0         3          3
#7   2    0        NA          0
#8   2    1        NA          0
#9   2    0         1          1
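
On the tapply limitation raised in the question: ave, like tapply, takes a single vector plus a grouping factor. If the per-group function ever needs several columns at once, one base R pattern is split / lapply / unsplit, which hands each uid's rows to the function as a small data frame. The following is only a minimal sketch on the first example's data; the helper name count_nonsale and the column sincesale_split are made up for illustration.

```r
# Data from the first example in the question.
uid  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2)
sale <- c(0, 1, 1, 0, 0, 0, 0, 1, 0)
e <- data.frame(uid = factor(uid), sale = sale)

# Hypothetical per-group function: receives the rows of ONE uid as a data frame
# and returns the same rows with an extra column.
count_nonsale <- function(d) {
  d$sincesale_split <- cumsum(!d$sale) - 1
  d
}

# split() cuts the data frame by uid, lapply() processes each piece,
# unsplit() puts the rows back in their original order.
pieces <- lapply(split(e, e$uid), count_nonsale)
result <- unsplit(pieces, e$uid)
```

This should give the same values as the ave call, but it is likely to be slower on millions of rows, so it is mainly useful when a single vector is not enough.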

The same cumsum approach translated to data.table:

library(data.table) 
setDT(e) 
e[, sincesale3 := cumsum(!sale)-1, by=uid]

Or in dplyr, with a hat tip to @RonakShah:

library(dplyr) 
e %>% 
    group_by(uid) %>% 
    mutate(sincesale4 = cumsum(!sale)-1)
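
The update to the question asks for NA before the first sale of each uid and on sale days, which the plain cumsum(!sale)-1 does not give. One possible way to get that, as a minimal sketch and assuming the rows are already ordered by day within each uid, is to restart the count after every sale day. The helper name since_last_sale and the column sincesale5 are hypothetical and not part of the answers above.

```r
# Example data from the question's update.
uid  <- c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4)
sale <- c(0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,1,0,0,0,0)
e <- data.frame(uid = factor(uid), sale = sale)

# Hypothetical helper: days since the last sale for ONE uid's sale vector.
since_last_sale <- function(sale) {
  # run_id is 0 before the first sale and increases by 1 on every sale day.
  run_id <- cumsum(sale == 1)
  # Count non-sale days within each run, so the count restarts after each sale day.
  out <- ave(as.integer(sale == 0), run_id, FUN = cumsum)
  # No value before the first sale or on a sale day itself.
  out[run_id == 0 | sale == 1] <- NA
  out
}

# Apply the helper within each uid group.
e$sincesale5 <- ave(e$sale, e$uid, FUN = since_last_sale)
```

On this input the new column should match the "sincesale" column of the for-loop output in the question. The same helper could presumably be dropped into the data.table or dplyr versions in place of cumsum(!sale)-1.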

Source

2017-06-12 05:22:35 thelatemail
