2014-10-27 15 views
2

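Dataframe: for each row, check the value of the previous row in a column and enter a value

I have tried hard to solve the following problem and have read a lot about it, but I still can't manage it.

See this example: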

time <- sample(1:300, 20)
test <- c(0, 0, 0, NA, 0, 0, 3, 0, 0, NA, 0, 0, 3, 0, 0, NA, 0, 0, 3, 0)
take <- rep(NA, 20)
df <- data.frame(time, test, take)
> head(df, 8) 
    time test take 
1 271 0 NA 
2 147 0 NA 
3 277 0 NA 
4 247 NA NA 
5 82 0 NA 
6 133 0 NA 
7 231 3 NA 
8 110 0 NA 

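Now I want to enter values in the last column (take). The value depends on a condition in the second column (test). If test is NA or 3, take can stay empty. So far so good, but my problem is the value 0: the row should get "a" if the previous row's value is 0, "b" if it is 3, and "c" otherwise.

So the output should look like this: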

head(df, 8) 
     time test take 
    1 271 0 c 
    2 147 0 a 
    3 277 0 a 
    4 247 NA NA 
    5 82 0 c 
    6 133 0 a 
    7 231 3 NA 
    8 110 0 b 

Thanks for your help!

Answers

1

Try:

is0 <- which(df$test == 0)   # indices of rows where test is 0 (which() drops the NA rows)
df[is0, "take"] <- "c"       # for each test == 0, put take = "c", as it is the "default" value
for (i in setdiff(is0, 1)) { # skip row 1, because the first row doesn't have a previous row
    prev <- df$test[i - 1]   # value of test in the previous row
    if (is.na(prev)) next    # previous row is NA: keep the default "c"
    if (prev == 0) df$take[i] <- "a" else if (prev == 3) df$take[i] <- "b"
}
+1

Please explain your answer – Huangism 2014-10-27 15:46:30

+0

@Huangism I have added some explanation as comments in my answer. Sorry I didn't do that in the first place! – Cath 2014-10-28 07:20:14

1

You could also do

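indx <- c(FALSE, !df$test[-nrow(df)] & !is.na(df$test)[-nrow(df)])      # previous row is 0
indx1 <- c(FALSE, df$test[-nrow(df)] == 3 & !is.na(df$test)[-nrow(df)]) # previous row is 3
indx2 <- df$test == 3 | is.na(df$test)                                  # current row is 3 or NA

# combine the three flags into one numeric code, then map the sorted codes to labels
df$take <- c('c', 'a', 'b', NA)[as.numeric(factor(1 + 2*indx + 4*indx1 + 8*indx2))]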

df$take 
#[1] "c" "a" "a" NA "c" "a" NA "b" "a" NA "c" "a" NA "b" "a" NA "c" "a" NA 
#[20] "b" 
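
The factor-based lookup above depends on which codes happen to occur in the data: the labels are matched to the sorted factor levels, so if, say, no row produced the code for "c", the remaining labels would shift. As a hedged alternative (a minimal sketch, not part of the original answer), the same rule can be written with plain logical indexing. The helper names `prev` and `is0` are made up for this illustration, and the `time` column is left out because the rule does not use it.

```r
# Rebuild the test column from the question
test <- c(0, 0, 0, NA, 0, 0, 3, 0, 0, NA, 0, 0, 3, 0, 0, NA, 0, 0, 3, 0)

# Value of the previous row; the first row has no predecessor, so it gets NA
prev <- c(NA, test[-length(test)])

# Start with NA everywhere, then fill only the rows where test is 0
take <- rep(NA_character_, length(test))
is0 <- !is.na(test) & test == 0
take[is0] <- "c"                             # default for test == 0
take[is0 & !is.na(prev) & prev == 0] <- "a"  # previous row was 0
take[is0 & !is.na(prev) & prev == 3] <- "b"  # previous row was 3
take
```

Each label is assigned by its own condition, so the result does not depend on which combinations are present in the data.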
0

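Using the dplyr package, you can split your problem into two parts. Part 1: write a function that encapsulates the logic for filling take based on the previous row.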

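return_value_based_on_previous_row <- function(x, lagged) {
    # test is NA or 3: take stays empty
    if (is.na(x) || x == 3) {
        return(NA_character_)
    }
    # otherwise decide from the previous row's value of test
    if (is.na(lagged)) {
        "c"   # first row, or the previous row is NA
    } else if (lagged == 0) {
        "a"
    } else if (lagged == 3) {
        "b"
    } else {
        "c"   # any other previous value also falls into "the rest"
    }
}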

Part 2: use lag and mutate to work through df row by row.

library(dplyr)

df <-
    df %>%
    mutate(lag_test = lag(test)) %>% # make temp column which contains previous value of test
    rowwise() %>% # makes the following mutate work on each row separately
    mutate(take = return_value_based_on_previous_row(test, lag_test)) %>%
    select(-lag_test) # remove temp column

This gives:

> df 
    time test take 
1 164 0 c 
2 36 0 a 
3 279 0 a 
4 255 NA NA 
5 241 0 c 
6 188 0 a 
7 117 3 NA 
8 75 0 b 
9 60 0 a 
10 175 NA NA 
11 238 0 c 
12 184 0 a 
13 272 3 NA 
14 215 0 b 
15 49 0 a 
16 204 NA NA 
17 291 0 c 
18 218 0 a 
19 197 3 NA 
20 138 0 b
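
For comparison, here is a hedged sketch of a vectorised variant that avoids rowwise() and the helper function. It is not from the original answer and assumes a dplyr version that provides case_when(). The data frame name `df2` is made up for the illustration, and the `time` column is omitted because the rule does not use it.

```r
library(dplyr)

# Rebuild the test column from the question
df2 <- data.frame(
    test = c(0, 0, 0, NA, 0, 0, 3, 0, 0, NA, 0, 0, 3, 0, 0, NA, 0, 0, 3, 0)
)

df2 <- df2 %>%
    mutate(take = case_when(
        is.na(test) | test == 3 ~ NA_character_, # NA or 3: leave take empty
        lag(test) == 0          ~ "a",           # previous row was 0
        lag(test) == 3          ~ "b",           # previous row was 3
        TRUE                    ~ "c"            # everything else, including a missing previous row
    ))

df2$take
```

case_when() evaluates the conditions in order and treats an NA condition as not matched, so rows whose previous value is NA fall through to the final "c" branch.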