2015-12-07
1

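R: How to get a value in a data frame column based on row sequence and the values in other columns

I have a data frame df, ordered by v1 and v2. For each group of identical values in v1 (the values 1, 2 and 3 in the sample data), I want to compute a new variable v5.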

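The value of v5 depends on the values of v3 and v4: if v3 == "New", then v5 == v4. If v3 == "Old", then v5 takes the value of v4 from the nearest preceding row where v3 equals "New". All of this happens within the same v1 "group".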

Sample data:

df <- data.frame(v1=c(1,1,1,2,2,2,3,3,3,3), 
      v2=c(1,2,3,1,2,3,1,2,3,4), 
      v3=c("New", "Old", "Old","New", "Old", "New","New", "New", "Old","Old"), 
      v4=c("A","B","C","X","Y","Z","A","B","C","D")) 


v1 v2 v3 v4 
1 1 New A 
1 2 Old B 
1 3 Old C 
2 1 New X 
2 2 Old Y 
2 3 New Z 
3 1 New A 
3 2 New B 
3 3 Old C 
3 4 Old D 

Desired output:

v1 v2 v3 v4 v5 
    1 1 New A A 
    1 2 Old B A 
    1 3 Old C A 
    2 1 New X X 
    2 2 Old Y X 
    2 3 New Z Z 
    3 1 New A A 
    3 2 New B B 
    3 3 Old C B 
    3 4 Old D B 

Answers

2

You can also do this with the dplyr package (together with na.locf from zoo).

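library(dplyr)
library(zoo)  # provides na.locf()
df <- data.frame(v1=c(1,1,1,2,2,2,3,3,3,3),
                 v2=c(1,2,3,1,2,3,1,2,3,4),
                 v3=c("New", "Old", "Old","New", "Old", "New","New", "New", "Old","Old"),
                 v4=c("A","B","C","X","Y","Z","A","B","C","D"),
                 stringsAsFactors = FALSE)
df %>%
    group_by(v1) %>%
    # keep v4 on "New" rows and NA elsewhere, then carry the last value forward;
    # na.rm = FALSE keeps a leading NA if a group happens to start with "Old"
    mutate(v5 = ifelse(v3 == "New", v4, NA_character_),
           v5 = na.locf(v5, na.rm = FALSE))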
# Source: local data frame [10 x 5] 
# Groups: v1 [3] 
# 
#  v1 v2 v3 v4 v5 
# (dbl) (dbl) (chr) (chr) (chr) 
# 1  1  1 New  A  A 
# 2  1  2 Old  B  A 
# 3  1  3 Old  C  A 
# 4  2  1 New  X  X 
# 5  2  2 Old  Y  X 
# 6  2  3 New  Z  Z 
# 7  3  1 New  A  A 
# 8  3  2 New  B  B 
# 9  3  3 Old  C  B 
# 10  3  4 Old  D  B 

Great. Thx @docendo discimus

1

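We can try data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)); grouped by 'v1', we replace the 'v4' elements that correspond to 'Old' in 'v3' with NA, use na.locf (from library(zoo)) to replace the NA values with the preceding non-NA value, and assign (:=) the output to create the new column 'v5'.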

library(data.table) 
library(zoo) 
# na.rm = FALSE keeps a leading NA if a group happens to start with "Old"
setDT(df)[, v5 := na.locf(replace(v4, v3 == 'Old', NA), na.rm = FALSE), by = v1]
df 
# v1 v2 v3 v4 v5 
# 1: 1 1 New A A 
# 2: 1 2 Old B A 
# 3: 1 3 Old C A 
# 4: 2 1 New X X 
# 5: 2 2 Old Y X 
# 6: 2 3 New Z Z 
# 7: 3 1 New A A 
# 8: 3 2 New B B 
# 9: 3 3 Old C B 
#10: 3 4 Old D B 
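
To make the two steps above concrete, here is a minimal base R sketch applied to a single group (v1 == 3 from the sample data). The helper name locf_sketch is hypothetical and only illustrates what "carry the last non-NA value forward" means; it is not how zoo implements na.locf.

```r
# One group from the sample data (v1 == 3)
v3 <- c("New", "New", "Old", "Old")
v4 <- c("A", "B", "C", "D")

# Step 1: blank out v4 on the "Old" rows
masked <- replace(v4, v3 == "Old", NA)

# Step 2: hand-rolled "last observation carried forward"
# (hypothetical helper, for illustration only)
locf_sketch <- function(x) {
  # position of the latest non-NA value seen so far, 0 if none yet
  last_seen <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  # prepend NA so that position 0 maps to NA
  c(NA, x)[last_seen + 1]
}

v5 <- locf_sketch(masked)
v5
```

Because position 0 maps to NA, a group that starts with "Old" would get NA rather than an error or a shortened vector.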

Or we can use ave from base R (na.locf still comes from zoo):

df$v5 <- with(df, ave(replace(v4, v3 == 'Old', NA), v1, FUN = function(x) na.locf(x, na.rm = FALSE)))
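
If adding a package is not an option, the same idea can be sketched with base R alone. This is a variation on the answers above, not something from the original thread: it carries the row number of the latest "New" row forward within each v1 group and then looks up v4 at that row. It assumes, as stated in the question, that the rows are already ordered by v1 and v2; the names new_row and src_row are made up for the example.

```r
df <- data.frame(v1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
                 v2 = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4),
                 v3 = c("New", "Old", "Old", "New", "Old", "New",
                        "New", "New", "Old", "Old"),
                 v4 = c("A", "B", "C", "X", "Y", "Z", "A", "B", "C", "D"),
                 stringsAsFactors = FALSE)

# Row number on each "New" row, 0 on "Old" rows
new_row <- ifelse(df$v3 == "New", seq_len(nrow(df)), 0L)

# Within each v1 group, carry the latest "New" row number forward
src_row <- ave(new_row, df$v1, FUN = cummax)

# A group that starts with "Old" has no source row yet: mark it NA
src_row[src_row == 0] <- NA

# Look up v4 at the source row
df$v5 <- df$v4[src_row]
df
```

Working on row numbers rather than on v4 itself keeps the approach independent of the column type, so it should behave the same whether v4 is character or factor.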