2016-08-19

Merge 2 variables into 1?

Suppose my raw data looks like this:

df <- data.frame(id = 1:10, 
       V = LETTERS[1:10], 
       Treatment1 = c(rep(1,3), rep(0,7)), 
       Treatment2 = c(rep(0,3), rep(1,3), rep(0,4))) 

I want to merge Treatment1 and Treatment2 into a new variable that takes 1 of 3 values: Treatment1, Treatment2, or Control. This is the data frame I want to end up with:

df2 <- data.frame(id = 1:10, 
        V = LETTERS[1:10], 
        Treatment = c(rep("Treatment1",3), 
           rep("Treatment2",3), 
           rep("Control",4))) 

Right now I am using this code to do it:

library(dplyr) 
df$Treatment <- ifelse(test = df$Treatment1==1, yes = "Treatment1", 
         no = ifelse(test = df$Treatment2==1, 
            yes = "Treatment2", no = "Control")) 

df2 <- df %>% select(-Treatment1, -Treatment2) 

Is there a better way?

As far as I can tell, this question has nothing to do with tidyr or dplyr.

Answers

3

One approach that ends up being quite readable and extensible is to create a lookup table and merge it with your existing data, as follows:

# Lookup table: one row per combination of the two indicator columns.
# Named 'lookup' so it does not overwrite the target 'df2' from the question.
lookup <- data.frame(Treatment1 = c(1,0,0), 
        Treatment2 = c(0,1,0), 
        Treatment = c("Treatment1", "Treatment2", "Control")) 
merge(df, lookup, all.x=TRUE) # Setting all.x ensures rows of df aren't dropped if there isn't a match 

#    Treatment1 Treatment2 id V  Treatment 
# 1           0          0  7 G    Control 
# 2           0          0  8 H    Control 
# 3           0          0  9 I    Control 
# 4           0          0 10 J    Control 
# 5           0          1  4 D Treatment2 
# 6           0          1  5 E Treatment2 
# 7           0          1  6 F Treatment2 
# 8           1          0  1 A Treatment1 
# 9           1          0  2 B Treatment1 
# 10          1          0  3 C Treatment1 
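
Note that merge sorts its result by the key columns, so the rows no longer follow the original id order. Here is a minimal sketch of one way to restore that order and drop the indicator columns. It assumes every combination in df appears in the lookup table; the names `lookup`, `merged` and `result` are chosen here purely for illustration.

```r
df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1, 3), rep(0, 7)),
                 Treatment2 = c(rep(0, 3), rep(1, 3), rep(0, 4)))

# Lookup table (illustrative name): one row per indicator combination.
lookup <- data.frame(Treatment1 = c(1, 0, 0),
                     Treatment2 = c(0, 1, 0),
                     Treatment  = c("Treatment1", "Treatment2", "Control"))

# merge() joins on the shared columns Treatment1 and Treatment2.
merged <- merge(df, lookup, all.x = TRUE)

# Restore the original row order and keep only the columns of interest.
result <- merged[order(merged$id), c("id", "V", "Treatment")]
rownames(result) <- NULL
result
```

If a new treatment arm is added later, only the lookup table needs another row, which is the main appeal of this approach.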
2

We can do this without any ifelse:

df$Treatment <- with(df, c("Control", "Treatment1", "Treatment2")[(Treatment1 + 
           2*Treatment2)+1]) 
df$Treatment 
#[1] "Treatment1" "Treatment1" "Treatment1" "Treatment2" "Treatment2" 
#[6] "Treatment2" "Control" "Control" "Control" "Control" 
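
The indexing trick works because Treatment1 + 2*Treatment2 + 1 maps each row to position 1, 2, or 3 of the label vector. A small sketch of that mapping follows. It also includes a hypothetical row with both indicators set to 1, a case that does not occur in the question's data, to show that such a row would come out as NA rather than raise an error.

```r
labels <- c("Control", "Treatment1", "Treatment2")

# The three combinations from the question, plus a hypothetical fourth
# in which both indicators are 1.
combos <- data.frame(Treatment1 = c(0, 1, 0, 1),
                     Treatment2 = c(0, 0, 1, 1))

# Position in 'labels' for each combination: 1, 2, 3, and 4 (out of range).
combos$index <- with(combos, Treatment1 + 2 * Treatment2 + 1)

# Indexing past the end of a character vector yields NA.
combos$Treatment <- labels[combos$index]
combos
```

So this approach is safe only as long as the two treatments are mutually exclusive.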

Or another option is pmax:

c("Control", "Treatment1", "Treatment2")[do.call(pmax, df[3:4]*col(df[3:4]))+1] 
#[1] "Treatment1" "Treatment1" "Treatment1" "Treatment2" "Treatment2" 
#[6] "Treatment2" "Control" "Control" "Control" "Control" 

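If we need to compare against 'df2' (the target data frame from the question), paste the 3rd and 4th columns of 'df' together, set the names of the unique elements of 'Treatment' in 'df2' to the unique elements of 'v1' (in this example they are in the same order), and use that named vector to replace the values.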

v1 <- do.call(paste0, df[3:4]) 
unname(setNames(as.character(unique(df2$Treatment)), c("10", "01", "00"))[v1]) 
#[1] "Treatment1" "Treatment1" "Treatment1" "Treatment2" "Treatment2" 
#[6] "Treatment2" "Control" "Control" "Control" "Control" 
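
To confirm that a result really matches the target, it can be compared with the Treatment column of the desired data frame. This is a minimal sketch using the indexing approach; the names `result` and `same` are illustrative.

```r
df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1, 3), rep(0, 7)),
                 Treatment2 = c(rep(0, 3), rep(1, 3), rep(0, 4)))

# Target data frame from the question.
df2 <- data.frame(id = 1:10,
                  V = LETTERS[1:10],
                  Treatment = c(rep("Treatment1", 3),
                                rep("Treatment2", 3),
                                rep("Control", 4)))

df$Treatment <- with(df, c("Control", "Treatment1", "Treatment2")[
  Treatment1 + 2 * Treatment2 + 1])
result <- df[c("id", "V", "Treatment")]

# as.character() guards against df2$Treatment being a factor,
# which was the data.frame() default before R 4.0.0.
same <- identical(result$Treatment, as.character(df2$Treatment))
same
```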

Note: none of these approaches uses a package, and they should be efficient.

2

dplyr::case_when is a nice alternative to nested ifelse calls:

library(dplyr) 

df %>% mutate(Treatment = case_when(.$Treatment1 == 1 ~ 'Treatment1', 
            .$Treatment2 == 1 ~ 'Treatment2', 
            TRUE ~ 'Control')) %>% 
    select(-Treatment1, -Treatment2) 
    ## id V Treatment 
    ## 1 1 A Treatment1 
    ## 2 2 B Treatment1 
    ## 3 3 C Treatment1 
    ## 4 4 D Treatment2 
    ## 5 5 E Treatment2 
    ## 6 6 F Treatment2 
    ## 7 7 G Control 
    ## 8 8 H Control 
    ## 9 9 I Control 
    ## 10 10 J Control 

Because it is still new and somewhat experimental, case_when requires the .$ notation inside mutate for now, but it looks like that will change before long.
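
In later dplyr releases (0.7.0 and newer, as far as I know) case_when accepts bare column names inside mutate, so the .$ prefix should no longer be needed. Here is a minimal sketch under that assumption, using the data from the question:

```r
library(dplyr)

df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1, 3), rep(0, 7)),
                 Treatment2 = c(rep(0, 3), rep(1, 3), rep(0, 4)))

result <- df %>%
  # Conditions are checked in order; the first TRUE one wins.
  mutate(Treatment = case_when(Treatment1 == 1 ~ "Treatment1",
                               Treatment2 == 1 ~ "Treatment2",
                               TRUE ~ "Control")) %>%
  select(-Treatment1, -Treatment2)
result
```

Because the conditions are evaluated in order, a row with both indicators set to 1 would be labelled Treatment1 here, which may or may not be what you want.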