2014-11-03

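By row, replace values equal to the value in a specified column

Hopefully this is an easy one, but I can't seem to piece together an answer. I have a data frame. In each row there are values that I need to change to NA, but the value to change is not the same in every row. I want to set the values in each row to NA based on the value in a specified column of that row.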

mydata = as.data.frame(rbind(c("AA","CC","BB","DC","CC"),c("CC","CC","BB","DC","BB"),c("BB","BB","BB","DC","DC"))) 

    > mydata 
     V1 V2 V3 V4 V5 
    1 AA CC BB DC CC 
    2 CC CC BB DC BB 
    3 BB BB BB DC DC 

    #for each row, replace values that match the value in column 5 with NA 
    apply(mydata[,1:4], 1, function(x){ 
    x[x %in% x$V5] = NA 
    }) 

Desired output:

> mydata 
     V1 V2 V3 V4 V5 
    1 AA NA BB DC CC 
    2 CC CC NA DC BB 
    3 BB BB BB NA DC 

Thanks!

---- ---- UPDATE

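The code below from arvi1000 works great for comparing the values in each row against the value in one column. Is there a way to do something like this, but compare the values against 2 or more columns?

Current code: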

mydata[,1:4][mydata[,1:4]==mydata[,5]] <- NA 

Say I also have a column 6. Row by row, I want to change to NA every value that is not equal to the value in column 5 or column 6.

mydata = as.data.frame(rbind(c("AA","CC","BB","DC","CC","AA"),c("CC","CC","BB","DC","BB","CC"),c("BB","BB","BB","DC","DC","BB")),stringsAsFactors=F) 

    > mydata 
     V1 V2 V3 V4 V5 V6 
    1 AA CC BB DC CC AA 
    2 CC CC BB DC BB CC 
    3 BB BB BB DC DC BB 

Desired output:

> mydata 
     V1 V2 V3 V4 V5 V6 
    1 AA CC NA NA CC AA 
    2 CC CC BB NA BB CC 
    3 BB BB BB DC DC BB 

I tried the following, but got an error:

mydata[,1:4][mydata[,1:4]==mydata[,5]|mydata[,6]] <- NA 
    Error in mydata[, 1:4] == mydata[, 5] | mydata[, 6] : 
     operations are possible only for numeric, logical or complex types 

Answers


Add stringsAsFactors = FALSE to as.data.frame. This is the key, because 'CC' != 'CC' when the two values are levels of different factors.

mydata = as.data.frame(rbind(c("AA","CC","BB","DC","CC"),c("CC","CC","BB","DC","BB"),c("BB","BB","BB","DC","DC")), 
         stringsAsFactors=FALSE) 

Then:

mydata[,1:4][mydata[,1:4]==mydata[,5]] <- NA 

Voilà:

  V1   V2   V3   V4 V5
1 AA <NA>   BB   DC CC
2 CC   CC <NA>   DC BB
3 BB   BB   BB <NA> DC
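
The one-liner above relies on how R compares a data frame with a vector. The following is a minimal sketch of that step on the same sample data. The name `mask` is only illustrative and does not appear in the answer.

```r
# Same sample data as in the answer, kept as character columns.
mydata <- as.data.frame(
  rbind(c("AA", "CC", "BB", "DC", "CC"),
        c("CC", "CC", "BB", "DC", "BB"),
        c("BB", "BB", "BB", "DC", "DC")),
  stringsAsFactors = FALSE
)

# Comparing a data frame with a vector of length nrow() recycles the vector
# down each column, so every value in row i is compared with mydata[i, 5].
mask <- mydata[, 1:4] == mydata[, 5]   # logical matrix, 3 rows x 4 columns
print(mask)

# A logical matrix of matching shape can index the data frame for assignment.
mydata[, 1:4][mask] <- NA
print(mydata)
```

Since R 4.0.0, `stringsAsFactors = FALSE` is the default, so spelling it out mainly matters for older versions of R.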

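Hi, this works great! Is there a way to compare the data against the values in 2 or more columns? I tried using a condition (see my edit above), but that didn't work so well. Thanks! – SC2 2014-11-05 14:29:24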


You're close! 'mydata[,1:4]==mydata[,5] | mydata[,1:4]==mydata[,6]' will do it. – arvi1000 2014-11-05 15:29:36


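Awesome! It doesn't seem to work the same way if I use != instead of ==. If I want to use !=, do I need to put the condition together differently? – SC2 2014-11-05 15:56:55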
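
One way to handle the "not equal to column 5 or column 6" case from the question's update is to build the equality mask first and then negate it as a whole. This is a hedged sketch rather than something posted in the thread, and the names `vals` and `matches` are illustrative. It also suggests why the attempt in the update failed: there `|` was applied directly to the character column mydata[,6], and `|` does not accept character operands.

```r
# Six-column sample data from the question's update.
mydata <- as.data.frame(
  rbind(c("AA", "CC", "BB", "DC", "CC", "AA"),
        c("CC", "CC", "BB", "DC", "BB", "CC"),
        c("BB", "BB", "BB", "DC", "DC", "BB")),
  stringsAsFactors = FALSE
)

vals <- mydata[, 1:4]

# TRUE where a value equals column 5 or column 6 of the same row.
matches <- vals == mydata[, 5] | vals == mydata[, 6]

# Negating the whole mask is the same as
#   vals != mydata[, 5] & vals != mydata[, 6]
# (note & rather than |, by De Morgan's law).
mydata[, 1:4][!matches] <- NA
print(mydata)
```

Writing the condition with `!=` and `|` instead would flag nearly every value, because a value can only equal both reference columns when those two columns happen to hold the same value.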


Another approach is to use apply:

mydata = as.data.frame(rbind(c("AA","CC","BB","DC","CC"),c("CC","CC","BB","DC","BB"),c("BB","BB","BB","DC","DC"))) 

mydata <- data.frame(t(apply(mydata, 1, function(x) {
    # x is one row as a character vector; its last element is the reference value
    for (i in 1:(ncol(mydata) - 1)) {
        if (x[i] == x[ncol(mydata)]) {
            x[i] <- NA
        }
    }
    return(x)
})))

Output:

> mydata 
  V1   V2   V3   V4 V5
1 AA <NA>   BB   DC CC
2 CC   CC <NA>   DC BB
3 BB   BB   BB <NA> DC
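
The loop inside apply can also be written with %in%, which extends to more than one reference column. This is a minimal sketch under that assumption, not code from the thread. The names `value_cols`, `ref_cols` and `blanked` are hypothetical. It also hints at why the attempt in the question failed: inside apply each row is an atomic vector, so x$V5 is not valid, and the function there did not return the modified row.

```r
mydata <- as.data.frame(
  rbind(c("AA", "CC", "BB", "DC", "CC"),
        c("CC", "CC", "BB", "DC", "BB"),
        c("BB", "BB", "BB", "DC", "DC")),
  stringsAsFactors = FALSE
)

value_cols <- 1:4   # columns whose values may be set to NA
ref_cols   <- 5     # reference column(s); e.g. 5:6 for two reference columns

# apply() hands each row to the function as a character vector.
blanked <- t(apply(mydata, 1, function(x) {
  vals <- x[value_cols]
  vals[vals %in% x[ref_cols]] <- NA   # blank values found in the reference column(s)
  vals                                # return the modified values
}))

# Write the modified values back; the reference column is left untouched.
mydata[, value_cols] <- blanked
print(mydata)
```

To keep only the values that do match a reference column, as in the question's update, the condition would be negated: `vals[!(vals %in% x[ref_cols])] <- NA`.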