2017-05-01
1

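Determine whether factor levels match across columns in R

I am trying to create a new variable in R (3.3.2) by checking whether the factor levels in several columns are the same within each row.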

id<-c(1:5) 
X1<-c("species1", "species1", NA, "species1", "species1") 
X2<-c(NA, "species2", NA, "species2", "species2") 
X3<-c("species1", "species2", "species2", "species3", "species3") 

The result should look like this, with `same` indicating whether X1:X3 are all identical (ignoring NAs):

     id  X1         X2         X3         same
[1,] 1   "species1" NA         "species1" TRUE
[2,] 2   "species1" "species2" "species2" FALSE
[3,] 3   NA         NA         "species2" TRUE
[4,] 4   "species1" "species2" "species3" FALSE
[5,] 5   "species1" "species2" "species3" FALSE

Edit: here is my actual data, along with the code I used from @Mike's answer below:

s$same <- apply(s[,c(2:11)], 1, function(x) length(unique((x[!is.na(x)]))) == 1) 

dput(droplevels(head(s))) 

structure(list(rowid = structure(c(5L, 6L, 4L, 3L, 2L, 1L), .Label = c("-68975029755346725", 
"-6985608891139937154", "-7064257681237955764", "-716653329714258929", 
"-7190954401213249258", "-7190954401427629087"), class = "factor"), 
    species1 = structure(c(3L, NA, 3L, 1L, 2L, NA), .Label = c("Mycobacterium avium complex", 
    "Mycobacterium fortuitum", "Mycobacterium kansasii"), class = "factor"), 
    species2 = structure(c(NA, NA, 4L, 2L, 3L, 1L), .Label = c(" Mycobacterium fortuitum", 
    "Mycobacterium avium complex", "Mycobacterium fortuitum", 
    "Mycobacterium kansasii"), class = "factor"), species3 = structure(c(4L, 
    NA, 3L, 1L, 2L, NA), .Label = c(" Mycobacterium avium complex", 
    " Mycobacterium fortuitum", " Mycobacterium kansasii", "Mycobacterium kansasii" 
    ), class = "factor"), species4 = structure(c(NA, NA, NA, 
    NA, NA, 1L), .Label = " Mycobacterium fortuitum", class = "factor"), 
    species5 = structure(c(1L, NA, NA, NA, NA, NA), .Label = "Mycobacterium kansasii", class = "factor"), 
    species6 = structure(c(NA, NA, NA, NA, NA, 1L), .Label = " Mycobacterium fortuitum", class = "factor"), 
    species7 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    species8 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    species9 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    species10 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    same = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)), .Names = c("rowid", 
"species1", "species2", "species3", "species4", "species5", "species6", 
"species7", "species8", "species9", "species10", "same"), row.names = c(NA, 
6L), class = "data.frame") 

Rows 1 and 6 are correct, but every row in this sample should be TRUE.
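As a possible diagnostic, the short sketch below reproduces rows 2 and 3 of the `dput()` sample by hand. It suggests two separate causes for the unexpected FALSE values: a row that is entirely NA leaves zero values after filtering, and some labels in the sample carry a leading space. The variable names are illustrative only.

```r
# Row 2 of the sample: every species column is NA
all_na <- c(NA_character_, NA_character_, NA_character_)
n_all_na <- length(unique(all_na[!is.na(all_na)]))
n_all_na   # 0 values remain, so the `== 1` test is FALSE

# Row 3 of the sample: same species, but the species3 label has a leading space
padded <- c("Mycobacterium kansasii",
            "Mycobacterium kansasii",
            " Mycobacterium kansasii")
n_padded  <- length(unique(padded))          # the padded label counts as distinct
n_trimmed <- length(unique(trimws(padded)))  # trimming collapses them to one
c(n_padded, n_trimmed)
```

Rows 4 and 5 of the sample follow the same pattern as row 3: the species3 label differs from the others only by its leading space.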

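I have tried every combination of `apply`, `ifelse`, `all`, `identical`, `duplicated`, and `unique` that I could think of, but either the function does not accept `na.rm` or I get a matrix as output instead of a new variable. There seem to be plenty of questions about doing this with numeric variables, but I could not find what I need for factor or string variables. Thanks in advance for any help!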

+0

Should 'same' be 'TRUE' only when the variables match? In your example row 3 is 'TRUE', but nothing matches there. – hhh

+0

Given that X2 and X3 match, shouldn't row 2 also be "TRUE"? –

+0

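I want to match across all of X1:X3. I see what you mean about row 3, but I would simply like "same" to be "TRUE" in that case. The reason I am doing this is that I need to see which rows contain a single species and which contain multiple species, for later characterization. – ericotta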

Answers

3

How about using `length` and `unique` to check that there is only one unique value?

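df <- data.frame(id = id, X1 = X1, X2 = X2, X3 = X3)
# TRUE when the non-NA values in a row are identical after trimming whitespace,
# or when every value in the row is NA
df$same <- apply(df[, c("X1", "X2", "X3")], 1, function(x)
  length(unique(trimws(x[!is.na(x)]))) == 1 || length(unique(trimws(x))) == 1)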

df 
#   id       X1       X2       X3  same
# 1  1 species1     <NA> species1  TRUE
# 2  2 species1 species2 species2 FALSE
# 3  3     <NA>     <NA> species2  TRUE
# 4  4 species1 species2 species3 FALSE
# 5  5 species1 species2 species3 FALSE

I added `trimws()` to strip leading and trailing whitespace, plus a condition for rows where every value is NA.
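The same idea can be wrapped in a small reusable function so it applies to any set of columns. The sketch below is one possible variant, not the answer's exact code: `rows_match` is a hypothetical name, and it tests for at most one distinct non-NA value (`<= 1`) so that all-NA rows count as TRUE without a second condition. The sixth, all-NA row is added to the question's example data purely for illustration.

```r
# Hypothetical helper: TRUE when a row holds at most one distinct non-NA value
rows_match <- function(data, cols) {
  apply(data[, cols, drop = FALSE], 1, function(x) {
    # Drop NAs, then trim whitespace so " a" and "a" compare equal
    values <- trimws(as.character(x[!is.na(x)]))
    length(unique(values)) <= 1
  })
}

# Example data from the question, plus an all-NA row (row 6, added here)
df <- data.frame(
  id = 1:6,
  X1 = c("species1", "species1", NA, "species1", "species1", NA),
  X2 = c(NA, "species2", NA, "species2", "species2", NA),
  X3 = c("species1", "species2", "species2", "species3", "species3", NA),
  stringsAsFactors = TRUE  # factor columns, as in the question
)

df$same <- rows_match(df, c("X1", "X2", "X3"))
df
```

For the ten-column data in the question, the call would presumably be `s$same <- rows_match(s, paste0("species", 1:10))`, which selects the columns by name rather than by position.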

+0

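For some reason, when I run it the NAs still seem to be counted and I get "FALSE" for those rows... and I have double-checked that they really are NA and not blanks or something else. – ericotta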

+0

I actually have 10 variables (this is a very large dataset), but I am fairly sure I have passed them to the function correctly. – ericotta

+0

That works, thank you! – ericotta