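Determining whether factor levels match across columns in R
I am trying to create a new variable in R (3.3.2) by checking, row by row, whether the factor levels in several columns are all the same.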
id<-c(1:5)
X1<-c("species1", "species1", NA, "species1", "species1")
X2<-c(NA, "species2", NA, "species2", "species2")
X3<-c("species1", "species2", "species2", "species3", "species3")
The result should look like this, where `same` says whether X1:X3 are all identical (ignoring NAs):
id X1 X2 X3 same
[1,] 1 "species1" NA "species1" TRUE
[2,] 2 "species1" "species2" "species2" FALSE
[3,] 3 NA NA "species2" TRUE
[4,] 4 "species1" "species2" "species3" FALSE
[5,] 5 "species1" "species2" "species3" FALSE
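The rule behind the table above can be sketched as follows. This is only a minimal illustration on the toy data: the data frame name `df` and the helper `all_same` are made up for this example, and a row counts as "same" when it has exactly one distinct non-NA value.

```r
# Toy data from the question
id <- 1:5
X1 <- c("species1", "species1", NA, "species1", "species1")
X2 <- c(NA, "species2", NA, "species2", "species2")
X3 <- c("species1", "species2", "species2", "species3", "species3")
df <- data.frame(id, X1, X2, X3, stringsAsFactors = FALSE)

# Hypothetical helper: TRUE when the non-NA values of a row are all identical
all_same <- function(x) {
  x <- x[!is.na(x)]          # drop NAs first, since unique() has no na.rm argument
  length(unique(x)) == 1
}

# apply() over rows returns one logical per row, so it can be stored as a column
df$same <- apply(df[, c("X1", "X2", "X3")], 1, all_same)
```

Because the helper returns a single value per row, `apply()` yields a plain vector rather than a matrix, which is what lets it be assigned as a new variable.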
Edit: here is my actual data, along with the code I used from @Mike's answer below:
s$same <- apply(s[,c(2:11)], 1, function(x) length(unique((x[!is.na(x)]))) == 1)
dput(droplevels(head(s)))
structure(list(rowid = structure(c(5L, 6L, 4L, 3L, 2L, 1L), .Label = c("-68975029755346725",
"-6985608891139937154", "-7064257681237955764", "-716653329714258929",
"-7190954401213249258", "-7190954401427629087"), class = "factor"),
species1 = structure(c(3L, NA, 3L, 1L, 2L, NA), .Label = c("Mycobacterium avium complex",
"Mycobacterium fortuitum", "Mycobacterium kansasii"), class = "factor"),
species2 = structure(c(NA, NA, 4L, 2L, 3L, 1L), .Label = c(" Mycobacterium fortuitum",
"Mycobacterium avium complex", "Mycobacterium fortuitum",
"Mycobacterium kansasii"), class = "factor"), species3 = structure(c(4L,
NA, 3L, 1L, 2L, NA), .Label = c(" Mycobacterium avium complex",
" Mycobacterium fortuitum", " Mycobacterium kansasii", "Mycobacterium kansasii"
), class = "factor"), species4 = structure(c(NA, NA, NA,
NA, NA, 1L), .Label = " Mycobacterium fortuitum", class = "factor"),
species5 = structure(c(1L, NA, NA, NA, NA, NA), .Label = "Mycobacterium kansasii", class = "factor"),
species6 = structure(c(NA, NA, NA, NA, NA, 1L), .Label = " Mycobacterium fortuitum", class = "factor"),
species7 = structure(c(NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"),
species8 = structure(c(NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"),
species9 = structure(c(NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"),
species10 = structure(c(NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"),
same = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)), .Names = c("rowid",
"species1", "species2", "species3", "species4", "species5", "species6",
"species7", "species8", "species9", "species10", "same"), row.names = c(NA,
6L), class = "data.frame")
Rows 1 and 6 are correct, but every row in this group should be TRUE.
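I have tried `apply` and `ifelse` with every combination of `all`, `identical`, `duplicated`, and `unique` I can think of, but either the function does not accept `na.rm`, or I get a matrix as output instead of a new variable. There seem to be plenty of questions about doing this with numeric variables, but I could not find what I need for factor or string variables. Thanks in advance for any help!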
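One possible explanation, judging only from the `dput()` output above: some labels start with a space (for example " Mycobacterium fortuitum"), so they are distinct levels from "Mycobacterium fortuitum", and a row with only NAs has zero unique values, so `== 1` gives FALSE. The sketch below uses a small stand-in data frame (`s_demo`, with invented rows) rather than the real `s`. Treating all-NA rows as TRUE is an assumption about the intended result.

```r
# Small stand-in for the real data (hypothetical rows; note the leading space)
s_demo <- data.frame(
  rowid    = c("a", "b", "c"),
  species1 = c("Mycobacterium kansasii", NA, "Mycobacterium fortuitum"),
  species2 = c(" Mycobacterium kansasii", NA, "Mycobacterium avium complex"),
  stringsAsFactors = TRUE
)

# Pick the species columns by name instead of by position
species_cols <- grep("^species", names(s_demo), value = TRUE)

# Convert factors to character and strip leading/trailing whitespace
cleaned <- lapply(s_demo[species_cols], function(col) trimws(as.character(col)))
cleaned <- as.data.frame(cleaned, stringsAsFactors = FALSE)

# Number of distinct non-NA values in each row
n_distinct <- apply(cleaned, 1, function(x) length(unique(x[!is.na(x)])))

# Assumption: rows with no values at all (0 distinct) also count as "same"
s_demo$same <- unname(n_distinct <= 1)
```

If all-NA rows should be FALSE instead, `n_distinct == 1` keeps the original behaviour while still ignoring stray whitespace.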
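When exactly should 'same' be 'TRUE'? When the variables match? In your example row 3 is 'TRUE', but nothing matches there. – hhh
Given that X2 and X3 match, shouldn't row 2 be 'TRUE' as well? –
I want to match across X1:X3. I see what you mean about row 3, but I would simply like 'same' to be 'TRUE' in that case. The reason I am doing this is that I need to see which rows contain a single species and which contain multiple species, for later characterization. – ericotta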