2017-07-16 31 views
1

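Find all matching vectors in a list

I have a list and I want to find all the common vectors, that is, the elements that contain exactly the same values, while keeping each element's position number in the list, in R. If possible, with a one-liner.

Here is mylist: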

mylist<-list(c("yes", "no"), c("no", "other", "up", 
"down"), c("no", "yes"), c("no", 
"yes"), c("no", "yes", "maybe"), c("no", 
"yes", "maybe"), c("no", "yes", "maybe")) 

Desired output:

The common vectors are: match 1: 1, 3, 4; match 2: 5, 6, 7

+1

I hope that fixes it, @lmo –

+0

A straightforward way is 'ml2 = lapply(mylist, sort); match(ml2, unique(ml2))' –

+0

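@alexis_laz Your solution does not give the positions of the matching list elements for each match! Check akrun's answer. Thanks for your time anyway! –

Answers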
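For what it's worth, the group ids returned by 'match' in the comment above can be turned into positions with one extra 'split'. This is only a sketch built on that comment; the names 'ids' and 'positions' are made up here for illustration.

```r
mylist <- list(c("yes", "no"), c("no", "other", "up", "down"), c("no", "yes"),
               c("no", "yes"), c("no", "yes", "maybe"), c("no", "yes", "maybe"),
               c("no", "yes", "maybe"))

# sort each vector so that element order does not matter
ml2 <- lapply(mylist, sort)

# one integer id per list element; equal ids mean equal (sorted) vectors
ids <- match(ml2, unique(ml2))

# group the positions 1..7 by id (names 'ids' and 'positions' are illustrative)
positions <- split(seq_along(mylist), ids)
```

Groups of length 1 (here, position 2) can be dropped afterwards with 'Filter' if only the real matches are wanted.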

4

Here is an option using split

Filter(function(x) length(x) >1, split(seq_along(mylist), 
        sapply(mylist, function(x) toString(sort(x))))) 
#$`maybe, no, yes` 
#[1] 5 6 7 

#$`no, yes` 
#[1] 1 3 4 
+1

It works like a charm! Thank you guys! –

+0

Could you add some comments on how this works? Thanks akrun! –

+2

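@EliasEstatisticsEU The idea is to 'split' the sequence of 'mylist' by a grouping 'vector' created by pasting together the sorted values of each element of 'mylist', and then to 'Filter' the resulting 'list' for the elements whose length is greater than 1. – akrun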
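To make that explanation concrete, the one-liner can be unrolled into its three steps. This is just a sketch of the same calls; the intermediate names 'keys', 'groups' and 'matches' are not in the answer and are introduced here for illustration.

```r
mylist <- list(c("yes", "no"), c("no", "other", "up", "down"), c("no", "yes"),
               c("no", "yes"), c("no", "yes", "maybe"), c("no", "yes", "maybe"),
               c("no", "yes", "maybe"))

# Step 1: one string key per element, built from its sorted values,
# so c("yes", "no") and c("no", "yes") both become "no, yes"
keys <- sapply(mylist, function(x) toString(sort(x)))

# Step 2: split the positions 1..7 by key
groups <- split(seq_along(mylist), keys)

# Step 3: keep only the keys that occur more than once
matches <- Filter(function(x) length(x) > 1, groups)
```

Printing 'matches' should give the same two groups as the one-liner in the answer.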

4

duplicated accepts a list as its main argument. So you can use

which(duplicated(mylist1) | duplicated(mylist1, fromLast=TRUE)) 
[1] 3 4 5 6 7 

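for your first example. Note that this does not distinguish between the different groups of matching list elements; it only returns TRUE for every element that has an identical counterpart somewhere in the list.

For the second example dataset, you can use the following to find the positions of the groups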
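One caveat, shown as a rough sketch: 'duplicated' compares the vectors as they are, so c("yes", "no") and c("no", "yes") do not count as duplicates unless each vector is sorted first. The names 'raw_hits' and 'sorted_hits' are illustrative only, and the data is the second example list from this answer.

```r
mylist2 <- list(c("yes", "no"), c("no", "other", "up", "down"), c("no", "yes"),
                c("no", "yes"), c("no", "yes", "maybe"), c("no", "yes", "maybe"),
                c("no", "yes", "maybe"))

# order-sensitive: element 1 is not flagged, because "yes", "no" != "no", "yes"
raw_hits <- which(duplicated(mylist2) | duplicated(mylist2, fromLast = TRUE))

# order-insensitive: sort each vector before looking for duplicates
sorted2 <- lapply(mylist2, sort)
sorted_hits <- which(duplicated(sorted2) | duplicated(sorted2, fromLast = TRUE))
```

With sorting, position 1 joins positions 3 and 4; without it, only 3 to 7 are flagged.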

# get group values as integers 
groups <- as.integer(factor(sapply(mylist2, 
            function(x) paste(sort(x), collapse="")))) 
# return list of groups 
lapply(seq_len(max(groups)), function(x) which(x == groups)) 
[[1]] 
[1] 2 

[[2]] 
[1] 5 6 7 

[[3]] 
[1] 1 3 4 

Data

mylist1 <- 
list(c("yes", "no"), c("no", "other", "up", "down"), c("no", 
"yes", "maybe"), c("no", "yes", "maybe"), c("no", "yes", "maybe" 
), c("no", "yes", "maybe"), c("no", "yes", "maybe")) 

mylist2 <- 
list(c("yes", "no"), c("no", "other", "up", "down"), c("no", 
"yes"), c("no", "yes"), c("no", "yes", "maybe"), c("no", "yes", 
"maybe"), c("no", "yes", "maybe")) 
+0

I have updated my question –

+0

I want to distinguish between the matches; please see the updated question. Thanks –

+5

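@EliasEstatisticsEU Please avoid posting moving-target questions. It can be quite frustrating to spend time on an answer that suddenly becomes invalid after an edit (and that may even attract unjustified downvotes, as here). Please take the time to think your question through carefully before posting. Cheers. – Henrik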

1

This works for me:

mylist<-list(c("yes", "no"), c("no", "other", "up", 
           "down"), c("no", "yes"), c("no", 
                  "yes"), c("no", "yes", "maybe"), c("no", 
                          "yes", "maybe"), c("no", "yes", "maybe")) 

library(dplyr) 

# function to create a dataframe from your list. Might not be the most efficient way to do this. 
f <- function(data) { 
    nCol <- max(vapply(data, length, 0)) 
    data <- lapply(data, function(row) c(row, rep(NA, nCol-length(row)))) 
    data <- matrix(unlist(data), nrow=length(data), ncol=nCol, byrow=TRUE) 
    data.frame(data) 
} 

# create a dataframe from the list, and add a 'key' column 
df = f(mylist) 
df$key = apply(df , 1 , paste , collapse = "-") 

# find the total times the key occurs 
df_total = df %>% group_by(key) %>% summarise(n =n()) 

# find the indices that belong to the groups 
result = lapply(df_total$key, function(x) which(df$key==x)) 

Result:

> result 
[[1]] 
[1] 2 

[[2]] 
[1] 5 6 7 

[[3]] 
[1] 3 4 

[[4]] 
[1] 1 

Hope this helps!

+0

Although it works, I can't mark it as the accepted answer because it is not a one-liner. Thanks for your answer, F Maas –

+1

Why does it need to be a one-liner? – Florian

+0

Because I want to keep my code clean! –

1

Data

mylist <- list(c("yes", "no"), c("no", "other", "up", "down"), c("no", "yes"), 
      c("no", "yes"), c("no", "yes", "maybe"), c("no", "yes", "maybe"), 
      c("no", "yes", "maybe")) 

A (long) one-liner

sapply(unique(unlist(lapply(mylist, function(x) paste(sort(x), collapse = " ")))), function(y) which(y == unlist(lapply(mylist, function(x) paste(sort(x), collapse = " "))))) 

Output:

$`no yes` 
[1] 1 3 4 

$`down no other up` 
[1] 2 

$`maybe no yes` 
[1] 5 6 7 
+0

Bravo, you did it! (Greek geek?) –

+1

Yes Elias, I am Greek (y). I hope the thread helps. – lampros

2

This is an interesting one. You can use mtabulate from the qdapTools package to get the following data frame,

d1 <- qdapTools::mtabulate(mylist) 
d1 
# down maybe no other up yes 
#1 0  0 1  0 0 1 
#2 1  0 1  1 1 0 
#3 0  0 1  0 0 1 
#4 0  0 1  0 0 1 
#5 0  1 1  0 0 1 
#6 0  1 1  0 0 1 
#7 0  1 1  0 0 1 

Then you can split it by pasting each row together,

l1 <- split(d1, do.call(paste, d1)) 

l1 
#$`0 0 1 0 0 1` 
# down maybe no other up yes 
#1 0  0 1  0 0 1 
#3 0  0 1  0 0 1 
#4 0  0 1  0 0 1 

#$`0 1 1 0 0 1` 
# down maybe no other up yes 
#5 0  1 1  0 0 1 
#6 0  1 1  0 0 1 
#7 0  1 1  0 0 1 

#$`1 0 1 1 1 0` 
# down maybe no other up yes 
#2 1  0 1  1 1 0 

From there you can reshape that list into whatever you want, for example,

setNames(lapply(l1, rownames), lapply(l1, function(i)toString(names(i)[i[1,] == 1]))) 
#$`no, yes` 
#[1] "1" "3" "4" 

#$`maybe, no, yes` 
#[1] "5" "6" "7" 

#$`down, no, other, up` 
#[1] "2" 
+0

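After creating 'd1' (which, by the way, can be created simply as 'd1 = table(rep(1:length(mylist), lengths(mylist)), unlist(mylist))'), you can avoid the coercion and the 'paste' by using 'd1 %*% (2 ^ (0:(ncol(d1) - 1)))' to create the groups to split by –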

+0

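@alexis_laz Thanks for the suggestion. I don't understand how that grouping code works (or rather doesn't, since it throws an error)... but what I am really asking about is the logic behind it – Sotos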

+0

It basically converts each row into an integer, following a binary -> decimal approach. It is a nicer alternative to 'apply(d1, 1, function(x) sum(x * (2 ^ (0:(length(x) - 1)))))'. It has been posted on SO. –
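A small sketch of that binary -> decimal idea, in case it helps. It builds the 0/1 table from base R as suggested in the comment above, but converts it to a plain integer matrix first; that conversion, and the names 'm', 'codes' and 'groups', are assumptions added here for illustration, not part of the original suggestion.

```r
mylist <- list(c("yes", "no"), c("no", "other", "up", "down"), c("no", "yes"),
               c("no", "yes"), c("no", "yes", "maybe"), c("no", "yes", "maybe"),
               c("no", "yes", "maybe"))

# incidence table: one row per list element, one column per distinct value
d1 <- table(rep(seq_along(mylist), lengths(mylist)), unlist(mylist))

# plain 0/1 integer matrix (assumption: presence/absence is all we need)
m <- matrix(as.integer(d1 > 0), nrow = nrow(d1))

# read each row as a binary number: column j contributes 2^(j - 1)
codes <- as.vector(m %*% 2^(0:(ncol(m) - 1)))

# rows with the same code contain the same set of values
groups <- split(seq_along(mylist), codes)
```

The columns come out in alphabetical order (down, maybe, no, other, up, yes), so c("no", "yes") is encoded as 4 + 32 = 36. Note that the weights grow as powers of two, so this is only practical for a modest number of distinct values.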