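Efficiently working with sets in R

Background:

I am working on a combinatorial problem in R. Given a list of sets, I need to generate all pairs for each set without producing duplicates.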

Example:

initial_list_of_sets <- list() 
initial_list_of_sets[[1]] <- c(1,2,3) 
initial_list_of_sets[[2]] <- c(2,3,4) 
initial_list_of_sets[[3]] <- c(3,2) 
initial_list_of_sets[[4]] <- c(5,6,7) 
get_pairs(initial_list_of_sets) 
# should return (1 2),(1 3),(2 3),(2 4),(3 4),(5 6),(5 7),(6 7)

Note that (3 2) is not included in the result, because it is mathematically equal to (2 3).

My (working, but inefficient) approach so far:

# checks if sets contain a_set
contains <- function(sets, a_set){
    for (existing in sets) {
        if (setequal(existing, a_set)) {
            return(TRUE)
        }
    }
    return(FALSE)
}

get_pairs <- function(from_sets){
    all_pairs <- list()
    for (a_set in from_sets) {
        # generate all pairs for current set
        pairs <- combn(x = a_set, m = 2, simplify = FALSE)
        for (pair in pairs) {
            # only add new pairs if they are not yet included in all_pairs
            if (!contains(all_pairs, pair)) {
                all_pairs <- c(all_pairs, list(pair))
            }
        }
    }
    return(all_pairs)
}

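My question:

Since I am dealing with mathematical sets, I cannot use the %in% operator in place of my contains function, because then (2 3) and (3 2) would be treated as different pairs. However, iterating over all existing sets in contains seems very inefficient. Is there a better way to implement this function?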
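The order problem described above can be illustrated with a small sketch. It assumes that putting each pair into a canonical (sorted) form is acceptable; the helper name `canonical` is hypothetical and only used for illustration.

```r
a <- c(2, 3)
b <- c(3, 2)

identical(a, b)  # FALSE: an order-sensitive comparison sees two different vectors
setequal(a, b)   # TRUE: compared as sets, the two pairs are equal

# Hypothetical helper: sort each pair so that equal sets get the same representation
canonical <- function(pair) sort(pair)
identical(canonical(a), canonical(b))  # TRUE

# Once pairs are in canonical form, order-sensitive tools such as unique() can be used
pairs <- list(c(2, 3), c(3, 2), c(1, 2))
deduped <- unique(lapply(pairs, canonical))
length(deduped)  # 2
```

Sorting up front moves the set comparison out of the inner loop, so no pairwise setequal() scan is needed.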

Source

2015-11-27 fab

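Yes! I will accept your answer. I would like to understand how R manages to do this so quickly behind the scenes... – fab

Growing a list inside a loop every time a new value is added is generally not very efficient. I also tried to use functions that are already optimized in R (e.g. 'lapply', 'unique'). – A5C1D2H2I1M1N2O1R2T1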

Perhaps you could rewrite your get_pairs function along the following lines:

myFun <- function(inlist) { 
    unique(do.call(rbind, lapply(inlist, function(x) t(combn(sort(x), 2))))) 
}
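To see why this works, the one-liner can be unpacked step by step on the example from the question. This is only an illustrative sketch: the intermediate names (`per_set`, `stacked`, `deduped`) are mine and not part of the function above.

```r
initial_list_of_sets <- list(c(1, 2, 3), c(2, 3, 4), c(3, 2), c(5, 6, 7))

# 1. Sort each set so every pair comes out in ascending order,
#    then build one 2-column matrix of pairs per set.
per_set <- lapply(initial_list_of_sets, function(x) t(combn(sort(x), 2)))

# 2. Stack the per-set matrices into a single matrix, one pair per row.
stacked <- do.call(rbind, per_set)

# 3. unique() applied to a matrix drops duplicated rows.
deduped <- unique(stacked)

nrow(stacked)  # 10 pairs before de-duplication
nrow(deduped)  # 8 pairs afterwards
```

Because (3 2) is sorted to (2 3) before the pairs are generated, it collapses into the existing row in step 3.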

Here is a quick timing comparison.

n <- 100 
set.seed(1) 

x <- sample(2:8, n, TRUE) 
initial_list_of_sets <- lapply(x, function(y) sample(100, y)) 

system.time(get_pairs(initial_list_of_sets)) 
# user system elapsed 
# 1.964 0.000 1.959 
system.time(myFun(initial_list_of_sets)) 
# user system elapsed 
# 0.012 0.000 0.014

If needed, you can split the matrix by row to get your list.

For example:

myFun <- function(inlist) { 
    temp <- unique(do.call(rbind, lapply(inlist, function(x) t(combn(sort(x), 2))))) 
    lapply(1:nrow(temp), function(x) temp[x, ]) 
}
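One caveat worth keeping in mind: combn() treats a single positive number n as seq_len(n), and it fails on an empty vector, so sets with fewer than two elements need special handling. Below is a possible guarded variant. The name `get_pairs_safe` is hypothetical, and the sketch assumes that repeated values inside one input vector should count only once.

```r
get_pairs_safe <- function(inlist) {
    # keep only sets that contain at least two distinct elements
    usable <- Filter(function(x) length(unique(x)) >= 2, inlist)
    if (length(usable) == 0) {
        return(list())
    }
    # one 2-column matrix of sorted pairs per remaining set
    pair_rows <- lapply(usable, function(x) t(combn(sort(unique(x)), 2)))
    temp <- unique(do.call(rbind, pair_rows))
    # split the matrix by row to get a list of pairs
    lapply(seq_len(nrow(temp)), function(i) temp[i, ])
}

res <- get_pairs_safe(list(c(1, 2, 3), c(3, 2), 5, numeric(0)))
length(res)  # 3: (1 2), (1 3), (2 3)
```

Whether singleton and empty sets should be dropped silently or raise an error depends on your data, so treat the filter as one option rather than the required behaviour.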

Source

2015-11-27 14:57:47 A5C1D2H2I1M1N2O1R2T1
