2016-07-22

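Creating a hierarchical cluster object

I have n observations for which I have computed m clusterings. The clusters I generated are in fact hierarchically divisive, even though they were computed independently. Here is a subset of my data: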

print(test) 

     m_0 m_13000 m_14608 m_16278
   <dbl>   <dbl>   <dbl>   <dbl>
1      1      10     101    1001
2      1      10     101    1002
3      1      11     102    1003
4      1      11     102    1004
5      1      12     103    1005
6      1      12     104    1006
7      2      13     105    1007
8      2      13     106    1008
9      2      13     106    1009
10     2      14     107    1010
..   ...     ...     ...     ...

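Each row i = 1:n is an observation, and each column j = 1:m gives the observations' membership in clustering j. Cluster IDs are unique across clustering solutions, i.e. min(test[, j]) > max(test[, j-1]).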
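The two properties described here (each clustering nested inside the previous one, and IDs increasing from column to column) can be checked directly. The following is only a sketch in base R: it rebuilds the ten rows shown above as a plain data.frame, and the helper name `is_nested` is made up for illustration.

```r
# First ten rows of the example data, as a plain data.frame
test <- data.frame(
  m_0     = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  m_13000 = c(10, 10, 11, 11, 12, 12, 13, 13, 13, 14),
  m_14608 = c(101, 101, 102, 102, 103, 104, 105, 106, 106, 107),
  m_16278 = 1001:1010
)

# Hypothetical helper: TRUE if every cluster in `fine` lies inside
# exactly one cluster of `coarse`
is_nested <- function(coarse, fine) {
  all(tapply(coarse, fine, function(parent) length(unique(parent)) == 1))
}

# Check each pair of adjacent columns for nesting
nested <- vapply(
  seq_len(ncol(test) - 1),
  function(j) is_nested(test[[j]], test[[j + 1]]),
  logical(1)
)

# Check that IDs strictly increase from one clustering to the next
ids_increase <- vapply(
  seq_len(ncol(test) - 1),
  function(j) min(test[[j + 1]]) > max(test[[j]]),
  logical(1)
)

print(nested)
print(ids_increase)
```

If any entry of `nested` were FALSE, the columns would not describe a single tree, and a conversion to a merge matrix or dendrogram would not be well defined.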

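The observations are represented as vertices on an igraph graph. I would like to convert the test data above into a merge matrix to pass to igraph::make_clusters for further processing. What is the best way to do this? I have looked at the merge matrix created by this example, but I don't really understand it. Can anyone help?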
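For readers unfamiliar with merge matrices, a small base R illustration may help. Note that this shows the convention used by stats::hclust, which is not necessarily the one igraph expects for its merges argument, so the igraph documentation should be checked before passing anything to make_clusters. The four-point data set is made up for the example.

```r
# Four points on a line: a and b are close together, c and d are close together
x <- c(a = 0, b = 1, c = 5, d = 6.5)
hc <- hclust(dist(x), method = "single")

# One row per merge, n - 1 rows in total.
# In hclust's convention a negative entry -k refers to observation k,
# and a positive entry k refers to the cluster formed in row k.
print(hc$merge)

# Height at which each merge takes place
print(hc$height)
```

Here the first two rows each join a pair of single observations, and the last row joins the two clusters created by rows 1 and 2.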

Answers


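My solution ended up being to use a modified version of the answer to a related SO question about dendrograms to convert the data frame to a Newick tree string, and then to read the resulting string into a phylo object using phytools::read.newick. At that point I can convert it to an hclust object with as.hclust (ape provides the method for phylo objects) if needed. Neat!

The function from the other SO answer (slightly edited):

Note: these functions do not seem to play well with tibbles, so use standard data.frames instead.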

df2newick <- function(df, innerlabel = FALSE) {
  # Recursively build the Newick string for value `a` in column `i`
  traverse <- function(a, i, innerl) {
    if (i < ncol(df)) {
      # Distinct children of `a` in the next column
      alevelinner <- as.character(
        unique(df[which(as.character(df[, i]) == a), i + 1])
      )
      desc <- NULL
      for (b in alevelinner)
        desc <- c(desc, traverse(b, i + 1, innerl))
      il <- NULL
      if (innerl == TRUE)
        il <- paste0(",", a)
      (newickout <- paste("(", paste(desc, collapse = ","), ")", il,
                          sep = ""))
    } else {
      # Last column: the value itself is a leaf label
      (newickout <- a)
    }
  }

  # Start from each distinct value in the first column
  alevel <- as.character(unique(df[, 1]))
  newick <- NULL
  for (x in alevel)
    newick <- c(newick, traverse(x, 1, innerlabel))
  (newick <- paste("(", paste(newick, collapse = ","), ");", sep = ""))
}

Reproducible example:

ex = structure(list(level.1 = c("1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1"), level.2 = c("883", "883", "883", 
"883", "883", "883", "883", "883", "1758", "883", "883", "883", 
"883"), level.3 = c("2293", "2293", "2293", "2293", "2293", "2293", 
"2293", "2293", "3240", "2293", "2293", "2293", "2293"), level.4 = c("3932", 
"3932", "3932", "3932", "3932", "3932", "3932", "3932", "5139", 
"5777", "3932", "3932", "3932"), level.5 = c("6056", "6056", 
"6056", "6056", "6056", "6056", "6056", "6056", "7472", "8110", 
"6056", "6056", "6056"), level.6 = c("8456", "8545", "8949", 
"8456", "8545", "8456", "8545", "8545", "10385", "11023", "8545", 
"8545", "8545"), level.7 = c("11525", "11635", "12084", "12297", 
"12339", "12297", "12339", "12339", "13632", "14270", "12339", 
"12339", "12339"), name = c("A", "B", "C", "D", "E", "F", "G", 
"H", "I", "J", "K", "L", "M")), class = "data.frame", .Names = c("level.1", 
"level.2", "level.3", "level.4", "level.5", "level.6", "level.7", 
"name"), row.names = c(NA, -13L)) 

treestring = df2newick(ex, innerlabel = FALSE) 

library(phytools) 
extree = collapse.singles(read.newick(text = treestring)) 
extree$node.label = head(names(ex), -1) 
plot(extree, show.node.label = TRUE) 

An alternative (and easier) solution is to use the data.tree package.

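library(data.tree)
# as.Node() looks for a pathString column; it is assumed here to be
# the level columns and the name joined by "/"
ex$pathString = do.call(paste, c(ex, sep = "/"))
tree = as.Node(ex)
library(ape)
ph = as.phylo(tree)
as.hclust(ph)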

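Note, however, that you need some way of defining branch lengths in order to convert to an hclust object. The same constraint applies to my other answer.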
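One possible way to satisfy this constraint, sketched here under assumptions rather than taken from the answer itself, is to let ape assign ultrametric branch lengths with compute.brlen (Grafen's method) after resolving any polytomies with multi2di. The small Newick string is a made-up stand-in for the real tree. Both steps are arbitrary choices: multi2di resolves polytomies at random, and Grafen's heights reflect only the tree shape, not any real distances.

```r
library(ape)

# Hypothetical tree without branch lengths, containing a polytomy (C, D, E)
tr <- read.tree(text = "((A,B),(C,D,E));")

# as.hclust() on a phylo object needs a binary tree;
# multi2di() resolves polytomies arbitrarily
tr <- multi2di(tr)

# Assign ultrametric branch lengths (Grafen's method),
# since as.hclust() also needs an ultrametric tree
tr <- compute.brlen(tr, method = "Grafen")

hc <- as.hclust(tr)
print(hc$merge)
```

The resulting heights are only placeholders, so anything downstream that interprets hc$height as a dissimilarity should be treated with care.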