Speeding up correlation matrix calculation in R

I have a data frame with 49 variables and 4M rows. I want to compute the 49 x 49 correlation matrix. All columns are of class numeric.

Here is an example:

df <- data.frame(replicate(49,sample(0:50,4000000,rep=TRUE)))

I am using the standard cor function:

cor_matrix <- cor(df, use = "pairwise.complete.obs")

This takes a long time. I have 16 GB of RAM and a single-core i5 at 2.60 GHz.

Is there a way to make this computation faster on my desktop?

Source

2016-03-21 vagabond

You might check [here](http://www.r-bloggers.com/bigcor-large-correlation-matrices-in-r/) – akrun

Your main problem is use = "pairwise.complete.obs". On my system (tested with 12 columns), it takes 5 times as long as use = "everything". – Roland
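As an illustration of the comment above (not code from the thread), here is a minimal sketch that only uses pairwise deletion when the data actually contain missing values. The helper name fast_cor, the 12 columns, and the reduced row count are assumptions chosen so the sketch runs quickly.

```r
# Smaller stand-in for the 4M-row data (assumed size, for a quick run).
set.seed(1)
mat <- replicate(12, as.numeric(sample(0:50, 1e5, replace = TRUE)))

# Hypothetical helper: pay for pairwise deletion only when NAs are present.
fast_cor <- function(x) {
  x <- as.matrix(x)
  if (anyNA(x)) {
    cor(x, use = "pairwise.complete.obs")
  } else {
    cor(x, use = "everything")  # the default; no per-pair NA handling
  }
}

cor_fast <- fast_cor(mat)
cor_pair <- cor(mat, use = "pairwise.complete.obs")

# With no missing values, both settings should agree up to numerical tolerance.
all.equal(cor_fast, cor_pair)
```

Wrapping each call in system.time() on your own data shows whether the difference is as large on your machine as the comment reports.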

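The WGCNA package has a faster version of the cor function (it is used to infer gene networks from correlations). On my 3.1 GHz Core i7 with 16 GB of RAM, it computes the same 49 x 49 matrix about 20 times faster: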

mat <- replicate(49, as.numeric(sample(0:50,4000000,rep=TRUE))) 

system.time(
    cor_matrix <- cor(mat, use = "pairwise.complete.obs") 
) 
user system elapsed 
40.391 0.017 40.396 

system.time(
    cor_matrix_w <- WGCNA::cor(mat, use = "pairwise.complete.obs") 
) 
user system elapsed 
1.822 0.468 2.290 

all.equal(cor_matrix, cor_matrix_w) 
[1] TRUE

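If your data contain a large number of missing observations, check the help file for details on how the two versions of the function differ.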
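The timing above uses a numeric matrix rather than the data frame from the question. The following is a rough sketch of applying the same call to a data frame, assuming the conversion to a double matrix is done explicitly first. The name df_small, its size, and the injected NAs are made up for illustration, and the fallback to stats::cor exists only so the sketch still runs when WGCNA is not installed.

```r
# Small stand-in for the question's data frame (assumed size), with some NAs.
set.seed(1)
df_small <- data.frame(replicate(5, sample(0:50, 1e4, replace = TRUE)))
df_small[sample(nrow(df_small), 100), 1] <- NA

# Convert to a numeric (double) matrix before computing correlations.
x <- as.matrix(df_small)
storage.mode(x) <- "double"

if (requireNamespace("WGCNA", quietly = TRUE)) {
  cor_small <- WGCNA::cor(x, use = "pairwise.complete.obs")
} else {
  # Fallback for illustration only; this is the slower base R function.
  cor_small <- stats::cor(x, use = "pairwise.complete.obs")
}

dim(cor_small)
```

When WGCNA is available, comparing the result against stats::cor with all.equal(), as in the answer, is a reasonable sanity check before relying on it for data with many missing values.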

Source

2016-03-21 17:39:33
