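Optimizing the performance of a for loop in R

I have a character vector and want to create a matrix holding the distance between each pair of its values (using the stringdist package). Currently, I have an implementation with nested for loops: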

library(stringdist) 

strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic") 
m <- matrix(nrow = length(strings), ncol = length(strings)) 
colnames(m) <- strings 
rownames(m) <- strings 

for (i in 1:nrow(m)) {
    for (j in 1:ncol(m)) {
        m[i, j] <- stringdist::stringdist(tolower(rownames(m)[i]), tolower(colnames(m)[j]), method = "lv")
    }
}

This results in the following matrix:

> m
         Hello Helo Hole Apple Ape New Old System Systemic
Hello        0    1    3     4   5   4   4      6        7
Helo         1    0    2     4   4   3   3      6        7
Hole         3    2    0     3   3   4   2      5        7
Apple        4    4    3     0   2   5   4      5        7
Ape          5    4    3     2   0   3   3      5        7
New          4    3    4     5   3   0   3      5        7
Old          4    3    2     4   3   3   0      6        8
System       6    6    5     5   5   5   6      0        2
Systemic     7    7    7     7   7   7   8      2        0

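However, if I have, for example, a vector of length 1000 with many non-unique values, this matrix is fairly large (say, 800 rows by 800 columns) and the loops are very slow. I would like to optimize the performance, e.g. by using the apply functions, but I don't know how to translate the code above into apply syntax. Can anyone help?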

2014-09-03 Daniel

'apply' also loops, and is not necessarily faster than a for loop. See http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar – 2014-09-03 12:04:08

Code optimization questions should be asked on CodeReview rather than StackOverflow: http://codereview.stackexchange.com/ – 2016-06-26 16:08:41

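Thanks to the hint from @hrbrmstr, I found that the stringdist package itself provides a function called stringdistmatrix, which does exactly what I asked for (see here).

The function call is simple: stringdistmatrix(strings, strings)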
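As a minimal sketch of how that call could reproduce the matrix from the question: the loop lowercases the strings and uses method = "lv", whereas stringdist's default method is "osa", so both are set explicitly here. The variable name lower and the manual dimnames assignment are illustrative choices, not part of the original answer, and the stringdist package is assumed to be installed.

```r
library(stringdist)

strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New",
             "Old", "System", "Systemic")
lower <- tolower(strings)

# method = "lv" selects Levenshtein distance, as in the loop;
# stringdist's default method is "osa".
m <- stringdistmatrix(lower, lower, method = "lv")

# Label rows and columns with the original (mixed-case) strings.
dimnames(m) <- list(strings, strings)
m
```

All pairwise distances are computed inside the package in a single call, so no R-level loop is needed.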

2014-09-03 12:22:36 Daniel

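When you use nested loops, it is always worth checking whether outer() would do the job. outer() is the vectorized solution for nested loops; it applies a vectorized function to every possible combination of the elements of its first two arguments. As stringdist() works on vectors, you can simply do: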

library(stringdist) 
strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", 
      "Old", "System", "Systemic") 

outer(strings, strings,
      function(i, j) {
          # method = "lv" matches the loop in the question (the default is "osa")
          stringdist(tolower(i), tolower(j), method = "lv")
      })

This gives you the desired result.
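To make "applies a vectorized function to every possible combination" concrete, here is a small base-R sketch. It uses a made-up toy function, len_diff (not part of any package), in place of stringdist(), and shows that outer() amounts to one call on two expanded vectors followed by a reshape.

```r
strings <- c("Hello", "Helo", "Hole")

# Toy vectorized function (illustration only): difference in string length.
len_diff <- function(a, b) abs(nchar(a) - nchar(b))

# outer() calls len_diff once, on two vectors of length n * n ...
via_outer <- outer(strings, strings, len_diff)

# ... which matches expanding both arguments by hand and reshaping.
n <- length(strings)
a <- rep(strings, times = n)  # row index varies fastest
b <- rep(strings, each = n)   # column index varies slowest
by_hand <- matrix(len_diff(a, b), nrow = n)

all(via_outer == by_hand)
```

This is why the function passed to outer() has to be vectorized: it receives whole vectors, not one pair of elements at a time.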

2014-09-03 12:00:10

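I didn't know about the 'outer' function before, but this does the trick too! – Daniel 2014-09-03 12:10:20

Here is a simple start: the matrix is symmetric, so there is no need to compute the entries below the diagonal, because m[j, i] == m[i, j]. The diagonal entries are obviously all zero, so there is no need to compute those either.

Like this: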

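n <- nrow(m)
diag(m) <- 0                      # a string is at distance 0 from itself
for (i in seq_len(n - 1)) {       # stop at n - 1 so that (i + 1):n never counts down
    for (j in (i + 1):n) {
        m[i, j] <- stringdist::stringdist(tolower(rownames(m)[i]), tolower(colnames(m)[j]), method = "lv")
        m[j, i] <- m[i, j]        # mirror across the diagonal
    }
}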
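The same symmetry idea can also be written without an explicit loop. The following is only a sketch, assuming the stringdist package is installed; the names lower, idx and d are illustrative. It collects the index pairs above the diagonal, computes all of those distances in one vectorized stringdist() call, and mirrors them into the lower triangle.

```r
library(stringdist)

strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New",
             "Old", "System", "Systemic")
lower <- tolower(strings)
n <- length(strings)

# Start from all zeros so the diagonal is already correct.
m <- matrix(0, nrow = n, ncol = n, dimnames = list(strings, strings))

# Row/column indices of every cell strictly above the diagonal.
idx <- which(upper.tri(m), arr.ind = TRUE)

# One vectorized call computes all n * (n - 1) / 2 distances.
d <- stringdist(lower[idx[, 1]], lower[idx[, 2]], method = "lv")

m[idx] <- d                          # fill the upper triangle
m[idx[, 2:1, drop = FALSE]] <- d     # mirror into the lower triangle
m
```

This does roughly half the distance computations of the full double loop, and the work that remains happens inside a single call rather than in n * n separate ones.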

2014-09-03 12:01:50 duffymo

Bioconductor has a stringDist function (in the Biostrings package) that can do this for you:

# biocLite() has been retired; current Bioconductor releases install via BiocManager
install.packages("BiocManager")
BiocManager::install("Biostrings")

library(Biostrings) 

stringDist(c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic"), upper=TRUE) 

##   1 2 3 4 5 6 7 8 9
## 1   1 3 4 5 4 4 6 7
## 2 1   2 4 4 3 3 6 7
## 3 3 2   3 3 4 3 5 7
## 4 4 4 3   2 5 4 5 7
## 5 5 4 3 2   3 3 5 7
## 6 4 3 4 5 3   3 5 7
## 7 4 3 3 4 3 3   6 8
## 8 6 6 5 5 5 5 6   2
## 9 7 7 7 7 7 7 8 2

2014-09-03 12:02:30 hrbrmstr

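Thanks a lot, and shame on me: the 'stringdist' package has such a function too: 'stringdistmatrix' – Daniel 2014-09-03 12:09:24

You can/should post that as an answer, then un-accept mine and accept yours (seriously!). I've had "bioconductor" on the brain lately (building something similar for infosec), and it is overkill as an answer here. – hrbrmstr 2014-09-03 12:15:22

OK, done, but I can only accept my own answer in two days. – Daniel 2014-09-03 12:23:10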
