2013-02-15

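Avoiding loops in string replacement?

I have data, a character vector (eventually I'll collapse it, so I don't care whether it stays a vector or is treated as a single string), a vector of patterns, and a vector of replacements. I want each pattern in the data to be replaced by its respective replacement. I got it done with stringr and a for loop, but is there a more R-like way to do this?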

library(stringr) 
start_string <- sample(letters[1:10], 10) 
my_pattern <- c("a", "b", "c", "z") 
my_replacement <- c("[this was an a]", "[this was a b]", "[this was a c]", "[no z!]") 
str_replace(start_string, pattern = my_pattern, replacement = my_replacement) 
# bad lengths, doesn't work 

str_replace(paste0(start_string, collapse = ""), 
    pattern = my_pattern, replacement = my_replacement) 
# vector output, not what I want in this case 

my_result <- start_string 
for (i in seq_along(my_pattern)) { 
    my_result <- str_replace(my_result, 
                             pattern = my_pattern[i], replacement = my_replacement[i]) 
} 
my_result 
## [1] "[this was a c]" "[this was an a]" "e"    "g"    "h"    "[this was a b]" 
## [7] "d"    "j"    "f"    "i" 

# This is what I want, but is there a better way? 

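In my case, I know each pattern will occur at most once, but not every pattern will occur. I know I could use str_replace_all if the patterns might occur more than once; I hope a solution would provide that option as well. I'd also like a solution that uses my_pattern and my_replacement, so that it could be part of a function that takes those vectors as arguments.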

What's wrong with for loops? They're a great fit for this situation, where you are iteratively modifying a vector. – hadley 2013-02-16 14:38:56

Answers

I'd bet there's another way to do this, but my first thought was gsubfn:

my_repl <- function(x){ 
    switch(x,a = "[this was an a]", 
      b = "[this was a b]", 
      c = "[this was a c]", 
      z = "[this was a z]") 
} 

library(gsubfn)  
start_string <- sample(letters[1:10], 10) 
gsubfn("a|b|c|z",my_repl,x = start_string) 

If the patterns you are searching for are acceptable as valid names for list elements, this will also work:

names(my_replacement) <- my_pattern 
gsubfn("a|b|c|z",as.list(my_replacement),start_string) 

Edit

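But frankly, if I really had to do this a lot in my own code, I would probably just write the for loop and wrap it in a function. Here's a simple version that uses sub and gsub rather than the functions from stringr: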

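vsub <- function(pattern, replacement, x, all = TRUE, ...) { 
    # gsub replaces every match; sub replaces only the first match 
    FUN <- if (all) gsub else sub 
    # Apply each pattern/replacement pair in turn to the running result 
    for (i in seq_len(min(length(pattern), length(replacement)))) { 
        x <- FUN(pattern = pattern[i], replacement = replacement[i], x, ...) 
    } 
    x 
} 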

vsub(my_pattern,my_replacement,start_string) 
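
For readers who want the same sequential behavior without an explicit for loop, one possible sketch uses base R's Reduce. The name vsub_reduce is made up for this illustration and is not part of the original answer. It still applies the pairs one after another, so it behaves like vsub rather than being a different algorithm.

```r
# Hypothetical variant of vsub: fold over the pattern indices with Reduce,
# carrying the partially replaced vector along as the accumulator.
vsub_reduce <- function(pattern, replacement, x, all = TRUE, ...) {
  FUN <- if (all) gsub else sub
  n <- min(length(pattern), length(replacement))
  Reduce(
    function(acc, i) FUN(pattern[i], replacement[i], acc, ...),
    seq_len(n),
    x  # initial value of the accumulator
  )
}

# Fixed input (instead of sample()) so the result is reproducible
res_all   <- vsub_reduce(c("a", "b"), c("[A]", "[B]"), c("a", "b", "c"))
res_first <- vsub_reduce("a", "X", "aa", all = FALSE)
```

Whether this reads better than the plain loop is a matter of taste; the loop is arguably easier to debug.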

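But of course, one reason there is no well-known built-in function for this is probably that sequential replacements like this can be pretty fragile, because they are so dependent on order: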

vsub(rev(my_pattern),rev(my_replacement),start_string) 
[1] "i"           "[this w[this was an a]s [this was an a] c]" 
[3] "[this was an a]"       "g"           
[5] "j"           "d"           
[7] "f"           "[this w[this was an a]s [this was an a] b]" 
[9] "h"           "e"  
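
The order dependence can be reproduced on a fixed input, which is easier to follow than the randomly sampled start_string. The following is a minimal sketch; the helper apply_in_order is a made-up name that does the same thing as the loop inside vsub with all = TRUE.

```r
# Hypothetical helper: apply pattern/replacement pairs one after another
apply_in_order <- function(pattern, replacement, x) {
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x)
  }
  x
}

x    <- c("a", "c")
pats <- c("a", "c")
reps <- c("[this was an a]", "[this was a c]")

# "a" is replaced first; its replacement text contains no "c", so nothing clashes
forward <- apply_in_order(pats, reps, x)

# "c" is replaced first; the later "a" pass then rewrites the a's that the
# first replacement just introduced
backward <- apply_in_order(rev(pats), rev(reps), x)
```

Here forward is c("[this was an a]", "[this was a c]"), while the second element of backward becomes "[this w[this was an a]s [this was an a] c]", the same mangling shown in the output above.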
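Thanks, this definitely avoids the loop (so it meets all the criteria I mentioned), but in my real case I have enough patterns and replacements (nothing huge, only about 15) that I'd rather not write them all into a switch statement. – Gregor 2013-02-15 22:53:24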

@shujaa There is another option, but only if the search strings are acceptable as list item names (see my edit). – joran 2013-02-15 22:58:30

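Here is an option based on gregexpr, regmatches, and regmatches<-. Note that there is a limit on the length of a regular expression that can be matched, so this won't work if you try to combine too many long patterns.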

replaceSubstrings <- function(patterns, replacements, X) { 
    pat <- paste(patterns, collapse="|") 
    m <- gregexpr(pat, X) 
    regmatches(X, m) <- 
     lapply(regmatches(X,m), 
       function(XX) replacements[match(XX, patterns)]) 
    X 
} 

## Try it out 
patterns <- c("cat", "dog") 
replacements <- c("tiger", "coyote") 
sentences <- c("A cat", "Two dogs", "Raining cats and dogs") 
replaceSubstrings(patterns, replacements, sentences) 
## [1] "A tiger"     "Two coyotes"    
## [3] "Raining tigers and coyotes"
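
Because this approach finds all matches in a single pass over the original text, replacement text is never rescanned, so the result should not depend on the order of the pattern vector. The sketch below checks that on a small fixed input. The function is restated under the made-up name replace_single_pass so the block runs on its own, and it assumes the patterns are literal strings with no regex metacharacters, since match() compares the matched text against the patterns verbatim.

```r
# Restatement of the single-pass idea; assumes literal patterns only
replace_single_pass <- function(patterns, replacements, x) {
  pat <- paste(patterns, collapse = "|")   # one alternation over all patterns
  m <- gregexpr(pat, x)                    # match positions in the original text
  regmatches(x, m) <- lapply(
    regmatches(x, m),
    # look up each matched substring to find its replacement
    function(hit) replacements[match(hit, patterns)]
  )
  x
}

x    <- c("a", "c", "e")
pats <- c("a", "c")
reps <- c("[this was an a]", "[this was a c]")

forward  <- replace_single_pass(pats, reps, x)
backward <- replace_single_pass(rev(pats), rev(reps), x)
```

Both calls give c("[this was an a]", "[this was a c]", "e"). If the patterns were real regular expressions (for example "c.t"), the match() lookup would return NA, so a different lookup would be needed.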