2017-06-29
0

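Replace all instances of the words in one vector with a word specified in a second vector

I am trying to find an efficient way to remove every instance of a set of words from an input vector of phrases, using a second vector that lists the words to remove.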

vectorOfWordsToRemove <- c('cat', 'monkey', 'wolf', 'mouses') 
vectorOfPhrases <- c('the cat and the monkey walked around the block', 'the wolf and the mouses ate lunch with the monkey', 'this should remain unmodified') 
remove_strings <- function(a, b) { stringr::str_replace_all(a,b, '')} 
remove_strings(vectorOfPhrases, vectorOfWordsToRemove) 

The output I want is:

vectorOfPhrases <- c('the and the walked around the block', 'the and the ate lunch with the', 'this should remain unmodified') 

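That is, every instance of every word in vectorOfWordsToRemove should be removed from vectorOfPhrases.

I can do this with a for loop, but it is slow, and it seems there should be a vectorized way to do it efficiently.

Thanks

Answers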

1

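First, make a vector of empty strings to use as the replacements, one per word to remove:

vectorOfNothing <- rep('', length(vectorOfWordsToRemove))

Then use the qdap library to replace the vector of patterns with the vector of replacements: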

library(qdap) 
vectorOfPhrases <- qdap::mgsub(vectorOfWordsToRemove, 
           vectorOfNothing, 
           vectorOfPhrases) 

> vectorOfPhrases
[1] "the and the walked around the block" "the and the ate lunch with the"
[3] "this should remain unmodified"
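
For readers who do not have qdap installed, the same idea (one fixed-string replacement per word, applied in sequence) can be sketched in base R. This is only an illustration and not qdap's actual implementation; the helper name remove_words is hypothetical, and the whitespace clean-up at the end is an assumption about the desired output.

```r
# Hypothetical helper: strip each word in `words` from every phrase.
remove_words <- function(phrases, words) {
  # Apply one fixed-string gsub() per word, feeding each result into the next.
  out <- Reduce(function(txt, w) gsub(w, "", txt, fixed = TRUE), words, phrases)
  # Collapse the runs of spaces left behind and trim both ends.
  trimws(gsub("\\s+", " ", out))
}

vectorOfWordsToRemove <- c('cat', 'monkey', 'wolf', 'mouses')
vectorOfPhrases <- c('the cat and the monkey walked around the block',
                     'the wolf and the mouses ate lunch with the monkey',
                     'this should remain unmodified')

result <- remove_words(vectorOfPhrases, vectorOfWordsToRemove)
print(result)
```

Because the matching is on fixed substrings, a word such as 'cat' would also be removed from inside 'concatenate'; that may or may not be what you want.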
1

You can use gsubfn():

library(gsubfn) 
replaceStrings <- as.list(rep("", length(vectorOfWordsToRemove))) 
newPhrases <- gsubfn("\\S+", setNames(replaceStrings, vectorOfWordsToRemove), vectorOfPhrases) 

> newPhrases 
[1] "the and the walked around the block" "the and the ate lunch with the"  
[3] "this should remain unmodified" 
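
A dependency-free alternative, offered as a sketch rather than a definitive answer: join the words into a single regular expression with word boundaries and call gsub() once, which is vectorized over the phrases. It assumes the words contain no regex metacharacters (they would need escaping otherwise) and that collapsing the leftover whitespace is acceptable; the variable names pattern, stripped and cleaned are illustrative.

```r
vectorOfWordsToRemove <- c('cat', 'monkey', 'wolf', 'mouses')
vectorOfPhrases <- c('the cat and the monkey walked around the block',
                     'the wolf and the mouses ate lunch with the monkey',
                     'this should remain unmodified')

# Build one pattern of the form \b(cat|monkey|wolf|mouses)\b
pattern <- paste0("\\b(", paste(vectorOfWordsToRemove, collapse = "|"), ")\\b")

# Remove whole-word matches from every phrase in a single vectorized call.
stripped <- gsub(pattern, "", vectorOfPhrases, perl = TRUE)

# Collapse the runs of spaces left behind and trim both ends.
cleaned <- trimws(gsub("\\s+", " ", stripped))
print(cleaned)
```

The \b anchors restrict matches to whole words, so 'cat' inside a longer word is left alone.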