2017-10-16

Removing specific phrases from strings

I am trying to do some basic text analysis in R.

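I have a column of fairly messy data. I would like to keep a separate table that I can use to remove certain phrases from that first data column.

I tried gsubfn, without any success.

For example: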

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE") 
removefields <-c("COURT","BODY CORPORATE") 

Why does

x <- gsubfn(removefields,"",dirtydata) 

not work?

Expected output:

c("JOHN ","@PETER","BOB 22","RUPERT ") 

Please include the names of any additional packages you loaded. But you could try `gsub(paste(removefields, collapse = "|"), "", dirtydata)` – Jimbou


Possible duplicate of [How to replace multiple strings with the same in R](https://stackoverflow.com/questions/28285480/how-to-replace-multiple-strings-with-the-same-in-r) or [this one](https://stackoverflow.com/questions/24645390/r-remove-multiple-text-strings-in-data-frame) – Jimbou

Answers

Use the base R function gsub, as in the code below (edited from yours):

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE") 
removefields <-c("COURT","BODY CORPORATE") 
pastedFields = paste0(removefields,collapse = "|") 
gsub(pastedFields,"",dirtydata) 
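Since the question mentions keeping the phrases in a separate table, here is a minimal sketch of that setup. The helper `remove_phrases` and the data frame `lookup` are hypothetical names introduced only for illustration. It removes each phrase literally with `fixed = TRUE`, so characters such as `.` or `(` in a phrase are not interpreted as regex syntax.

```r
# Hypothetical helper: strip each phrase literally (no regex interpretation).
remove_phrases <- function(x, phrases) {
  # Longest phrases first, so a short phrase cannot break up a longer one
  phrases <- phrases[order(nchar(phrases), decreasing = TRUE)]
  for (p in phrases) {
    x <- gsub(p, "", x, fixed = TRUE)
  }
  x
}

# Assumed lookup table holding the phrases to remove
lookup <- data.frame(phrase = c("COURT", "BODY CORPORATE"),
                     stringsAsFactors = FALSE)

dirtydata <- c("JOHN COURT", "@PETER", "BOB 22", "RUPERT BODY CORPORATE")
cleaned <- remove_phrases(dirtydata, lookup$phrase)
print(cleaned)
```

The loop is slower than a single alternation pattern on long phrase lists, but it avoids having to escape regex metacharacters.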

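Can you elaborate? I assume you are getting the output as a list rather than a vector? If so, please post the line of code you applied to the data column. –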

Try this.

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE") 
# One regex with alternation; no spaces around "|", or they become part of the pattern
removefields <- "COURT|BODY CORPORATE" 
x <- gsub(removefields, "", dirtydata) 
This generalizes to whatever you put in removefields, and the whitespace around the matched strings is removed as well:

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE") 
removefields <- c("COURT","BODY CORPORATE") 
# \\s* rather than \\s+, so a phrase at the start or end of a string still matches
removefields <- paste0("\\s*", removefields, "\\s*", collapse = "|") 
x <- gsub(removefields, "", dirtydata) 
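A possible variation, sketched under the assumption that the phrases contain no regex metacharacters: match each phrase only as a whole word with `\\b`, then tidy the leftover whitespace in a separate step. The variable names `pattern`, `stripped`, and `cleaned` are illustrative only.

```r
dirtydata <- c("JOHN COURT", "@PETER", "BOB 22", "RUPERT BODY CORPORATE")
removefields <- c("COURT", "BODY CORPORATE")

# Whole-word alternation; assumes the phrases hold no regex metacharacters
pattern <- paste0("\\b(", paste(removefields, collapse = "|"), ")\\b")
stripped <- gsub(pattern, "", dirtydata, perl = TRUE)

# Collapse runs of whitespace, then trim both ends
cleaned <- trimws(gsub("\\s+", " ", stripped))
print(cleaned)
```

Keeping the whitespace cleanup separate means a phrase removed from the middle of a string leaves a single space between its neighbours instead of gluing them together.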
We can use the tm package:

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE") 
removefields <-c("COURT","BODY CORPORATE") 

library(tm) 
removeWords(dirtydata, removefields) 

> removeWords(dirtydata, removefields) 
[1] "JOHN " "@PETER" "BOB 22" "RUPERT " 