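I'm doing text mining in R. Here are a few documents from my corpus after removing punctuation, numbers, URLs and stopwords. The relevant R code and its output: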
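library(tm)
# myCorpus and myStopwords are created earlier (not shown)
# Remove "r" and "big" from the stopword list so they stay in the text
myStopwords <- setdiff(myStopwords, c("r", "big"))
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
myCorpus <- tm_map(myCorpus, stripWhitespace)
# Keep an unstemmed copy to use later as the stem-completion dictionary
myCorpusCopy <- myCorpus
# Print documents 1, 2 and 320, wrapped at 60 characters
for (i in c(1:2, 320)) {
  cat(paste0("[", i, "] "))
  writeLines(strwrap(as.character(myCorpus[[i]]), 60))
}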
[1] examples calling java code r
[2] simulating mapreduce r big data analysis using flights data
rbloggers
[320] r reference card data mining now cran lists many useful r
functions packages data mining applications
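To make the stopword step above concrete, here is a minimal base-R sketch of what the setdiff() call plus removeWords()/stripWhitespace() accomplish on one document string. The stopword list, the input sentence and the remove_stopwords() helper are hypothetical, and splitting on whitespace is a simplification of tm's regex-based removeWords().

```r
# Hypothetical stopword list; the question's real myStopwords is defined elsewhere
stop_list <- c("the", "of", "in", "for", "a", "r", "big")

# Drop "r" and "big" from the list so they are kept, mirroring setdiff() above
stop_list <- setdiff(stop_list, c("r", "big"))

# Simplified stand-in for removeWords() + stripWhitespace():
# split on whitespace, drop stopwords, rejoin with single spaces
remove_stopwords <- function(doc, stop_list) {
  tokens <- strsplit(doc, "\\s+")[[1]]
  tokens <- tokens[tokens != "" & !(tokens %in% stop_list)]
  paste(tokens, collapse = " ")
}

cleaned <- remove_stopwords("simulating mapreduce in r for big data analysis", stop_list)
print(cleaned)
```

Because "r" and "big" were taken out of the list, they survive the filtering while "in" and "for" are removed.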
After that, I stemmed the words and then tried to complete the stems as follows:
myCorpus <- tm_map(myCorpus, stemDocument)
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=myCorpusCopy)
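stemCompletion() is meant to work on individual word stems: for each stem it looks for dictionary words that begin with it. A plausible explanation for an NA result is that a whole document string reaches stemCompletion() as one single "stem", which no dictionary word starts with. The base-R sketch below is a simplified emulation of that prefix-matching idea, not tm's actual code; complete_stems() is a hypothetical helper, and the dictionary and stemmed string are typed in by hand for illustration.

```r
# Simplified emulation of prefix-based stem completion (NOT tm's source code):
# each stem becomes the most frequent dictionary word that starts with it,
# or NA when no dictionary word has that prefix.
complete_stems <- function(stems, dictionary) {
  freq <- sort(table(dictionary), decreasing = TRUE)
  candidates <- names(freq)
  vapply(stems, function(s) {
    hits <- candidates[startsWith(candidates, s)]
    if (length(hits) == 0) NA_character_ else hits[1]
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical dictionary built from the unstemmed text of document [1]
dictionary <- strsplit("examples calling java code r", " ", fixed = TRUE)[[1]]
stemmed_doc <- "exampl call java code r"  # hand-written stemmed version of [1]

# Whole document passed as a single "stem": no dictionary word starts with it
whole <- complete_stems(stemmed_doc, dictionary)
print(whole)  # NA

# Splitting into words first lets each stem find its own completion
stems <- strsplit(stemmed_doc, " ", fixed = TRUE)[[1]]
per_word <- paste(complete_stems(stems, dictionary), collapse = " ")
print(per_word)  # "examples calling java code r"
```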
When I run the for loop again, it prints NA for every document:
for (i in c(1:2, 320)) {
  cat(paste0("[", i, "] "))
  writeLines(strwrap(as.character(myCorpus[[i]]), 60))
}
[1] NA
[2] NA
[320] NA
Any idea what I'm doing wrong here?
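One possible workaround, assuming the NA comes from stemCompletion() receiving each whole document rather than individual words, is to split each document into stems, complete them one at a time, and join the results. complete_document() is a hypothetical helper, sketched under the assumption that tm (0.6 or later) and its stemCompletion(x, dictionary) function are available; it has not been checked against the corpus in the question.

```r
# Hypothetical per-document stem-completion helper; assumes the tm package is installed.
# tm:: prefixes let the function be defined without attaching tm.
complete_document <- function(text, dictionary) {
  # Split the document text into individual stems, dropping empty strings
  stems <- unlist(strsplit(as.character(text), "\\s+"))
  stems <- stems[stems != ""]
  if (length(stems) == 0) return("")
  # Complete each stem separately against the dictionary
  completed <- tm::stemCompletion(stems, dictionary = dictionary)
  # Keep the original stem wherever no completion was found, instead of NA
  missing <- is.na(completed)
  completed[missing] <- stems[missing]
  paste(unname(completed), collapse = " ")
}
```

With tm attached, it could be applied as myCorpus <- tm_map(myCorpus, content_transformer(complete_document), dictionary = myCorpusCopy), since content_transformer() hands each document's text to the helper and stores the returned string as the document's new content. Keeping unmatched stems rather than NA is a design choice of this sketch.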