Getting word occurrence counts
I am trying to use R to count how many times each word occurs in a CSV file. My dataset looks like this:
TITLE
1 My first Android app after a year
2 Unmanned drone buzzes French police car
3 Make anything editable with HTML5
4 Predictive vs Reactive control
5 What was it like to move to San Antonio and go through TechStars Cloud?
6 Health-care sector vulnerable to hackers, researchers say
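If the goal is only an overall count of each word across all titles, a tm corpus is not strictly required. Below is a minimal base-R sketch, not the book's method: it lower-cases the titles, splits on any run of characters that are not letters, digits or apostrophes, and tabulates the result. Unlike tm's `stopwords` and `removeNumbers` options, it keeps stop words and numbers. The vector `titles` and the file name `titles.csv` in the comment are assumptions for illustration.

```r
# Sample titles from the question. In practice they might come from
# read.csv("titles.csv", stringsAsFactors = FALSE)$TITLE
# ("titles.csv" is a hypothetical file name).
titles <- c(
  "My first Android app after a year",
  "Unmanned drone buzzes French police car",
  "Make anything editable with HTML5",
  "Predictive vs Reactive control",
  "What was it like to move to San Antonio and go through TechStars Cloud?",
  "Health-care sector vulnerable to hackers, researchers say"
)

count_words <- function(text) {
  # Lower-case, then split on runs of characters that are not
  # letters, digits or apostrophes ("Health-care" -> "health", "care").
  words <- unlist(strsplit(tolower(text), "[^a-z0-9']+"))
  words <- words[words != ""]  # drop empty strings left by leading separators
  sort(table(words), decreasing = TRUE)
}

word_counts <- count_words(titles)
head(word_counts)
```

Here "to" is the most frequent word (3 occurrences), and every other word occurs once.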
I have been trying to use a function from the book "Machine Learning for Hackers":
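library(tm)

get.tdm <- function(doc.vec) {
  # Build a corpus with one document per element of doc.vec
  doc.corpus <- Corpus(VectorSource(doc.vec))
  control <- list(stopwords = TRUE, removePunctuation = TRUE,
                  removeNumbers = TRUE, minDocFreq = 2)
  # Term-document matrix: rows are terms, columns are documents
  doc.dtm <- TermDocumentMatrix(doc.corpus, control = control)
  return(doc.dtm)
}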
But I get an error that I don't understand:
Error: is.Source(s) is not TRUE
In addition: Warning message:
In is.Source(s) : vectorized sources must have a positive length entry
What could be the problem?
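The warning says the source was given a non-positive length, which means `VectorSource()` received an empty vector. The call to `get.tdm` is not shown, so the following is an assumption: one way this can happen is selecting the column with the wrong case. The header is `TITLE`, and `$` name matching on a data frame is case-sensitive, so `df$title` returns `NULL`. The sketch below uses a hypothetical two-row `df` and a hypothetical helper `check_docs()` that fails early with a clearer message.

```r
# Hypothetical data frame mimicking the question's CSV (column TITLE).
df <- data.frame(
  TITLE = c("My first Android app after a year",
            "Unmanned drone buzzes French police car"),
  stringsAsFactors = FALSE
)

# `$` matching is case-sensitive: the lower-case name returns NULL,
# which would give VectorSource() a zero-length input.
length(df$title)   # 0
length(df$TITLE)   # 2

# Hypothetical helper: coerce to character and refuse empty input.
check_docs <- function(doc.vec) {
  doc.vec <- as.character(doc.vec)  # factors -> character, NULL -> character(0)
  if (length(doc.vec) == 0) {
    stop("no documents to index: check the column passed to get.tdm()")
  }
  doc.vec
}
```

For example, `get.tdm(check_docs(df$TITLE))` would pass a non-empty character vector to the function, while `check_docs(df$title)` stops with an explicit error instead of the `is.Source` warning.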
Maybe you should try it as a 'DataframeSource' instead of a 'VectorSource': http://www.inside-r.org/packages/cran/tm/docs/DataframeSource
I now get this instead: In isSource: invalid length entry denoting the number of elements – Spearfisher