Get word occurrence counts

I am trying to use R to get the number of occurrences of each word in a CSV file. My dataset looks like this:

         TITLE 
1          My first Android app after a year 
2         Unmanned drone buzzes French police car 
3          Make anything editable with HTML5 
4           Predictive vs Reactive control 
5 What was it like to move to San Antonio and go through TechStars Cloud? 
6    Health-care sector vulnerable to hackers, researchers say

I have been trying the following function from "Machine Learning for Hackers":

library(tm)

# Build a term-document matrix from a character vector of documents
get.tdm <- function(doc.vec) {
      doc.corpus <- Corpus(VectorSource(doc.vec))
      control <- list(stopwords=TRUE, removePunctuation=TRUE, removeNumbers=TRUE, minDocFreq=2)
      doc.dtm <- TermDocumentMatrix(doc.corpus, control)
      return(doc.dtm)
}

But I get an error that I don't understand:

Error: is.Source(s) is not TRUE 
In addition: Warning message: 
In is.Source(s) : vectorized sources must have a positive length entry

What could the problem be?

Source

2014-04-10 Spearfisher

Maybe you should try it as a 'DataframeSource' instead of a 'VectorSource': http://www.inside-r.org/packages/cran/tm/docs/DataframeSource –

I now get this: In isSource: invalid length entry indicating the number of elements – Spearfisher
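A note on the warning itself: "vectorized sources must have a positive length entry" suggests the vector reaching VectorSource may be empty. The question does not show how get.tdm was called, so the following is only a guess at one common cause: a column name that does not match exactly, which makes `$` return NULL. The data frame below is rebuilt from the sample in the question, and the lower-case `title` lookup is a hypothetical mistake used for illustration.

```r
# Sample data copied from the question; the name `df` is an assumption.
df <- data.frame(
  TITLE = c(
    "My first Android app after a year",
    "Unmanned drone buzzes French police car",
    "Make anything editable with HTML5",
    "Predictive vs Reactive control",
    "What was it like to move to San Antonio and go through TechStars Cloud?",
    "Health-care sector vulnerable to hackers, researchers say"
  ),
  stringsAsFactors = FALSE
)

# Column names are case-sensitive: a wrong case silently yields NULL.
wrong <- df$title      # hypothetical typo, returns NULL
right <- df$TITLE      # the actual column

# A zero-length input is worth ruling out before building a corpus.
length(wrong)          # 0
length(right)          # 6
```

Checking `length()` and `names(df)` before calling get.tdm is a cheap way to confirm or rule out this explanation.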

This works for me (calling your data frame df):

library(tm) 
doc.corpus <- Corpus(VectorSource(df)) 
freq <- data.frame(count=termFreq(doc.corpus[[1]])) 
freq 
#    count 
# after   1 
# and    1 
# android   1 
# antonio   1 
# anything  1 
# ... 
# unmanned  1 
# vulnerable  1 
# was    1 
# what   1 
# with   1 
# year   1

Source

2014-04-10 19:46:22 jlhoward

This works for counting the occurrences of each word in the first row. Is there a way to adapt it to count each word across the whole set? – Spearfisher

Sorry. You need to use 'df' in the first line, not 'df$TITLE'. See my edit. – jlhoward
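For comparison, counting words across every title can also be sketched in base R, without tm. This is not the method from the answer above, only a minimal alternative under some assumptions: the data frame is named `df` with a `TITLE` column as in the thread, words are sequences of ASCII letters, and no stop words are removed (unlike the tm control list in the question). The names `word_counts` and `freq` are illustrative.

```r
# Sample data copied from the question; the name `df` follows the thread.
df <- data.frame(
  TITLE = c(
    "My first Android app after a year",
    "Unmanned drone buzzes French police car",
    "Make anything editable with HTML5",
    "Predictive vs Reactive control",
    "What was it like to move to San Antonio and go through TechStars Cloud?",
    "Health-care sector vulnerable to hackers, researchers say"
  ),
  stringsAsFactors = FALSE
)

# Lower-case every title, then turn any run of non-letters into one space.
# Digits and punctuation are dropped, so "HTML5" becomes "html".
text <- tolower(df$TITLE)
text <- gsub("[^a-z]+", " ", text)

# Split on spaces and pool the words from all rows into one vector.
words <- unlist(strsplit(trimws(text), " +"))
words <- words[nzchar(words)]

# Tabulate and sort so the most frequent words come first.
word_counts <- sort(table(words), decreasing = TRUE)

# Same shape as the answer's output: one row per word, one `count` column.
freq <- data.frame(count = as.vector(word_counts),
                   row.names = names(word_counts))
head(freq)
```

Because all rows are pooled before tabulating, a word that appears in several titles is counted once per occurrence; in this sample "to" appears three times.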
