2016-05-07

R: TermDocumentMatrix - error on creation

I am trying to fetch Twitter data and build a word cloud, but my code throws an error when it creates the TermDocumentMatrix. My code is below:

twitter_search_data <- searchTwitter(searchString = text_to_search, n = 500)

twitter_search_text <- sapply(twitter_search_data, function(x) x$getText())

twitter_search_corpus <- Corpus(VectorSource(twitter_search_text)) 

twitter_search_corpus <- tm_map(twitter_search_corpus, stripWhitespace, lazy = TRUE) 

twitter_search_corpus <- tm_map(twitter_search_corpus, content_transformer(tolower), lazy = TRUE) 

twitter_search_corpus <- tm_map(twitter_search_corpus, PlainTextDocument, lazy = TRUE)

twitter_search_corpus <- tm_map(twitter_search_corpus, removePunctuation, lazy = TRUE) 

twitter_search_corpus <- tm_map(twitter_search_corpus, removeNumbers, lazy = TRUE) 

twitter_search_corpus <- tm_map(twitter_search_corpus, removeWords, c("the", "this", "The", "This", stopwords('english')), lazy = TRUE) 

twitter_search_corpus <- tm_map(twitter_search_corpus, stemDocument, lazy = TRUE) 

# Create Document Term Matrix 
tdm <- as.matrix(TermDocumentMatrix(twitter_search_corpus,
                                    control = list(wordLengths = c(3, Inf))))

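No error occurs until the TermDocumentMatrix is created. The error I get is as follows:

Warning in mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
  scheduled core 1 encountered error in user code, all values of the job will be affected
Warning in mclapply(unname(content(x)), termFreq, control) :
  scheduled core 1 encountered error in user code, all values of the job will be affected
Warning: Error in UseMethod: no applicable method for 'meta' applied to an object of class "try-error"
Stack trace (innermost first):
    74: FUN
    73: lapply
    72: setNames
    71: as.list.VCorpus
    70: as.list
    69: lapply
    68: meta.VCorpus
    67: meta
    66: TermDocumentMatrix.VCorpus
    65: TermDocumentMatrix
    64: as.matrix
    63: observeEventHandler
     1: runApp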

I have already added lazy = TRUE and content_transformer(tolower), but the error still occurs.

Answer

The problem seems to be with this line:

twitter_search_corpus <- tm_map(twitter_search_corpus, stripWhitespace, lazy = TRUE) 

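Removing punctuation, numbers, and words can each leave extra whitespace behind in the text, so whitespace should be stripped after those steps. In other words, the stripWhitespace call above needs to be the last statement before the TermDocumentMatrix is created.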
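
The reordering could be sketched roughly as follows. This is only a minimal illustration, not the asker's exact pipeline: `sample_text` is a hypothetical stand-in for the tweets returned by searchTwitter(), the PlainTextDocument and stemDocument steps are left out on the assumption that they are not needed to show the ordering, and lazy = TRUE is dropped so that any failure would surface at the step that causes it instead of being deferred until the matrix is built.

```r
library(tm)

# Small illustration of why the order matters: removeWords() deletes the
# words but leaves the surrounding spaces, which stripWhitespace() then collapses.
cleaned <- stripWhitespace(removeWords("keep this and the rest", c("this", "the")))

# Hypothetical sample texts standing in for the tweets (an assumption).
sample_text <- c("The 3 quick foxes, this time!",
                 "R is great: 100 reasons to use this language.")

corpus <- VCorpus(VectorSource(sample_text))

# Removal steps first ...
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# ... and whitespace stripping last, right before the matrix is created.
corpus <- tm_map(corpus, stripWhitespace)

tdm <- as.matrix(TermDocumentMatrix(corpus,
                                    control = list(wordLengths = c(3, Inf))))
```

Running each tm_map step eagerly like this also makes it easier to check, one step at a time, whether a particular transformation is the one that fails on real tweet text.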