Plotting the words most highly correlated with a specific word of interest

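I would like to plot the highest correlations for a word. For example, I want to plot the ten words most highly correlated with "whale". Can anyone help me with a problem like this? In case it helps, I have Rgraphviz installed.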

library(tm)

s.dir1 <- "/PATHTOTEXT/MobyDickTxt"

# Read the plain-text files and clean them up
s.cor1 <- Corpus(DirSource(s.dir1), readerControl = list(reader = readPlain))
s.cor1 <- tm_map(s.cor1, removePunctuation)
s.cor1 <- tm_map(s.cor1, stripWhitespace)
s.cor1 <- tm_map(s.cor1, content_transformer(tolower))  # wrapper needed in tm >= 0.6
s.cor1 <- tm_map(s.cor1, removeNumbers)
s.cor1 <- tm_map(s.cor1, removeWords, stopwords("english"))
tdm1 <- TermDocumentMatrix(s.cor1)

# Term frequencies, most frequent first
m1 <- as.matrix(tdm1)
v1 <- sort(rowSums(m1), decreasing = TRUE)
d1 <- data.frame(word = names(v1), freq = v1)

Source

2013-10-23 user2890975

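What kind of plot? You have to be more specific than this.

I really don't have a preference. I'm presenting some research that looks at associations between emotion words in historical documents, so anything that lets audience members look closely at the relationships works for me. – user2890975

Then I would recommend a dotplot. Please use your Google-fu with R and dotplot, and try to work it out yourself.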

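Here is a way to compute the words associated with a given word in a corpus, and to plot those words and their correlations.

Get some example data...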

require(tm) 
data("crude") 
tdm <- TermDocumentMatrix(crude)

Compute the correlations and store them in a data frame...

toi <- "oil" # term of interest
corlimit <- 0.7 # lower bound on the correlation
assoc <- findAssocs(tdm, toi, corlimit)[[1]] # call findAssocs once and reuse the result
oil_0.7 <- data.frame(corr = assoc, terms = names(assoc))

Turn the terms column into a factor so that ggplot keeps the ordering...

oil_0.7$terms <- factor(oil_0.7$terms ,levels = oil_0.7$terms)

Draw the plot...

require(ggplot2) 
ggplot(oil_0.7, aes(y = terms )) + 
    geom_point(aes(x = corr), data = oil_0.7) + 
    xlab(paste0("Correlation with the term ", "\"", toi, "\""))
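The question asks for the ten strongest associations rather than everything above a fixed threshold. A minimal sketch of that variation follows, assuming a recent version of tm in which `findAssocs` returns a named list with one named numeric vector per term. The names `top10` and `top_df` are made up for this illustration, and `corlimit = 0` is just one way to keep the candidate set wide before trimming it.

```r
library(tm)
library(ggplot2)

data("crude")
tdm <- TermDocumentMatrix(crude)

toi <- "oil"  # term of interest

# Assumption: findAssocs returns a named list (recent tm versions).
# corlimit = 0 keeps every term whose correlation is at least 0.
assoc <- findAssocs(tdm, toi, corlimit = 0)[[toi]]

# Sort explicitly, then keep the ten strongest associations.
top10 <- head(sort(assoc, decreasing = TRUE), 10)

# Reverse the factor levels so the strongest term is drawn at the top.
top_df <- data.frame(
  terms = factor(names(top10), levels = rev(names(top10))),
  corr  = unname(top10)
)

p <- ggplot(top_df, aes(x = corr, y = terms)) +
  geom_point() +
  xlab(paste0("Correlation with the term \"", toi, "\""))
print(p)
```

For the Moby Dick corpus in the question, the same steps should apply with `tdm1` in place of `tdm` and `"whale"` as the term of interest.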


Source

2013-11-12 09:30:21 Ben

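This response inspired a plot method for qdap's 'word_cor' function. I gave you credit, but as Ben from SO. If you would like your full name used, please send me an email.

The code snippet doesn't work :(

It's a very small glitch: corr and terms have different sizes. Using row.names instead of names works, and then I only had to rename the first variable to corr and that's it :)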
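The size mismatch described in the last comment is consistent with `findAssocs` returning a differently shaped object in the commenter's installation, which is an assumption here and not something the thread confirms. A rough sketch of a helper that accepts a list, a one-column matrix, or a named vector is shown below. `assoc_to_df` is a hypothetical name, not part of tm, and the numbers in the usage lines are made-up values for illustration only.

```r
# Hypothetical helper (not part of tm): normalise whatever shape the
# association result has into a data frame with 'terms' and 'corr'.
assoc_to_df <- function(assoc) {
  # A list result: take the entry for the first (only) term.
  if (is.list(assoc)) {
    assoc <- assoc[[1]]
  }
  if (is.matrix(assoc)) {
    # A matrix result: terms are the row names, correlations the first column.
    data.frame(terms = rownames(assoc),
               corr = unname(assoc[, 1]),
               stringsAsFactors = FALSE)
  } else {
    # A named numeric vector: terms are the names.
    data.frame(terms = names(assoc),
               corr = unname(assoc),
               stringsAsFactors = FALSE)
  }
}

# Made-up inputs, one per shape, to show that all three give the same result.
as_list   <- list(oil = c(opec = 0.87, barrels = 0.72))
as_vector <- c(opec = 0.87, barrels = 0.72)
as_matrix <- matrix(c(0.87, 0.72), ncol = 1,
                    dimnames = list(c("opec", "barrels"), "oil"))

df_list   <- assoc_to_df(as_list)
df_vector <- assoc_to_df(as_vector)
df_matrix <- assoc_to_df(as_matrix)
```

With such a helper, the data frame in the answer could be built as `assoc_to_df(findAssocs(tdm, toi, corlimit))` whichever shape comes back.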
