2013-07-15 47 views
6

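How do I print the full distribution of words in an LDA topic in gensim?

The lda.show_topics method in the code below prints only the distribution of the top 10 words for each topic. How can I print the full distribution over all the words in the corpus?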

from gensim import corpora, models 

documents = ["Human machine interface for lab abc computer applications", 
"A survey of user opinion of computer system response time", 
"The EPS user interface management system", 
"System and human system engineering testing of EPS", 
"Relation of user perceived response time to error measurement", 
"The generation of random binary unordered trees", 
"The intersection graph of paths in trees", 
"Graph minors IV Widths of trees and well quasi ordering", 
"Graph minors A survey"] 

stoplist = set('for a of the and to in'.split()) 
texts = [[word for word in document.lower().split() if word not in stoplist] 
     for document in documents] 

dictionary = corpora.Dictionary(texts) 
corpus = [dictionary.doc2bow(text) for text in texts] 

# corpus_tfidf was never defined in this snippet, so train on the bag-of-words corpus
lda = models.ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=2)

for i in lda.show_topics():
    print(i)
+0

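You could do something hacky and modify the LDA package in site-packages (or wherever it lives on your machine) so that it prints all of them, or copy its code into your program and change it to print everything instead of 10. – debianplebian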

+0

Just found the answer; it was hidden in the API =). Answering my own question. – alvas

+0

Good job finding it. – debianplebian

Answers

8

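show_topics() has a parameter called topn that lets you specify how many of the top N words of each topic's word distribution you want. See http://radimrehurek.com/gensim/models/ldamodel.html

So instead of calling lda.show_topics() with its defaults, you can pass len(dictionary) to get the full word distribution for each topic: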

for i in lda.show_topics(topn=len(dictionary)): 
    print(i)
3

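show_topics() takes two parameters, num_topics and num_words: for num_topics topics, it returns the num_words most significant words (10 words per topic by default). See http://radimrehurek.com/gensim/models/ldamodel.html#gensim.models.ldamodel.LdaModel.show_topics

So you can pass len(lda.id2word) to get the full word distribution of each topic, and lda.num_topics to cover every topic in your LDA model.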

for i in lda.show_topics(formatted=False, num_topics=lda.num_topics, num_words=len(lda.id2word)):
    print(i)
+0

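Please explain your answer. SO is not just about answering questions but about helping people learn. Code-only answers are considered low quality. – Machavity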

0

The code below prints the words together with their probabilities. I have printed the top 10 words here; change num_words=10 to print more words per topic.

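for words in lda.show_topics(formatted=False, num_words=10):
    print(words[0])  # topic id
    print("******************************")
    for word_prob in words[1]:
        # assuming the model was built with id2word=dictionary (as in the question),
        # word_prob[0] is already the word string, so no dictionary lookup is needed
        print("(", word_prob[0], ",", word_prob[1], ")", end="")
    print("")
    print("******************************")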