Gensim: finding the topic of each sentence

I have trained an LDA model on a corpus. I want to get the topic assigned to each sentence so I can compare what the algorithm finds with the labels I have.
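LDA topic ids are arbitrary integers with no built-in link to the human labels. One way to compare the two is to count how often each (topic, label) pair occurs and then see which label dominates each topic. The following is a minimal stdlib sketch. It assumes the predicted topic ids and the labels are already two parallel lists. The helper name `topic_label_counts` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def topic_label_counts(topics, labels):
    """Count documents per (topic, label) pair.

    Returns a dict mapping topic id -> Counter of human labels.
    """
    if len(topics) != len(labels):
        raise ValueError("topics and labels must have the same length")
    table = defaultdict(Counter)
    for topic, label in zip(topics, labels):
        table[topic][label] += 1
    return dict(table)


# Hypothetical toy data: predicted topic ids and human labels for 6 sentences
topics = [3, 3, 7, 7, 7, 12]
labels = ["price", "price", "delivery", "delivery", "price", "quality"]

# For each topic, print its most frequent human label and how many of its documents carry it
for topic, counts in sorted(topic_label_counts(topics, labels).items()):
    label, n = counts.most_common(1)[0]
    print(topic, label, n, sum(counts.values()))
```

A topic whose documents are spread evenly over many labels does not line up with any single human category.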

I tried the code below, but the results are poor: topic 17 comes up far too often (maybe 25% of the volume, when it should be closer to 5%).
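To check a figure like "topic 17 gets about 25% of the sentences", you can compute the share of documents assigned to each topic. The following is a minimal sketch. It assumes the assigned topic ids are already collected in a list. The helper name `topic_shares` and the sample ids are hypothetical.

```python
from collections import Counter


def topic_shares(assigned_topics):
    """Return {topic_id: fraction of documents assigned to that topic}."""
    total = len(assigned_topics)
    if total == 0:
        return {}
    counts = Counter(assigned_topics)
    return {topic: n / total for topic, n in counts.items()}


# Hypothetical topic assignments for 8 sentences
shares = topic_shares([17, 17, 4, 9, 17, 2, 4, 17])

# Print topics from most to least frequent
for topic, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"topic {topic}: {share:.0%}")
```

With 18 topics, a perfectly even split would give 1/18, or about 5.6%, per topic. A single topic far above that is a hint that the assignment step is biased.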

Thanks for your help.

from gensim.corpora import Dictionary
from gensim.models import LdaModel
import pandas as pd

# texts_lemmatized: list of documents, each a list of lemmatized tokens
dico = Dictionary(texts_lemmatized)
corpus_lda = [dico.doc2bow(text) for text in texts_lemmatized]

lda_ = LdaModel(corpus_lda, num_topics=18)

data = []

# theme_commentaire: human label of each document, in the same order as corpus_lda
for i in range(len(theme_commentaire)):
    # lda_.get_document_topics() returns the topic distribution of one document
    # as a list of (topic_id, probability) tuples.
    # Note: without a key, max() compares these tuples by topic_id first.
    algo = max(lda_.get_document_topics(corpus_lda[i]))[0]
    human = theme_commentaire[i]
    data.append([str(algo), human])

cols = ['algo', 'human']
df_ = pd.DataFrame(data, columns=cols)
df_.head()


2017-05-10 glouis

Read this related SO question: http://stackoverflow.com/q/42269313/7414759 – stovfl

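It isn't really related: my question is about LDA, not TF-IDF. I did find my problem, though. It was the max() function. It compares the tuples in my list [(topic_id, probability)] by their first element, the topic id, so I was getting 17 most of the time because it is the largest id. – glouis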

Resolved in the comments:

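I found my problem: it was the max() function, which operates on the key of my list of tuples [(topic_id, probability)]. So I was getting 17 most of the time, because it is the largest key. - glouis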
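You can reproduce this pitfall without Gensim. Python compares tuples element by element, so `max()` over `(topic_id, probability)` pairs returns the pair with the largest topic id. Passing a `key` that compares only the probability fixes it. The following is a minimal sketch. The distribution below is made up for illustration, but it has the same `(topic_id, probability)` shape described above for `get_document_topics()`.

```python
# Hypothetical topic distribution for one sentence: (topic_id, probability)
doc_topics = [(2, 0.61), (9, 0.30), (17, 0.09)]

# Buggy: tuples are compared on topic_id first, so the highest id wins
wrong = max(doc_topics)[0]

# Fixed: compare on the probability only
best_topic, best_prob = max(doc_topics, key=lambda pair: pair[1])

print(wrong)       # 17
print(best_topic)  # 2
```

In the question's loop, the corresponding change would be `algo = max(lda_.get_document_topics(corpus_lda[i]), key=lambda pair: pair[1])[0]`.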


2017-05-11 07:37:11 stovfl
