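Creating an n-gram word cloud with Python
I am trying to generate a word cloud from bigrams. I can extract the top 30 discriminative features for each category, but I cannot get the two words of each bigram to appear together in the plot. My word cloud image still looks like a unigram cloud. I am using the following script together with the scikit-learn package.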
import numpy
import matplotlib.pyplot as plt
from wordcloud import WordCloud

def create_wordcloud(pipeline):
    """
    Create word cloud with top 30 discriminative words for each category
    """
    class_labels = numpy.array(['Arts', 'Music', 'News', 'Politics', 'Science', 'Sports', 'Technology'])
    # On scikit-learn >= 1.0, use get_feature_names_out() instead
    feature_names = pipeline.named_steps['vectorizer'].get_feature_names()
    word_text = []
    for i, class_label in enumerate(class_labels):
        # Indices of the 30 features with the largest coefficients for this class
        top30 = numpy.argsort(pipeline.named_steps['clf'].coef_[i])[-30:]
        print("%s: %s" % (class_label, " ".join(feature_names[j] + "," for j in top30)))
        for j in top30:
            word_text.append(feature_names[j])
        #print(word_text)
        wordcloud1 = WordCloud(width=800, height=500, margin=10, random_state=3, collocations=True).generate(' '.join(word_text))
        # Save word cloud as .png file
        # Image files are saved to the folder "classification_model"
        wordcloud1.to_file(class_label + "_wordcloud.png")
        # Plot wordcloud on console
        plt.figure(figsize=(15, 8))
        plt.imshow(wordcloud1, interpolation="bilinear")
        plt.axis("off")
        plt.show()
        # Reset the list before processing the next category
        word_text = []
Here is my pipeline code:
pipeline = Pipeline([
    # SVM using TfidfVectorizer
    ('vectorizer', TfidfVectorizer(max_features=25000, ngram_range=(2, 2), sublinear_tf=True, max_df=0.95, min_df=2, stop_words=stop_words1)),
    ('clf', LinearSVC(loss='squared_hinge', penalty='l2', dual=False, tol=1e-3))
])
These are the top features for my "Arts" category:
Arts: cosmetics businesspeople, television personality, reality television, television presenters, actors london, film producers, actresses television, indian film, set index, actresses actresses, television actors, century actors, births actors, television series, century actresses, actors television, stand comedian, television personalities, television actresses, comedian actor, stand comedians, film actresses, film actors, film directors
It didn't work. It joined all the words with (_) without any breaks. – VKB
I edited my answer. Have you tried something like this? – CrazyElf
Thank you, it works. – VKB
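For readers landing here, one way to keep each bigram together is to skip `generate()`, which re-tokenizes the joined string into single words, and pass a phrase-to-weight mapping to `WordCloud.generate_from_frequencies()` instead. The following is a minimal sketch under stated assumptions, not necessarily the approach from the edited answer: the helper names `top_ngram_weights` and `render_ngram_cloud` are hypothetical, and using the classifier coefficient as the display weight is an assumption made for illustration.

```python
def top_ngram_weights(feature_names, coefficients, k=30):
    """Return {ngram: weight} for the k features with the largest coefficients.

    Hypothetical helper: the coefficient is used as the display weight,
    which is an illustrative choice, not something the question prescribes.
    """
    ranked = sorted(zip(feature_names, coefficients),
                    key=lambda pair: pair[1], reverse=True)[:k]
    # Keep only positive weights so the relative font sizes stay meaningful.
    return {name: float(weight) for name, weight in ranked if weight > 0}


def render_ngram_cloud(weights, path):
    """Draw each key of `weights` as a single unit and save the image to `path`."""
    # Imported lazily so the helper above can be used without wordcloud installed.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=500, margin=10, random_state=3)
    # Keys are drawn verbatim, so "film actors" stays one phrase.
    cloud.generate_from_frequencies(weights)
    cloud.to_file(path)
    return cloud


# Toy data standing in for feature names and one row of clf.coef_
names = ["film actors", "television series", "set index"]
coefs = [0.9, 0.4, -0.2]
weights = top_ngram_weights(names, coefs, k=2)
```

Inside the loop from the question, this would presumably be called as `render_ngram_cloud(top_ngram_weights(feature_names, pipeline.named_steps['clf'].coef_[i]), class_label + "_wordcloud.png")`.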