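How do I plot KMeans text clustering results with matplotlib?

I have the code below, which uses scikit-learn to cluster some sample text. How do I plot the KMeans text clustering results with matplotlib?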

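import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

train = ["is this good?", "this is bad", "some other text here", "i am hero", "blue jeans", "red carpet", "red dog", "blue sweater", "red hat", "kitty blue"]

vect = TfidfVectorizer()
X = vect.fit_transform(train)  # sparse TF-IDF matrix, one row per document
clf = KMeans(n_clusters=3)
clf.fit(X)
centroids = clf.cluster_centers_  # one row per cluster, one column per term

# This only plots the first two TF-IDF columns of each centroid
plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=80, linewidths=5)
plt.show()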

What I can't figure out is how to plot the clustered results. X is a csr_matrix. What I want is an (x, y) coordinate for each result so that I can plot it.

Thanks.

Source

2017-04-21 Anthony De Meulemeester

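Your centroids live in the full TF-IDF feature space (clf.cluster_centers_ has 3 rows and one column per vocabulary term, 18 here with the default tokenizer), so you need some kind of projection or dimensionality reduction to get the centroids into two dimensions. You have several options; here is an example with t-SNE: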

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

train = ["is this good?", "this is bad", "some other text here", "i am hero", "blue jeans", "red carpet", "red dog",
    "blue sweater", "red hat", "kitty blue"]

vect = TfidfVectorizer()
X = vect.fit_transform(train)
clf = KMeans(n_clusters=3)
clf.fit(X)  # fit() returns the estimator itself, so there is nothing useful to capture
centroids = clf.cluster_centers_

tsne_init = 'pca' # could also be 'random'
# Recent scikit-learn releases require perplexity < n_samples, and only
# 3 centroids are embedded here; older releases accepted 20.0
tsne_perplexity = 2.0
tsne_early_exaggeration = 4.0
tsne_learning_rate = 1000
random_state = 1
model = TSNE(n_components=2, random_state=random_state, init=tsne_init, perplexity=tsne_perplexity,
     early_exaggeration=tsne_early_exaggeration, learning_rate=tsne_learning_rate)

# Embed the 3 centroids into 2 dimensions
transformed_centroids = model.fit_transform(centroids)
print(transformed_centroids)
plt.scatter(transformed_centroids[:, 0], transformed_centroids[:, 1], marker='x')
plt.show()

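In your example, if you initialize t-SNE with PCA you get widely spaced centroids; if you use random initialization, the centroids end up very close together and the picture is uninteresting.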
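The question also asks for an (x, y) coordinate for every document, not only for the centroids. One possible sketch, assuming a plain PCA projection is acceptable in place of t-SNE, fits the projection on the documents and reuses it for the centroids so that both land in the same 2-D space. The helper name project_clusters, the output file name, the colors, and the Agg backend (chosen so the script also runs without a display) are illustrative assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use plt.show() instead
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

train = ["is this good?", "this is bad", "some other text here", "i am hero",
         "blue jeans", "red carpet", "red dog", "blue sweater", "red hat",
         "kitty blue"]


def project_clusters(texts, n_clusters=3, random_state=1):
    """Hypothetical helper: cluster texts, then project docs and centroids to 2-D."""
    X = TfidfVectorizer().fit_transform(texts)  # sparse (n_docs, n_terms)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)
    pca = PCA(n_components=2, random_state=random_state)
    # Converting to dense is fine for a tiny corpus like this one
    doc_xy = pca.fit_transform(X.toarray())
    # Reuse the same fitted projection so centroids share the documents' axes
    centroid_xy = pca.transform(km.cluster_centers_)
    return doc_xy, centroid_xy, km.labels_


doc_xy, centroid_xy, labels = project_clusters(train)

fig, ax = plt.subplots()
# One point per document, colored by its cluster label
ax.scatter(doc_xy[:, 0], doc_xy[:, 1], c=labels, cmap="viridis", s=40)
# Centroids drawn as red crosses in the same coordinate system
ax.scatter(centroid_xy[:, 0], centroid_xy[:, 1], marker="x", c="red", s=80)
for (x, y), text in zip(doc_xy, train):
    ax.annotate(text, (x, y), fontsize=8)
fig.savefig("kmeans_clusters.png")
```

Each row of doc_xy is the (x, y) position of the corresponding entry in train. For a larger corpus, where converting the sparse matrix to dense is too expensive, sklearn.decomposition.TruncatedSVD accepts the sparse matrix directly and could be swapped in for PCA.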

Source

2017-08-04 15:03:43
