2012-03-23 74 views
8

Plotting k-means output (PyCluster impl)

How do I plot k-means output in Python? I am using the PyCluster package. allUserVector is an n-by-m two-dimensional array: n users, each with m features.

import Pycluster as pc 
import numpy as np 

# cluster once and reuse the same assignment when computing the centroids
clusterid, error, nfound = pc.kcluster(allUserVector, nclusters=3, transpose=0,
                                       npass=1, method='a', dist='e')

centroids, _ = pc.clustercentroids(allUserVector, clusterid=clusterid)
print(centroids)
print(clusterid)
print(nfound)

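I want a plot that clearly shows the clusters and which users belong to which cluster. Each user is an m-dimensional vector. Any suggestions for a graph that displays the clusters nicely?

Answer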

15

Plotting m-dimensional data is hard. One approach is to map it into 2-D space with Principal Component Analysis (PCA). Once that is done, we can throw the points onto a plot with matplotlib (based on this answer).

import numpy as np 
import matplotlib.pyplot as plt 
from matplotlib import mlab 
import Pycluster as pc 

# make fake user data 
users = np.random.normal(0, 10, (20, 5)) 

# cluster 
clusterid, error, nfound = pc.kcluster(users, nclusters=3, transpose=0, 
             npass=10, method='a', dist='e') 
centroids, _ = pc.clustercentroids(users, clusterid=clusterid) 

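# reduce dimensionality: project onto the first two principal components
users_pca = mlab.PCA(users)
cutoff = users_pca.fracs[1]  # variance fraction of the second component
# project() drops every axis whose variance fraction is below minfrac
users_2d = users_pca.project(users, minfrac=cutoff)
centroids_2d = users_pca.project(centroids, minfrac=cutoff)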

# make a plot 
colors = ['red', 'green', 'blue'] 
plt.figure() 
plt.xlim([users_2d[:,0].min() - .5, users_2d[:,0].max() + .5]) 
plt.ylim([users_2d[:,1].min() - .5, users_2d[:,1].max() + .5]) 
plt.xticks([], []); plt.yticks([], []) # numbers aren't meaningful 

# show the centroids 
plt.scatter(centroids_2d[:,0], centroids_2d[:,1], marker='o', c=colors, s=100) 

# show user numbers, colored by their cluster id
for i, ((x, y), kls) in enumerate(zip(users_2d, clusterid)):
    plt.annotate(str(i), xy=(x, y), xytext=(0, 0), textcoords='offset points',
                 color=colors[kls])

plt.show()

If you want to plot something other than numbers, just change the first argument to annotate. For example, you could use usernames or some other label.
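As a rough illustration of that suggestion, the sketch below labels each point with a name instead of its index. The names list, the points_2d array, and the cluster_ids list are all made-up stand-ins (for usernames, users_2d, and clusterid respectively); none of them come from the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for users_2d, clusterid, and a list of usernames.
rng = np.random.RandomState(0)
points_2d = rng.normal(0, 10, (6, 2))
cluster_ids = [0, 1, 2, 0, 1, 2]
names = ["ann", "bob", "cy", "dee", "eli", "fay"]
colors = ["red", "green", "blue"]

fig, ax = plt.subplots()
ax.set_xlim(points_2d[:, 0].min() - .5, points_2d[:, 0].max() + .5)
ax.set_ylim(points_2d[:, 1].min() - .5, points_2d[:, 1].max() + .5)

# Same annotate call as before, but the first argument is the name.
texts = []
for name, (x, y), kls in zip(names, points_2d, cluster_ids):
    texts.append(ax.annotate(name, xy=(x, y), xytext=(0, 0),
                             textcoords="offset points", color=colors[kls]))
```

The only requirement is that the labels are in the same order as the rows passed to the clustering call, so that each label lines up with its cluster id.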

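Note that the clusters may look a bit "wrong" in this space (for example, 15 looks closer to red than to green), because this is not the space in which the clustering actually happened. In this case, the first two principal components retain 61% of the variance: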

>>> np.cumsum(users_pca.fracs) 
array([ 0.36920636, 0.61313708, 0.81661401, 0.95360623, 1.  ])
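mlab.PCA was removed from matplotlib in version 3.1, so the code above only runs on older installations. A minimal sketch of the same idea using only NumPy follows; it is not the code from the answer. The names centered, fracs, and project are chosen here for illustration, and the sketch only centers the data without rescaling each feature, so its numbers may differ from what mlab.PCA reports.

```python
import numpy as np

rng = np.random.RandomState(0)
users = rng.normal(0, 10, (20, 5))  # fake user data, same shape as above

# Center the data, then take the SVD; the rows of vt are the principal axes.
mean = users.mean(axis=0)
centered = users - mean
_, s, vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of the total variance carried by each component.
fracs = s ** 2 / np.sum(s ** 2)

def project(points):
    """Project points (one row per point) onto the first two principal axes."""
    return (np.asarray(points) - mean).dot(vt[:2].T)

users_2d = project(users)
print(np.cumsum(fracs))  # cumulative variance retained by 1, 2, ... components
```

The same project function can be applied to the centroids, so that points and centroids end up in the same 2-D coordinates.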