
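Efficient way to access the data grouped by KMeans clusters from the cluster centers

I'm trying to draw a circle around each centroid, with the radius extending to the farthest point that belongs to that cluster. Right now, each circle's radius extends to the farthest point in the entire training dataset.

Here is my code: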

def KMeansModel(n): 
    pca = PCA(n_components=2) 
    reduced_train_data = pca.fit_transform(train_data) 
    KM = KMeans(n_clusters=n) 
    KM.fit(reduced_train_data) 
    plt.plot(reduced_train_data[:, 0], reduced_train_data[:, 1], 'k.', markersize=2) 
    centroids = KM.cluster_centers_ 
    # Plot the centroids as a red X 
    plt.scatter(centroids[:, 0], centroids[:, 1], 
       marker='x', color='r') 
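    for i in centroids:
        # Distance from this centroid to every point in the dataset;
        # pairwise_distances expects 2-D inputs, so reshape the centroid to (1, 2)
        radius = np.max(metrics.pairwise_distances(i.reshape(1, -1), reduced_train_data))
        print(radius)
        plt.gca().add_artist(plt.Circle(i, radius, fill=False))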
    plt.show() 

out = [KMeansModel(n) for n in np.arange(1,16,1)] 

Answer


When you call

metrics.pairwise_distances(i, reduced_train_data) 

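you compute the distance to every training point, not only to the points in the relevant cluster. To find the positions of the training points that belong to cluster ind, you can use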

np.where(KM.labels_==ind)[0] 
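To make the indexing concrete, here is a small sketch with a made-up label array standing in for `KM.labels_` and a made-up point array standing in for `reduced_train_data` (both are hypothetical values, not the question's data). It shows that `np.where(...)[0]` returns the row positions of one cluster, and that a boolean mask selects the same rows.

```python
import numpy as np

# Hypothetical stand-in for KM.labels_: one cluster id per training point.
labels = np.array([0, 2, 1, 0, 2, 2])
# Hypothetical stand-in for reduced_train_data: one 2-D point per label.
points = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 1.0],
                   [1.0, 0.0], [6.0, 5.0], [5.0, 7.0]])

ind = 2
class_inds = np.where(labels == ind)[0]   # row positions of cluster `ind`
cluster_points = points[class_inds]       # only the points in that cluster

# A boolean mask selects the same rows without the intermediate index array.
same_points = points[labels == ind]
```

Either form works for slicing the training data; the index array is convenient if the positions themselves are needed later.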

So, inside the for loop

for i in centroids: 

you need to access only the training points that belong to that specific cluster. The following does the job:

from sklearn.decomposition import PCA 
from sklearn.cluster import KMeans 
from sklearn import metrics 
import matplotlib.pyplot as plt 
import numpy as np 

def KMeansModel(n): 
    pca = PCA(n_components=2) 
    reduced_train_data = pca.fit_transform(train_data) 
    KM = KMeans(n_clusters=n) 
    KM.fit(reduced_train_data) 
    plt.plot(reduced_train_data[:, 0], reduced_train_data[:, 1], 'k.', markersize=2) 
    centroids = KM.cluster_centers_ 
    # Plot the centroids as a red X 
    plt.scatter(centroids[:, 0], centroids[:, 1], 
       marker='x', color='r') 
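    for ind, i in enumerate(centroids):
        # Indices of the training points assigned to cluster `ind`
        class_inds = np.where(KM.labels_ == ind)[0]
        # Farthest member of this cluster; pairwise_distances expects 2-D inputs,
        # so reshape the centroid to shape (1, 2)
        max_dist = np.max(metrics.pairwise_distances(i.reshape(1, -1), reduced_train_data[class_inds]))
        print(max_dist)
        plt.gca().add_artist(plt.Circle(i, max_dist, fill=False))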
    plt.show() 

out = [KMeansModel(n) for n in np.arange(1,16,1)] 
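If the loop over centroids ever becomes a bottleneck, the per-cluster radii can also be computed in a single pass. The sketch below is an alternative to the `pairwise_distances` call, not part of the code above; `cluster_radii` is a hypothetical helper name, and it assumes `labels` and `centroids` are laid out like `KM.labels_` and `KM.cluster_centers_`.

```python
import numpy as np

def cluster_radii(points, labels, centroids):
    """Return, per cluster, the distance from its centroid to its farthest member."""
    # Distance from every point to the centroid of its own cluster.
    dists = np.linalg.norm(points - centroids[labels], axis=1)
    radii = np.zeros(len(centroids))
    # Unbuffered in-place maximum: radii[k] = max(dists[labels == k]).
    np.maximum.at(radii, labels, dists)
    return radii

# Tiny made-up example (values are illustrative only).
points = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 10.0], [10.0, 12.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
radii = cluster_radii(points, labels, centroids)
print(radii)
```

Each radius could then be passed to `plt.Circle(centroids[k], radii[k], fill=False)` in the same way as `max_dist` in the loop.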

Here is one of the figures I get with this code:

[image: K-means clusters with a circle around each centroid]