Plotting multidimensional clusters on a 2D plot in Python

I am clustering a large amount of data, which falls into two different clusterings.

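The first is a 6-dimensional clustering and the second a 12-dimensional one. For now I decided to use k-means (since it seemed the most intuitive clustering algorithm to start with).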

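The problem is how to map these clusters onto a 2D plot so that I can tell whether k-means is working. I'd like to use matplotlib, but any other Python package is fine.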
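A common way to get such a 2D view is to project the points onto their first two principal components (PCA). The sketch below does this with plain NumPy via an SVD; the function name `project_2d` and the random test data are illustrative assumptions, not part of the original question.

```python
import numpy as np


def project_2d(data):
    """Project the rows of `data` onto their first two principal components."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # The rows of vt are the principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
points_12d = rng.normal(size=(200, 12))  # hypothetical 12-dimensional data
points_2d = project_2d(points_12d)
print(points_2d.shape)  # (200, 2)
```

The two columns of `points_2d` can be passed straight to `plt.scatter` as x and y.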

Cluster 1 has these data types: (int, float, float, int, float, int)
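Because k-means (and a PCA projection for plotting) works on a single float array, rows with a mixed int/float layout like this have to be converted to floats first. Standardizing each column also keeps a wide-range integer column from dominating the Euclidean distances. A minimal sketch, assuming the rows are available as Python tuples; the sample values are made up.

```python
import numpy as np

# Hypothetical rows with the (int, float, float, int, float, int) layout
rows = [
    (3, 0.5, 1.25, 120, 0.75, 7),
    (5, 0.1, 2.50, 80, 0.30, 2),
    (4, 0.9, 0.75, 200, 0.10, 5),
]

# One float64 array with shape (n_samples, 6)
data = np.asarray(rows, dtype=float)

# Scale each column to zero mean and unit variance so no column dominates
std = data.std(axis=0)
std[std == 0] = 1.0  # leave constant columns unscaled
standardized = (data - data.mean(axis=0)) / std
print(standardized.shape)  # (3, 6)
```

If you prefer to stay within SciPy, `scipy.cluster.vq.whiten` performs the scaling step (dividing by each column's standard deviation, without centering).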

Cluster 2 consists of 12 float values per point.

I'm trying to get output similar to this: [image] Any hints would be helpful.


2014-10-30 chettyharish

Am I the only one who doesn't know what the clusters are? – farenorth 2014-10-30 05:57:57

@farenorth Yes, you are. – 2014-10-30 10:36:15

To answer the question: search for "[matplotlib kmeans example](https://duckduckgo.com/?q=matplotlib+kmeans+example)" – 2014-10-30 10:38:16

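Well, after searching the internet I found lots of odd comments and few solutions, but I was able to figure out how to do it. If you are trying to do something similar, here is the code. It combines code from various sources, much of it written or edited by me. I hope it is easier to understand than the others.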

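The function is based on kmeans2 from SciPy, which returns centroid_list and label_list. kmeansdata is the NumPy array passed to kmeans2 for clustering, and num_cluster is the number of clusters passed to kmeans2.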
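For reference, here is one way the inputs to `plot_cluster` could be produced. This is a sketch with made-up data (three well-separated blobs in 6 dimensions standing in for the real data set); only `scipy.cluster.vq.kmeans2` itself is from the answer.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
# Hypothetical stand-in data: three well-separated blobs in 6 dimensions
kmeansdata = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 6))
                        for c in (0.0, 5.0, 10.0)])
num_cluster = 3

# kmeans2 returns (centroids, labels): one centroid row per cluster and
# one cluster index per input row; minit="points" seeds with random rows
centroid_list, label_list = kmeans2(kmeansdata, num_cluster, minit="points")
print(centroid_list.shape)  # (3, 6)
print(label_list.shape)  # (300,)
```

These four names can then be passed as `plot_cluster(kmeansdata, centroid_list, label_list, num_cluster)`.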

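The code writes a new PNG file, making sure it doesn't overwrite anything else. It also plots only 50 clusters (if you have thousands of clusters, don't try to plot all of them).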
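The "don't overwrite" step boils down to probing name0.png, name1.png, ... until one is free. A standalone sketch of that idea; `next_free_filename` is a hypothetical helper name, and the temporary directory only serves the demo.

```python
import os
import tempfile


def next_free_filename(prefix="name", ext=".png", directory="."):
    """Return the first path prefix0.png, prefix1.png, ... that does not exist yet."""
    i = 0
    while True:
        path = os.path.join(directory, f"{prefix}{i}{ext}")
        if not os.path.exists(path):
            return path
        i += 1


with tempfile.TemporaryDirectory() as tmp:
    first = next_free_filename(directory=tmp)
    open(first, "w").close()  # simulate an already saved plot
    second = next_free_filename(directory=tmp)

print(os.path.basename(first), os.path.basename(second))  # name0.png name1.png
```

With matplotlib this would be used as `plt.savefig(next_free_filename())`. Note that the check and the write are not atomic, so two processes saving at the same moment could still pick the same name.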

(It was written for Python 2.7, but I guess it should work on other versions too.)

import os
import random
import colorsys

import numpy
from matplotlib import pyplot as plt


def get_colors(num_colors):
    """
    Generate a list of num_colors random colors.
    256 evenly spaced hues are generated first (with slightly
    randomized lightness and saturation), then num_colors of them
    are picked at random (with replacement, so repeats are possible).
    """
    colors = []
    # Generate 256 different colors and choose num_colors of them randomly
    for i in numpy.arange(0., 360., 360. / 256.):
        hue = i / 360.
        lightness = (50 + numpy.random.rand() * 10) / 100.
        saturation = (90 + numpy.random.rand() * 10) / 100.
        colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
    return [random.choice(colors) for _ in range(num_colors)]


def random_centroid_selector(total_clusters, clusters_plotted):
    """
    Return the indices of clusters_plotted distinct, randomly selected
    clusters to plot (or of all clusters if there are fewer than that).
    """
    return random.sample(range(total_clusters), min(clusters_plotted, total_clusters))


def pca_2d(data):
    """
    Fit a 2-component PCA on data and return a function that projects
    any array with the same columns onto those two components.
    This replaces matplotlib.mlab.PCA, which was removed in matplotlib 3.1.
    Like mlab.PCA, the columns are centered and scaled to unit variance first.
    """
    data = numpy.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    _, _, vt = numpy.linalg.svd((data - mean) / std, full_matrices=False)
    components = vt[:2]  # the two axes with the largest variance
    return lambda x: ((numpy.asarray(x, dtype=float) - mean) / std).dot(components.T)


def plot_cluster(kmeansdata, centroid_list, label_list, num_cluster):
    """
    Project the n-dimensional clusters to 2 dimensions with PCA and
    plot up to 50 randomly chosen clusters.
    The output is written to name%d.png, using the first index
    that is not taken yet, e.g. name0.png, name1.png, ...
    """
    project = pca_2d(kmeansdata)
    users_2d = project(kmeansdata)
    centroids_2d = project(centroid_list)
    label_list = numpy.asarray(label_list)

    colors = get_colors(num_cluster)
    plt.figure()
    plt.xlim([users_2d[:, 0].min() - 3, users_2d[:, 0].max() + 3])
    plt.ylim([users_2d[:, 1].min() - 3, users_2d[:, 1].max() + 3])

    # Plotting 50 clusters only for now
    random_list = random_centroid_selector(num_cluster, 50)

    for i in random_list:
        # Points belonging to cluster i are drawn as small '+' markers
        members = users_2d[label_list == i]
        plt.scatter(members[:, 0], members[:, 1], marker='+', color=colors[i])
        # The centroid of cluster i is drawn as a large 'o' marker
        plt.scatter(centroids_2d[i, 0], centroids_2d[i, 1], marker='o',
                    color=colors[i], s=100)

    # Find the first free file index so no existing file is overwritten
    filename = "name"
    i = 0
    while os.path.isfile(filename + str(i) + ".png"):
        i += 1
    plt.savefig(filename + str(i) + ".png")
    plt.close()
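Because the plot keeps only two components, it is worth checking how much of the total variance those two retain. If the fraction is small, clusters that overlap in the 2D plot do not necessarily mean k-means failed. The `cutoff = mlab_pca.fracs[1]` line in the mlab-based version of this code relies on these per-component fractions to keep the top two components. A NumPy sketch of the same fractions, assuming standardized columns as `mlab.PCA` used; `variance_fractions` is a hypothetical helper name and the data is random.

```python
import numpy as np


def variance_fractions(data):
    """Fraction of total variance captured by each principal component,
    in decreasing order (computed on standardized columns)."""
    data = np.asarray(data, dtype=float)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    standardized = (data - data.mean(axis=0)) / std
    # Squared singular values are proportional to the variance along each axis
    s = np.linalg.svd(standardized, compute_uv=False)
    return s**2 / np.sum(s**2)


rng = np.random.default_rng(1)
data = rng.normal(size=(500, 12))  # hypothetical 12-dimensional data
fracs = variance_fractions(data)
print(f"first two components keep {fracs[:2].sum():.0%} of the variance")
```

On real data with strong structure the first two fractions are usually much larger than on this uncorrelated random data.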


2014-11-04 04:18:40 chettyharish
