2013-10-22

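Updating and appending in a Python loop

I'm new to Python, and I'm trying to write code that performs K-Means clustering with a predefined package called Pycluster. At first I clustered with a fixed number of clusters (n = 10), and the code worked fine. Then I tried to extend it so that it doesn't only produce 10 clusters: I set up a loop that increases the number of clusters from 2 to 10 (or more). That is where the problems started, since, as I said, I'm completely new to Python. The code I've developed so far is shown below. I realize the errors start between lines 33 and 49 of the code. I'd really appreciate any help getting the code to run.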

# -*- coding: utf-8 -*- 
""" 
Created on Mon Oct 21 13:53:40 2013 

@author: Engin 
""" 


from Pycluster import * 
import numpy as np 


#Open the text file containing the stored smart meter data 
d=np.loadtxt("120-RES-195-Normalized.txt", delimiter="\t", skiprows=1, usecols=range(1,49)) 


handle=open("120-RES-195-Normalized.txt") 
record = read(handle) #Store the smart meter data in an array called record. 

cluster_results = np.ones((120, 11)) 
cluster_centroids=np.array([]) 
within_cluster_sum_of_squares=np.ones((1,11)) 
between_cluster_sum_of_squares=np.ones((1,11)) 
distance=[] 

for n in range (1,11): 
    cluster_results[:,n-1], within_cluster_sum_of_squares[:,n-1], optimal_solution_repetition = record.kcluster(nclusters=n, npass=10, method='a', dist='e')  #Performs the K-Means clustering using the defined parameters 
    centroids, cmask = record.clustercentroids(cluster_results[:,n-1], method='a', transpose=0) #Calculates the cluster centroids 
    cluster_centroids=np.append(cluster_centroids,centroids) 

    #The following routine stores the cluster numbers and the indices of the elements belonging to each
    #cluster so that the Between Clusters Sum of Squares can be calculated easily. The results can also
    #be visualised easily.
    from collections import defaultdict
    cluster_numbers_members = defaultdict(list)
    for i,item in enumerate(cluster_results[:,n-1]):
        cluster_numbers_members[item].append(i)
    cluster_numbers_members = {k:v for k,v in cluster_numbers_members.items() if len(v)>=1}
    cluster_members=cluster_numbers_members.values()
    cluster_numbers=cluster_numbers_members.keys()

    distance[:,n-1]=0
    between_cluster_sum_of_squares[:,n-1]=0
    for i in range(0,n):
        for k in range(0,n):
            distance[:,n-1] = record.clusterdistance(index1=cluster_members[i], index2=cluster_members[k], method='a', dist='e', transpose=0)
            between_cluster_sum_of_squares[:,n-1]=between_cluster_sum_of_squares[:,n-1]+distance[:,n-1]

    WCBCR = within_cluster_sum_of_squares/between_cluster_sum_of_squares 
    print(cluster_results[:,n-1])
    print(within_cluster_sum_of_squares[:,n-1])

print(cluster_centroids)

#Arranging cluster centroids in (1X48) vector form 
cluster_tuple=zip(*[iter(cluster_centroids)]*48) 
cluster_array=np.array(list(cluster_tuple))
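For comparison, here is a minimal sketch of the same loop with the per-k bookkeeping kept in plain Python lists, so nothing needs a `[:, n-1]` index. Everything in it is an assumption for illustration: `kmeans` is a hypothetical hand-written Lloyd's algorithm standing in for Pycluster's `record.kcluster` (which instead repeats the clustering `npass` times from random starts); the between-cluster distance is the squared Euclidean distance between cluster means, which resembles `clusterdistance(method='a', dist='e')` but may not match Pycluster's exact scaling; and the toy `data` replaces the smart meter file. The loop starts at k = 2, as the question intends, because with a single cluster the between-cluster sum is 0 and the ratio is undefined.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(data, k, iters=20, seed=0):
    """Hypothetical stand-in for record.kcluster: plain Lloyd's k-means.

    Returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]  # k distinct starting points
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in data]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = mean_point(members)
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return labels, wcss


def group_members(labels):
    """Map each cluster label to the row indices that carry it."""
    members = defaultdict(list)
    for i, label in enumerate(labels):
        members[label].append(i)
    return dict(members)


def between_cluster_ss(data, labels):
    """Sum of squared centroid distances over all ordered pairs of clusters."""
    groups = group_members(labels)
    cents = [mean_point([data[i] for i in idx]) for idx in groups.values()]
    total = 0.0  # a plain float replaces distance[:, n-1]
    for a in cents:
        for b in cents:
            total += sq_dist(a, b)  # self-pairs add 0; each pair is counted twice
    return total


data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [20, 0], [21, 0], [20, 1]]

ks, wcss_list, bcss_list, ratio_list = [], [], [], []
for k in range(2, 6):  # cluster counts 2..5
    labels, wcss = kmeans(data, k)
    bcss = between_cluster_ss(data, labels)
    ks.append(k)              # one entry per cluster count
    wcss_list.append(wcss)
    bcss_list.append(bcss)
    ratio_list.append(wcss / bcss if bcss else float("inf"))  # per-k analogue of WCBCR

for k, ratio in zip(ks, ratio_list):
    print(k, round(ratio, 4))
```

The double loop in `between_cluster_ss` mirrors the question's `for i in range(0,n): for k in range(0,n)`, so every pair of clusters is counted twice; halving the total, or looping only over pairs with `k > i`, counts each pair once.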

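"That is where the problems started, since, as I said, I'm completely new to Python." Please provide more details. What kind of problems? Do you get an error message? – Kevin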


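Hi @Kevin, I updated the code because I had some mistakes in the variable names. In an earlier version of the code I used other variable names, but I had to rename them to make the code clearer and more consistent. When I try to run the current (updated) code, I keep getting the following error message: distance[:,n-1] = 0 TypeError: list indices must be integers, not tuple. Thanks in advance for your help. – user2470127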
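The message says that `distance` is a Python list (it is created with `distance=[]`), and a list accepts only an integer or a slice as an index. `[:,n-1]` is the tuple `(slice(None), n-1)`, which only NumPy arrays understand. A minimal stdlib reproduction follows (on Python 3 the wording is "list indices must be integers or slices, not tuple"), together with one way to keep a per-k value by appending to a list. The `abs(i - k)` term is a hypothetical placeholder for a real cluster distance.

```python
# Reproduce the error: a list cannot be indexed with a (row, column) tuple.
distance = []
caught = None
try:
    distance[:, 0] = 0  # [:, 0] is the tuple (slice(None), 0)
except TypeError as exc:
    caught = exc
    print("TypeError:", exc)

# One value per cluster count fits a plain list that grows by appending.
between_ss = []
for n in range(2, 5):
    total = 0.0  # scalar accumulator, reset for each cluster count
    for i in range(n):
        for k in range(n):
            total += abs(i - k)  # hypothetical placeholder for a cluster distance
    between_ss.append(total)
print(between_ss)  # [2.0, 8.0, 20.0]
```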

Answer


Replace

[:,n-1]

with

[:n-1] or [:(n-1)] # same thing, use whatever you find easier to read

Hi @ExperimentsWithCode, I tried that, but I keep getting the following error: ValueError: could not broadcast input array from shape (120) into shape (0,11) – user2470127
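This error is consistent with the `[:n-1]` replacement having also been applied to `cluster_results`. On a 2-D NumPy array, `a[:, j]` selects column j, while `a[:j]` selects the first j rows. For n = 1, `cluster_results[:0]` therefore has shape (0, 11), and the 120 cluster ids cannot be stored in it. A small sketch, assuming NumPy is installed; `labels` are stand-in ids, not real Pycluster output:

```python
import numpy as np

cluster_results = np.ones((120, 11))
labels = np.arange(120) % 3  # stand-in cluster ids (hypothetical)

n = 1
print(cluster_results[:n - 1].shape)    # (0, 11): the first n-1 rows, empty when n == 1
print(cluster_results[:, n - 1].shape)  # (120,): the whole column n-1

caught = None
try:
    cluster_results[:n - 1] = labels    # 120 values into a (0, 11) slice
except ValueError as exc:
    caught = exc
    print("ValueError:", exc)

cluster_results[:, n - 1] = labels      # assigning to column n-1 fits
print(cluster_results[:5, 0])
```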


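@user2470127 Does it give you a line number? I don't see where you're getting 'shape' from. Also, if you could provide some sample data in the correct input format, I could test some changes. – ExperimentsWithCode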


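@user2470127 OK, I believe the shapes are matrices. I think the error you're getting may be because the second matrix has no actual content. A matrix of shape (0,11) is '[]', and you get an error when you try to do calculations between the matrices. I can't do any math between two matrices of shapes (120) and (0,11) without getting the following error: 'ValueError: operands could not be broadcast together with shapes (120) (0,11)'. If I make the second matrix (1,11) instead of (0,11), I'm able to perform operations between them. – ExperimentsWithCode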
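The "could not be broadcast" message comes from NumPy's broadcasting rule: shapes are compared from the last dimension backwards, each pair of sizes must be equal or one of them must be 1, and missing leading dimensions count as 1. A pure-Python sketch of that rule follows; the `broadcast_shape` helper is hypothetical (NumPy 1.20 and later ship `np.broadcast_shapes` for the same check, raising an error instead of returning `None`).

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast result of shapes a and b, or None if they clash."""
    result = []
    # Walk both shapes from the last dimension; absent dimensions act as size 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            return None  # sizes differ and neither is 1
    return tuple(reversed(result))


print(broadcast_shape((120,), (0, 11)))    # None: 120 vs 11 clash
print(broadcast_shape((120, 11), (11,)))   # (120, 11)
print(broadcast_shape((120, 1), (1, 11)))  # (120, 11)
```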