2017-06-01 39 views

RuntimeError: maximum recursion depth exceeded in cmp: K-means clustering

I am implementing the K-means clustering algorithm. This is what I have so far:

import copy 
import csv 
import math 
import random 
import sys 


class Centroid(): 
    def __init__(self, coordinates, _id):
        self.id = _id
        self.coordinates = coordinates
        self.elements = []

    def __repr__(self):
        return 'Centroid: ' + str(self.id)

    @property
    def count(self):
        return len(self.elements)

    def recalculate_coordinates(self):
        x = [sum(y)/len(y) for y in zip(*self.elements)]
        self.coordinates = x

    def reset_elements(self):
        self.previous_elements = []
        for el in self.elements:
            self.previous_elements.append(el)
        self.elements = []

class Kmeans(): 
    def __init__(self):
        self.k = int(sys.argv[2])
        self.prepare_data()
        self.iterations = 0

    def prepare_data(self):
        filename = sys.argv[1]
        self.dataset = []
        with open(filename, 'rb') as csvfile:
            reader = csv.reader(csvfile, delimiter=' ')
            for row in reader:
                tuplified = tuple(map(float, row))
                self.dataset.append(tuplified)
        self.create_centroids()

    def create_centroids(self):
        self.centroids = []
        for i in xrange(self.k):
            chosen = random.choice(self.dataset)
            cent = Centroid(chosen, i+1)
            self.centroids.append(cent)

def main(): 
    k = Kmeans()

    def iterate(k):
        k.iterations += 1
        for item in k.dataset:
            candidates = []
            for centroid in k.centroids:
                z = zip(item, centroid.coordinates)
                squares = map(lambda x: (x[0]-x[1])**2, z)
                added = sum(squares)
                edistance = math.sqrt(added)
                candidates.append((centroid, edistance))
            winner = min(candidates, key=lambda x: x[1])
            winner[0].add_element(item)
        for centroid in k.centroids:
            centroid.reset_elements()
            centroid.recalculate_coordinates()

        status_list = []
        for centroid in k.centroids:
            boole = sorted(centroid.elements) == sorted(centroid.previous_elements)
            status_list.append(boole)

        if False in status_list:
            iterate(k)
        print k.centroids
        print k.iterations

    iterate(k)


if __name__ == '__main__': 
    main() 

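However, I keep getting the error RuntimeError: maximum recursion depth exceeded in cmp. I have tried refactoring several times without success. Can anyone tell me what the problem might be? Thanks in advance.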
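
For comparison, the same assign, recompute, and compare cycle can be written as a plain loop, which uses no extra stack frame per pass and so cannot hit the recursion limit by itself. The following is only a minimal Python 3 sketch, not the code from the post: the function name `kmeans`, the `max_iter` and `seed` parameters, and the use of `random.sample` (instead of `random.choice`) for the starting points are all assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=None):
    """Loop-based K-means sketch; `points` is a list of equal-length tuples."""
    rng = random.Random(seed)
    # Assumption: pick k distinct starting points (the post uses random.choice).
    centroids = rng.sample(points, k)
    previous = None
    for iteration in range(1, max_iter + 1):
        # Assign every point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid from its members before the membership
        # lists are replaced; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(axis) / len(axis) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        # Stop once the assignment no longer changes between passes.
        if clusters == previous:
            break
        previous = clusters
    return centroids, clusters, iteration
```

Calling `kmeans(points, 2)` returns the final centroids, the cluster membership lists, and the number of passes performed; `max_iter` bounds the loop even if the assignment never settles.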

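Some of the indentation is wrong, and some relevant code is missing. I don't see any recursion in what you have shown us.

It is on the third line in 'def iterate'. – theFarkle

Which line does the exception occur on? – Billy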

Answers

If the error is on this line:

boole = sorted(centroid.elements) == sorted(centroid.previous_elements) 

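What is most likely happening is that you have a circular reference inside centroid.elements or centroid.previous_elements, so the comparison operations (both the sorted calls and the == itself) keep looping through each list.

A simple demonstration of this behavior (in Python 3):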

>>> x = [] 
>>> y = [x] 
>>> x.append(y) 
>>> x == y 
Traceback (most recent call last):
    .... 
    x == y  
RecursionError: maximum recursion depth exceeded in comparison 
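
To check whether a list such as `centroid.elements` really contains a circular reference, one option is to walk it while tracking the containers on the current path. This is a rough Python 3 sketch under the assumption that the data consists only of nested lists and tuples; `has_cycle` is a hypothetical helper name, not part of the original code or of any library.

```python
def has_cycle(obj, _ancestors=None):
    """Return True if a nested list/tuple structure contains itself."""
    if _ancestors is None:
        _ancestors = set()
    if not isinstance(obj, (list, tuple)):
        return False  # plain values such as floats cannot form a cycle
    if id(obj) in _ancestors:
        return True  # this container is already on the current path
    _ancestors.add(id(obj))
    try:
        return any(has_cycle(item, _ancestors) for item in obj)
    finally:
        _ancestors.discard(id(obj))


# The same self-referential pair as in the interpreter session above.
x = []
y = [x]
x.append(y)

cyclic = has_cycle(x)                # True: x contains y, which contains x
plain = has_cycle([(1.0, 2.0)])      # False: an ordinary list of points

# Comparing the cyclic lists raises instead of returning a result.
try:
    x == y
    comparison_failed = False
except RecursionError:
    comparison_failed = True
```

Running `has_cycle` on each centroid's lists just before the failing comparison would show whether this explanation applies to the data at hand.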

Thanks, that was the problem. I can't upvote it, though, because of my rep. – theFarkle

But you can accept it :) – Billy
