K-means with L1 distance in Python

Given NxM feature vectors as a numpy matrix, is there any routine that can cluster them with a k-means algorithm using L1 distance (Manhattan distance)?

2011-06-06 JustInTime

I don't think SciPy provides this explicitly, but you should take a look at the following:

http://projects.scipy.org/scipy/ticket/612

2011-06-06 14:48:50 JoshAdel

There is code under is-it-possible-to-specify-your-own-distance-function-using-scikits-learn-k-means that works with any of the 20-odd metrics in scipy.spatial.distance. See also L1-or-L.5-metrics-for-clustering; could you comment on your results with L1 versus L2?
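As a rough illustration of what swapping the metric means for the assignment step, here is a minimal sketch that labels each row with its nearest centroid under L1 distance. The function name `assign_l1` and the sample data are made up for this example. It uses plain numpy broadcasting; `scipy.spatial.distance.cdist(X, centroids, metric='cityblock')` should produce the same distance matrix.

```python
import numpy as np

def assign_l1(X, centroids):
    """Return, for each row of X, the index of the nearest centroid under L1 distance."""
    # Pairwise L1 distances via broadcasting; the result has shape (N, k)
    dists = np.abs(X[:, None, :] - centroids[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)

# Hypothetical data: two obvious groups and one centroid near each
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 9.0]])
centroids = np.array([[0.0, 1.0], [10.0, 9.0]])
labels = assign_l1(X, centroids)
```

Replacing the `np.abs(...).sum(...)` line with a squared-difference sum gives the usual L2 assignment, which makes it easy to compare the two on the same data.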


2011-06-12 09:51:15 denis

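Here is a k-means algorithm that uses L1 distance (Manhattan distance). For generality, the feature vectors are represented as lists, which are easy to convert to and from a numpy matrix.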

    import random

    # Manhattan (L1) distance between two vectors of equal length
    def L1(v1, v2):
        if len(v1) != len(v2):
            raise ValueError("vectors must have the same length")
        return sum(abs(a - b) for a, b in zip(v1, v2))

    # kmeans with L1 distance.
    # rows refers to the NxM feature vectors
    def kcluster(rows, distance=L1, k=4):  # Cited from Programming Collective Intelligence
        ncols = len(rows[0])

        # Determine the minimum and maximum values for each column
        ranges = [(min(row[i] for row in rows), max(row[i] for row in rows))
                  for i in range(ncols)]

        # Create k randomly placed centroids inside the bounding box of the data
        clusters = [[random.random() * (hi - lo) + lo for lo, hi in ranges]
                    for _ in range(k)]

        lastmatches = None
        for t in range(100):
            print('Iteration %d' % t)
            bestmatches = [[] for _ in range(k)]

            # Find which centroid is the closest for each row
            for j, row in enumerate(rows):
                bestmatch = 0
                for i in range(k):
                    if distance(clusters[i], row) < distance(clusters[bestmatch], row):
                        bestmatch = i
                bestmatches[bestmatch].append(j)

            # If the results are the same as last time, this is complete
            if bestmatches == lastmatches:
                break
            lastmatches = bestmatches

            # Move the centroids to the average of their members
            # (a centroid with no members keeps its previous position)
            for i in range(k):
                if len(bestmatches[i]) > 0:
                    avgs = [0.0] * ncols
                    for rowid in bestmatches[i]:
                        for m in range(ncols):
                            avgs[m] += rows[rowid][m]
                    clusters[i] = [a / len(bestmatches[i]) for a in avgs]

        # bestmatches[i] holds the row indices assigned to cluster i
        return bestmatches
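One thing worth noting about the update step: the mean is the point that minimizes the sum of squared Euclidean distances, whereas the coordinate-wise median minimizes the sum of L1 distances. A variant usually called k-medians therefore replaces the averaging step with a per-column median. The following is only a sketch of that idea, not the code from the answer above; the name `kmedians`, the `seed` parameter, and the choice to initialize from random data points are assumptions made for this example.

```python
import random
from statistics import median

def l1(v1, v2):
    """Manhattan distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(v1, v2))

def kmedians(rows, k=2, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Assumption: start from k distinct data points rather than random
    # positions inside the bounding box
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under L1 distance
        new_labels = [min(range(k), key=lambda i: l1(centroids[i], row))
                      for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: coordinate-wise median of each cluster's members
        for i in range(k):
            members = [rows[j] for j, lab in enumerate(labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [median(col) for col in zip(*members)]
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each
rows = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
labels, centroids = kmedians(rows, k=2)
```

Unlike `kcluster`, this sketch returns one label per row together with the final centroids, which is a convenient shape if the result is going back into a numpy array.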


2012-04-14 15:17:35 junwangbuaa
