2014-02-25 63 views
0

Calculating the Euclidean distance between two numpy arrays

I have two numpy arrays, as shown below.

X = np.array([-0.34095692,-0.34044722,-0.27155318,-0.21320583,-0.44657865,-0.19587836, -0.29414279, -0.3948753 ,-0.21655774 , -0.34857087]) 
Y = np.array([0.16305762,0.38554548, 0.10412536, -0.57981103, 0.17927523, -0.22612216, -0.34569697, 0.30463137,0.01301744,-0.42661108]) 

These are the x and y coordinates of 10 users. I need to find the similarity between every pair of users. For example:

x1 = -0.34095692 
y1 = 0.16305762 
x2 = -0.34044722 
y2 = 0.38554548 

Euclidean distance = (|x1-y1|^2 + |x2-y2|^2)^1/2 

So, in the end, I would like to get a matrix of these distances for all users. Please help me achieve this.

[Image not available: the desired matrix]

+1

Sounds good. What is the problem? –

+0

@Jonathon Reinhart: I don't know where to start. Any help? –

+1

Sigh, have you considered asking [Google](http://www.google.com/search?q=numpy+euclidean+distance)? It leads you straight to [this successfully answered question](http://stackoverflow.com/questions/1401712/calculate-euclidean-distance-with-numpy). –

Answers

2

Use zip(X, Y) to get the coordinate pairs. If you want the Euclidean distance between points, it should be (|x1-x2|^2+|y1-y2|^2)^0.5, not (|x1-y1|^2 - |x2-y2|^2)^1/2.

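In [125]: coords = list(zip(X, Y))  # list() is needed on Python 3, where zip returns an iterator

In [126]: from scipy import spatial
    ...: dists = spatial.distance.cdist(coords, coords)  # 10x10 matrix of pairwise distances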

In [127]: dists 
Out[127]: 
array([[ 0.  , 0.22248844, 0.09104884, 0.75377329, 0.10685954, 
     0.41534165, 0.5109039 , 0.15149362, 0.19490308, 0.58971785], 
     [ 0.22248844, 0.  , 0.28973034, 0.9737061 , 0.23197262, 
     0.62852005, 0.73270705, 0.09751671, 0.39258852, 0.81219719], 
     [ 0.09104884, 0.28973034, 0.  , 0.68642072, 0.19047682, 
     0.33880688, 0.45038919, 0.23539542, 0.1064197 , 0.53629553], 
     [ 0.75377329, 0.9737061 , 0.68642072, 0.  , 0.79415038, 
     0.35411306, 0.24770988, 0.90290761, 0.59283795, 0.20443561], 
     [ 0.10685954, 0.23197262, 0.19047682, 0.79415038, 0.  , 
     0.47665258, 0.54665574, 0.13560014, 0.28381556, 0.61376196], 
     [ 0.41534165, 0.62852005, 0.33880688, 0.35411306, 0.47665258, 
     0.  , 0.15477091, 0.56683251, 0.24003205, 0.25201351], 
     [ 0.5109039 , 0.73270705, 0.45038919, 0.24770988, 0.54665574, 
     0.15477091, 0.  , 0.65808357, 0.36700881, 0.09751671], 
     [ 0.15149362, 0.09751671, 0.23539542, 0.90290761, 0.13560014, 
     0.56683251, 0.65808357, 0.  , 0.34181257, 0.73270705], 
     [ 0.19490308, 0.39258852, 0.1064197 , 0.59283795, 0.28381556, 
     0.24003205, 0.36700881, 0.34181257, 0.  , 0.45902146], 
     [ 0.58971785, 0.81219719, 0.53629553, 0.20443561, 0.61376196, 
     0.25201351, 0.09751671, 0.73270705, 0.45902146, 0.  ]]) 

To get the upper triangle of this array, use numpy.triu:

In [128]: np.triu(dists) 
Out[128]: 
array([[ 0.  , 0.22248844, 0.09104884, 0.75377329, 0.10685954, 
     0.41534165, 0.5109039 , 0.15149362, 0.19490308, 0.58971785], 
     [ 0.  , 0.  , 0.28973034, 0.9737061 , 0.23197262, 
     0.62852005, 0.73270705, 0.09751671, 0.39258852, 0.81219719], 
     [ 0.  , 0.  , 0.  , 0.68642072, 0.19047682, 
     0.33880688, 0.45038919, 0.23539542, 0.1064197 , 0.53629553], 
     [ 0.  , 0.  , 0.  , 0.  , 0.79415038, 
     0.35411306, 0.24770988, 0.90290761, 0.59283795, 0.20443561], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  , 
     0.47665258, 0.54665574, 0.13560014, 0.28381556, 0.61376196], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  , 
     0.  , 0.15477091, 0.56683251, 0.24003205, 0.25201351], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  , 
     0.  , 0.  , 0.65808357, 0.36700881, 0.09751671], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  , 
     0.  , 0.  , 0.  , 0.34181257, 0.73270705], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  , 
     0.  , 0.  , 0.  , 0.  , 0.45902146], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  , 
     0.  , 0.  , 0.  , 0.  , 0.  ]]) 
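For reference, the same matrix can probably also be built with NumPy alone. The sketch below is not part of the answer above: it assumes Python 3 and stacks the coordinates with np.column_stack, then relies on broadcasting. The names points, diff and upper are illustrative only. The result should agree with the cdist output up to floating-point rounding.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

# Stack the coordinates into a (10, 2) array: one row per user.
points = np.column_stack((X, Y))

# Broadcasting: (10, 1, 2) - (1, 10, 2) gives (10, 10, 2) pairwise differences.
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]

# Euclidean distance: sqrt((x_i - x_j)^2 + (y_i - y_j)^2).
dists = np.sqrt((diff ** 2).sum(axis=-1))

# k=1 also zeroes the diagonal, so each pair of users appears exactly once.
upper = np.triu(dists, k=1)
```

The matrix is symmetric with zeros on the diagonal, which is why the upper triangle already holds every distinct pair.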
+0

Thank you very much! Finally found it. Thanks again. :) –

+0

@NilaniAlgiriyage Glad to help, np ;) – zhangxaochen

2

A short snippet; it does not work as intended:

A = (X-Y)**2 
p, q = np.meshgrid(np.arange(10), np.arange(10)) 
np.sqrt(A[p]-A[q]) 

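Edit: explanation

  1. A is just a precomputed vector holding all the squared differences.
  2. The magic is np.meshgrid: this function generates every pair of values from two different arrays. It is not the best solution, because you get the whole matrix, but that is no big deal for the number of samples you have. The generated values correspond to indices into A.
  3. The indexing part, A[p], is also a bit of magic. Try to work out how it behaves yourself.
  4. The matrix here is full of nan, but that is what you asked for. The true Euclidean distance uses +, not -.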

p & q:

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]) 

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 
    [2, 2, 2, 2, 2, 2, 2, 2, 2, 2], 
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3], 
    [4, 4, 4, 4, 4, 4, 4, 4, 4, 4], 
    [5, 5, 5, 5, 5, 5, 5, 5, 5, 5], 
    [6, 6, 6, 6, 6, 6, 6, 6, 6, 6], 
    [7, 7, 7, 7, 7, 7, 7, 7, 7, 7], 
    [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], 
    [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]]) 
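As a sketch of how the same meshgrid indexing could be combined with the corrected formula (an assumption about the intent, not code from the answer above), X and Y can be indexed with p and q separately and the squared differences added rather than subtracted:

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

n = len(X)

# With the default indexing, p[i, j] == j and q[i, j] == i,
# so (p, q) enumerates every pair of user indices.
p, q = np.meshgrid(np.arange(n), np.arange(n))

# Indexing a 1-D array with a 2-D integer array returns a 2-D array of the
# same shape: X[p][i, j] == X[j] and X[q][i, j] == X[i].
dists = np.sqrt((X[p] - X[q]) ** 2 + (Y[p] - Y[q]) ** 2)
```

Because a sum of squares is never negative, this version should not produce any nan entries.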
+0

This is great! I have not checked its accuracy. Can you explain it? Anyway, there are a lot of nans, right? –

+0

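Thank you very much for the detailed answer. Yes, it should be +; I have now updated the question. One last question that I don't get: what do these 'nans' mean? (Are the users closer, or further apart, or what?) –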

+0

The differences can be negative, and 'sqrt' turns negative numbers into 'nan'. With the correct formula you won't get these 'nan's – Kiwi
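A minimal illustration of that last point, using made-up values rather than the data from the question: nan stands for "not a number", so such an entry says nothing about two users being closer or further apart.

```python
import numpy as np

# Hypothetical values: one positive and one negative "difference of squares".
values = np.array([0.04, -0.04])

# errstate only silences the "invalid value" RuntimeWarning;
# the negative entry still becomes nan.
with np.errstate(invalid="ignore"):
    roots = np.sqrt(values)

print(roots)
print(np.isnan(roots))
```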