2014-11-09 23 views
3

我试图在彼此的最大距离内找到(x,y)点对。我认为最简单的做法是生成一个DataFrame并逐个遍历每个点,计算在给定点(x_0,y_0)的距离r内是否有坐标为(x,y)的点。然后,所有的2熊猫:找到最大距离内的点

%pylab inline 
import pandas as pd 

def find_nbrs(low, high, num, max_d): 
    x = random.uniform(low, high, num) 
    y = random.uniform(low, high, num) 
    points = pd.DataFrame({'x':x, 'y':y}) 

    tot_nbrs = 0 

    for i in arange(len(points)): 
     x_0 = points.x[i] 
     y_0 = points.y[i] 

     pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2] 
     tot_nbrs += len(pt_nbrz) 
     plot (pt_nbrz.x, pt_nbrz.y, 'r-') 

    plot (points.x, points.y, 'b.') 
    return tot_nbrs 

print find_nbrs(0, 1, 50, 0.1) 
  1. 先分割发现对的总数,它并不总是找到合适的对(我看是没有标签的规定距离内的点)。

  2. 如果我写plot(..., 'or'),它会突出显示所有要点。这意味着pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2]至少返回一个(x,y)。为什么?如果比较结果为False,它不应该返回一个空数组吗?

  3. 如何在熊猫中更优雅地完成上述所有操作?例如,不必遍历每个元素。

+0

纠正我,如果我错了,但你正在做一个O(n)的搜索时我想你想要的是一个O(n^2)搜索。你基本上检查x0:y0,x1:y1,x2:y2之间的距离......当我想你想要做的是检查x0:y0,x0:y1,... x1:y0,x1:y1, x1:y2 .... – Greg 2014-11-09 07:21:30

+0

但是,如果我错了你想要什么,那么这将很适合你http://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated -with-numpy – Greg 2014-11-09 07:25:55

+0

感谢您的链接。尽管有答案,但我在计算如何使用numpy.linalg.norm计算距离时遇到了一些麻烦。在这个例子中,a和b应该是什么格式? 回复:O(n^2),我认为这就是我正在做的事情:即遍历每个数据框元素,并找到满足比较的所有其他元素。这应该确定所有的双胞胎,两次,所以要得到的数字,我只是将最后的分数除以2. – 2014-11-09 18:25:45

回答

7

您正在寻找的功能包含在scipy's spatial distance module中。

下面是一个如何使用它的例子。真正的魔法在squareform(pdist(points))

from scipy.spatial.distance import pdist, squareform 
import numpy as np 
import matplotlib.pyplot as plt 

points = np.random.uniform(-.5, .5, (1000,2)) 

# Compute the distance between each different pair of points in X with pdist. 
# Then, just for ease of working, convert to a typical symmetric distance matrix 
# with squareform. 
dists = squareform(pdist(points)) 

poi = points[4] # point of interest 
dist_min = .1 
close_points = dists[4] < dist_min 

print("There are {} other points within a distance of {} from the point " 
    "({:.3f}, {:.3f})".format(close_points.sum() - 1, dist_min, *poi)) 

There are 27 other points within a distance of 0.1 from the point (0.194, 0.160)

对于可视化的目的:

f,ax = plt.subplots(subplot_kw= 
    dict(aspect='equal', xlim=(-.5, .5), ylim=(-.5, .5))) 
ax.plot(points[:,0], points[:,1], 'b+ ') 
ax.plot(poi[0], poi[1], ms=15, marker='s', mfc='none', mec='g') 
ax.plot(points[close_points,0], points[close_points,1], 
    marker='o', mfc='none', mec='r', ls='') # draw all points within distance 

t = np.linspace(0, 2*np.pi, 512) 
circle = dist_min*np.vstack([np.cos(t), np.sin(t)]).T 
ax.plot((circle+poi)[:,0], (circle+poi)[:,1], 'k:') # Add a visual check for that distance 
plt.show() 

enter image description here