2016-03-07 135 views
1

Compare two NumPy arrays of different sizes

I have two NumPy arrays of points, given as XY coordinates:

basic_pts = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [0, 2]]) 
new_pts = np.array([[2, 2], [2, 1], [0.5, 0.5], [1.5, 0.5]]) 

As a result, I want to keep only those points from new_pts for which no point in basic_pts has both a larger x and a larger y value. So the result would be

res_pts = np.array([[2, 2], [2, 1], [1.5, 0.5]]) 

I have a working solution, but because it relies on list comprehensions it does not scale to larger amounts of data.

x_cond = ([basic_pts[:, 0] > x for x in new_pts[:, 0]]) 
y_cond = ([basic_pts[:, 1] > y for y in new_pts[:, 1]]) 
xy_cond_ = np.logical_and(x_cond, y_cond) 
xy_cond = np.swapaxes(xy_cond_, 0, 1) 
mask = np.invert(np.logical_or.reduce(xy_cond)) 
res_pts = new_pts[mask] 

Is there a better way to solve this using only NumPy and no list comprehensions?

Answers

1

You can use NumPy broadcasting:

# Build the equivalent of xy_cond by extending col-0 and col-1 of basic_pts 
# to 2D: each column stays along axis=0 and gets a singleton dimension at 
# axis=1. Compare these extended versions with col-0 and col-1 of new_pts, 
# respectively. 
xyc = (basic_pts[:,0,None] > new_pts[:,0]) & (basic_pts[:,1,None] > new_pts[:,1]) 

# Create mask equivalent and index into new_pts to get selective rows from it 
mask = ~(xyc).any(0) 
res_pts_out = new_pts[mask] 
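
A minimal sketch of the same broadcasting idea wrapped in a function, with the intermediate shapes spelled out. The helper name `keep_undominated` is hypothetical and not part of NumPy; x is compared against column 0 and y against column 1.

```python
import numpy as np

def keep_undominated(basic_pts, new_pts):
    """Return the rows of new_pts for which no row of basic_pts has
    both a larger x and a larger y (hypothetical helper name)."""
    # (n_basic, 1) compared with (n_new,) broadcasts to (n_basic, n_new)
    x_larger = basic_pts[:, 0, None] > new_pts[:, 0]
    y_larger = basic_pts[:, 1, None] > new_pts[:, 1]
    # A new point is "dominated" if any basic point beats it on both axes
    dominated = (x_larger & y_larger).any(axis=0)  # shape (n_new,)
    return new_pts[~dominated]

basic_pts = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [0, 2]])
new_pts = np.array([[2, 2], [2, 1], [0.5, 0.5], [1.5, 0.5]])
res_pts = keep_undominated(basic_pts, new_pts)
print(res_pts.tolist())  # [[2.0, 2.0], [2.0, 1.0], [1.5, 0.5]]
```

The two boolean intermediates each hold n_basic * n_new elements, so memory grows with the product of the two array lengths.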
+0

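This was my idea as well; note, however, that it ends up creating an intermediate array of shape `(len(basic_pts), len(new_pts))`, which may take up a lot of memory (the OP mentioned "large amounts of data") – val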

+0

@val Yes, that could be a problem with very large amounts of data. Thanks for pointing that out! – Divakar

+1

Amazing, thank you. There is so much to learn... – Geri

0

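As val pointed out, a solution that creates an intermediate len(basic_pts) × len(new_pts) array can be too memory-intensive. On the other hand, a solution that tests each point of new_pts in a loop can be too slow. We can bridge the gap by choosing a batch size k and testing new_pts in batches of size k using Divakar's solution: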

import numpy as np 

basic_pts = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [0, 2]]) 
new_pts = np.array([[2, 2], [2, 1], [0.5, 0.5], [1.5, 0.5]]) 
k = 2 
subresults = [] 
for i in range(0, len(new_pts), k): 
    j = min(i + k, len(new_pts)) 
    # Process new_pts[i:j] using Divakar's solution 
    xyc = np.logical_and(
     basic_pts[:, np.newaxis, 0] > new_pts[np.newaxis, i:j, 0], 
     basic_pts[:, np.newaxis, 1] > new_pts[np.newaxis, i:j, 1]) 
    mask = ~(xyc).any(axis=0) 
    # mask indicates which points among new_pts[i:j] to use 
    subresults.append(new_pts[i:j][mask]) 
# Concatenate the per-batch result arrays 
res = np.concatenate(subresults) 
print(res) 
# Prints (exact spacing depends on the NumPy version): 
# [[2.  2. ] 
#  [2.  1. ] 
#  [1.5 0.5]]
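
If even the batched intermediate array is a concern, a different technique may help. The following is a sketch of a sort-and-search approach, which is not from the answers above: sort basic_pts by x, precompute the largest y among all points at or after each sorted position, then use `np.searchsorted` to find, for each new point, the basic points with a strictly larger x. The function name `filter_by_sorting` is hypothetical.

```python
import numpy as np

def filter_by_sorting(basic_pts, new_pts):
    """Keep rows of new_pts that no row of basic_pts exceeds in both x and y.
    Hypothetical helper; avoids the (n_basic, n_new) intermediate array."""
    # Sort basic points by x (ascending)
    order = np.argsort(basic_pts[:, 0])
    xs = basic_pts[order, 0]
    ys = basic_pts[order, 1]
    # suffix_max[i] = max(ys[i:]), i.e. the largest y among points from
    # sorted position i onwards
    suffix_max = np.maximum.accumulate(ys[::-1])[::-1]
    # Sentinel for "no basic point has a larger x"
    suffix_max = np.append(suffix_max, -np.inf)
    # side='right' gives the first position whose x is strictly larger
    idx = np.searchsorted(xs, new_pts[:, 0], side='right')
    # Dominated if some basic point with larger x also has a larger y
    dominated = suffix_max[idx] > new_pts[:, 1]
    return new_pts[~dominated]

basic_pts = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [0, 2]])
new_pts = np.array([[2, 2], [2, 1], [0.5, 0.5], [1.5, 0.5]])
res_sorted = filter_by_sorting(basic_pts, new_pts)
print(res_sorted.tolist())  # [[2.0, 2.0], [2.0, 1.0], [1.5, 0.5]]
```

Under these assumptions the extra memory is proportional to len(basic_pts) + len(new_pts) rather than their product, at the cost of one sort of basic_pts.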