2016-03-12 71 views
-1

Python - How to replace None values in a numpy array with the median of adjacent elements

I generate an array with missing data:

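import numpy as np

A = np.zeros((80, 80))
for i in range(80):
    if i % 2 == 0:
        for j in range(80):
            A[i, j] = None  # stored as NaN, because A is a float array
            if j % 2 == 0:
                A[i, j] = 50 + i + j
    else:
        for j in range(80):
            A[i, j] = None  # stored as NaN, because A is a float array
            if j % 2 != 0:
                A[i, j] = 50 + i + j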

This gives me the array shown in the screenshot below.

[image: screenshot of the array]

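What I want to do is replace every None value with the median of the adjacent elements that are not None. Is there a simple way to do this without looping over every element?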

+0

Maybe you could use masked arrays? http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html – maazza

Answers

0

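For simplicity, this shows how to fill in a missing value with the median of the elements directly above and directly below it. The remaining 2 cases (or 6, depending on how you define "adjacent") are similar.

Starting with the matrix A, the elements shifted "up" along the vertical axis can be found with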

>>> up = np.roll(A, 1, axis=0)
>>> up[0, :] = np.nan
>>> up
array([[  nan,   nan,   nan, ...,   nan,   nan,   nan],
       [  50.,   nan,   52., ...,   nan,  128.,   nan],
       [  nan,   52.,   nan, ...,  128.,   nan,  130.],
       ...,
       [ 126.,   nan,  128., ...,   nan,  204.,   nan],
       [  nan,  128.,   nan, ...,  204.,   nan,  206.],
       [ 128.,   nan,  130., ...,   nan,  206.,   nan]])

Similarly, the elements shifted "down" can be found with

>>> down = np.roll(A, -1, axis=0)
>>> down[-1, :] = np.nan
>>> down
array([[  nan,   52.,   nan, ...,  128.,   nan,  130.],
       [  52.,   nan,   54., ...,   nan,  130.,   nan],
       [  nan,   54.,   nan, ...,  130.,   nan,  132.],
       ...,
       [ 128.,   nan,  130., ...,   nan,  206.,   nan],
       [  nan,  130.,   nan, ...,  206.,   nan,  208.],
       [  nan,   nan,   nan, ...,   nan,   nan,   nan]])

Since this is numpy, you can easily create a 3D array from these 2 (or more) arrays:

np.array([up, down]) 

With this array, we can take np.nanmedian along axis 0 (the axis that indexes up and down), which ignores the NaN entries:

np.nanmedian(np.array([up, down]), axis=0) 

To fill in the missing values in A, use

A[np.isnan(A)] = np.nanmedian(np.array([up, down]), axis=0)[np.isnan(A)] 

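P.S. All the neighbours can be found by combining shifts of -1, 0, 1 along axis=0 with shifts of -1, 0, 1 along axis=1 (the 0, 0 shift is the element itself, which is NaN wherever a value gets filled in and is therefore ignored anyway), so you can create all of these 2D arrays automatically with a double loop.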
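The double loop described in the P.S. could look roughly like the sketch below. The helper names `shift_with_nan` and `fill_nan_with_neighbour_median` are made up for illustration and are not part of the answer above; the sketch assumes a 2D float array and the 8-neighbour definition of "adjacent".

```python
import numpy as np

def shift_with_nan(a, di, dj):
    """Hypothetical helper: out[i, j] == a[i - di, j - dj], NaN where that falls outside `a`."""
    out = np.roll(a, (di, dj), axis=(0, 1))  # np.roll returns a new array
    # Blank the rows/columns that np.roll wrapped around from the other side.
    if di > 0:
        out[:di, :] = np.nan
    elif di < 0:
        out[di:, :] = np.nan
    if dj > 0:
        out[:, :dj] = np.nan
    elif dj < 0:
        out[:, dj:] = np.nan
    return out

def fill_nan_with_neighbour_median(a):
    """Hypothetical helper: replace each NaN with the nanmedian of its 8 neighbours."""
    a = np.asarray(a, dtype=float)
    # The double loop over shifts; (0, 0) is the element itself and is skipped.
    neighbours = [shift_with_nan(a, di, dj)
                  for di in (-1, 0, 1)
                  for dj in (-1, 0, 1)
                  if (di, dj) != (0, 0)]
    missing = np.isnan(a)
    stack = np.array(neighbours)[:, missing]  # shape (8, number_of_missing_values)
    filled = a.copy()
    filled[missing] = np.nanmedian(stack, axis=0)
    return filled

# Small version of the checkerboard array from the question.
i, j = np.indices((6, 6))
A = np.where((i + j) % 2 == 0, 50.0 + i + j, np.nan)
filled = fill_nan_with_neighbour_median(A)
```

A missing value whose neighbours are all NaN stays NaN (np.nanmedian emits a RuntimeWarning in that case), so a second pass or a larger neighbourhood would be needed for bigger gaps.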

0

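I once wrote something like this based on numba (it was for masked arrays, but I quickly adapted it to NaN).

As Ami Tavory said, you can also do this with some numpy tricks, but I find it clearer to write the loops yourself and then optimize them (when there is no built-in). I chose numba because, even though it is limited in some respects, it is very well suited to speeding up this kind of for loop.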

import numba as nb
import numpy as np

@nb.njit
def median_filter_2d(array, filtersize):
    x = array.shape[0]
    y = array.shape[1]
    filter_half = filtersize // 2
    # Create an empty result
    res = np.zeros_like(array)
    # Loop through each pixel
    for i in range(x):
        for j in range(y):
            # If it's not NaN just let it stay
            if not np.isnan(array[i, j]):
                res[i, j] = array[i, j]
            else:
                # We don't want to go outside the image:
                start_x = max(0, i - filter_half)
                end_x = min(x, i + filter_half + 1)
                start_y = max(0, j - filter_half)
                end_y = min(y, j + filter_half + 1)  # bounded by y, not x

                # If you want to use nanmedian uncomment this line and comment everything else following
                #res[i, j] = np.nanmedian(array[start_x:end_x, start_y:end_y])

                # Create a temporary array.
                tmp = np.zeros(filtersize * filtersize)
                counter = 0  # Counter because we want to know how many not-NaNs are present.

                # Get all adjacent pixels that are not NaN and insert them
                for ii in range(start_x, end_x):
                    for jj in range(start_y, end_y):
                        if not np.isnan(array[ii, jj]):
                            tmp[counter] = array[ii, jj]
                            counter += 1

                # Either do it with np.median but it will be slower
                #res[i, j] = np.median(tmp[0:counter])
                # or use some custom median-function
                res[i, j] = numba_median_insertionsortbased(tmp[0:counter])
    return res

The helper median function is just an insertion sort that then returns the middle element or the mean of the two middle elements.

@nb.njit
def numba_median_insertionsortbased(items):
    # Insertion sort (sorts items in place)
    for i in range(1, len(items)):
        j = i
        while j > 0 and items[j] < items[j-1]:
            items[j], items[j-1] = items[j-1], items[j]
            j -= 1
    # Median is the middle element (odd length) or the mean of the two middle elements (even length)
    if items.size % 2 == 0:
        return 0.5 * (items[(items.size // 2)-1] + items[(items.size // 2)])
    else:
        return items[(items.size // 2)]

If you don't want to use numba, or can't, you can remove the @nb.njit lines and use it in pure Python. It will be a lot slower, but it will still work.
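As a quick sanity check of the pure-Python route, the sketch below copies the helper without the decorator and compares it against np.median on a few small random inputs. The name `median_insertion_sort` and the test data are illustrative assumptions, not part of the answer.

```python
import numpy as np

def median_insertion_sort(items):
    """Plain-Python copy of the helper above (no @nb.njit); sorts `items` in place."""
    for i in range(1, len(items)):
        j = i
        while j > 0 and items[j] < items[j - 1]:
            items[j], items[j - 1] = items[j - 1], items[j]
            j -= 1
    mid = items.size // 2
    if items.size % 2 == 0:
        return 0.5 * (items[mid - 1] + items[mid])
    return items[mid]

rng = np.random.default_rng(0)
mismatches = 0
for size in range(1, 10):
    values = rng.random(size)
    expected = np.median(values)
    # Pass a copy because the helper reorders its input.
    got = median_insertion_sort(values.copy())
    if not np.isclose(got, expected):
        mismatches += 1

print("mismatches:", mismatches)
```

Note that the helper assumes at least one value: an empty input (a window that contains only NaN) is not handled and would need its own check.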

For an 80x80 A I get the following timings with numba:

1000 loops, best of 3: 1.63 ms per loop (custom median)

100 loops, best of 3: 4.88 ms per loop (numpy median)

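For large filters, however, the numpy median will be somewhat faster, because it (hopefully) uses a more advanced approach than insertion sort. For your array, though, the elements end up in the temporary array already sorted, so the insertion-sort-based median is clearly the best fit. For real images it will probably be slower than the numpy median.

And in pure Python: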

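1 loop, best of 3: 406 ms per loop (custom median)

1 loop, best of 3: 707 ms per loop (numpy nanmedian without the inner loop and temporary array)

1 loop, best of 3: 832 ms per loop (numpy median with the temporary array)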

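As an aside: I always find it surprising that something like a custom median, even without numba, is faster than the numpy median for small inputs (okay, these are already sorted, so it is the best case for insertion sort :-) ):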

%timeit numba_median_insertionsortbased(np.arange(9)) # without the @nb.njit 
10000 loops, best of 3: 21.7 µs per loop 
%timeit np.median(np.arange(9)) 
10000 loops, best of 3: 123 µs per loop 

And even these faster solutions can be sped up further with numba:

%timeit numba_median_insertionsortbased(np.arange(9)) # WITH the @nb.njit 
100000 loops, best of 3: 8.93 µs per loop