2011-11-11 63 views
17

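I am trying to reimplement an IDL function in Python, namely resizing with averaging (rebinning) a 2D NumPy array:

http://star.pst.qub.ac.uk/idl/REBIN.html

It downsizes a 2D array by an integer factor by averaging.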

For example:

>>> a=np.arange(24).reshape((4,6)) 
>>> a 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

I would like to resize it to (2, 3) by taking the mean of the relevant samples. The expected output would be:

>>> b = rebin(a, (2, 3)) 
>>> b 
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])

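That is, b[0,0] = np.mean(a[:2,:2]), b[0,1] = np.mean(a[:2,2:4]), and so on.

I believe I should reshape to a 4-dimensional array and then take the mean over the correct slices, but I could not figure out the algorithm. Do you have any hints?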
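To pin down the expected result, a slow reference version can be written with explicit loops. This is only a sketch: the name rebin_reference is hypothetical (it is not part of NumPy or IDL), and it assumes each target dimension divides the corresponding source dimension.

```python
import numpy as np

def rebin_reference(a, shape):
    """Slow reference: average each (fr x fc) block with explicit loops.

    Assumes shape[0] divides a.shape[0] and shape[1] divides a.shape[1].
    """
    fr = a.shape[0] // shape[0]   # rows per block
    fc = a.shape[1] // shape[1]   # columns per block
    out = np.empty(shape, dtype=float)
    for i in range(shape[0]):
        for j in range(shape[1]):
            out[i, j] = a[i * fr:(i + 1) * fr, j * fc:(j + 1) * fc].mean()
    return out

a = np.arange(24).reshape((4, 6))
b = rebin_reference(a, (2, 3))
```

Any faster, vectorized version should reproduce the values of b computed here.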

+1

I have just found out that this is a duplicate of http://stackoverflow.com/questions/4624112/grouping-2d-numpy-array-in-average, but I could not find it earlier using the Stack Overflow search function. –

Answers

29

Here is an example based on the answer you've linked (for clarity):

>>> import numpy as np 
>>> a = np.arange(24).reshape((4,6)) 
>>> a 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> a.reshape((2,a.shape[0]//2,3,-1)).mean(axis=3).mean(1) 
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])

As a function:

def rebin(a, shape): 
    sh = shape[0],a.shape[0]//shape[0],shape[1],a.shape[1]//shape[1] 
    return a.reshape(sh).mean(-1).mean(1) 
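
To see why the reshape works, it may help to look at the intermediate 4-D array. In the sketch below (the names a4, block_00 and block_01 are only illustrative), axes 1 and 3 index positions inside a block, so fixing axes 0 and 2 selects one block of the original array. It assumes a NumPy version whose mean accepts a tuple of axes.

```python
import numpy as np

a = np.arange(24).reshape((4, 6))

# Split rows into (2 blocks, 2 rows each) and columns into (3 blocks, 2 columns each).
a4 = a.reshape((2, 2, 3, 2))

# Fixing the block indices (axes 0 and 2) recovers one 2x2 block of `a`.
block_00 = a4[0, :, 0, :]   # same values as a[:2, :2]
block_01 = a4[0, :, 1, :]   # same values as a[:2, 2:4]

# Averaging over the two within-block axes in a single call gives the binned array.
b = a4.mean(axis=(1, 3))
```

Averaging over both within-block axes at once is equivalent to the chained .mean(-1).mean(1) here because every block has the same number of elements.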
+0

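Thanks, I've created a gist on GitHub with an implementation of this function, in case somebody else needs it: https://gist.github.com/1348792. I also suggested on 'numpy-discussion' adding it to 'numpy', but the answer was negative. –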

+0

Did they give a reason for the negative answer? –

+0

I think [this](http://mail.scipy.org/pipermail/numpy-discussion/2011-November/059208.html) is the discussion. It doesn't seem negative, more a lack of time or not enough interest. – Evert

7

J.F. Sebastian has a great answer for 2D binning. Here is a version of his "rebin" function that works for N dimensions:

import numpy as np

def bin_ndarray(ndarray, new_shape, operation='sum'):
    """
    Bins an ndarray in all axes based on the target shape, by summing or
    averaging.

    Number of output dimensions must match number of input dimensions and
    new axes must divide old ones.

    Example
    -------
    >>> m = np.arange(0,100,1).reshape((10,10))
    >>> n = bin_ndarray(m, new_shape=(5,5), operation='sum')
    >>> print(n)

    [[ 22  30  38  46  54]
     [102 110 118 126 134]
     [182 190 198 206 214]
     [262 270 278 286 294]
     [342 350 358 366 374]]

    """
    operation = operation.lower()
    if operation not in ['sum', 'mean']:
        raise ValueError("Operation not supported.")
    if ndarray.ndim != len(new_shape):
        raise ValueError("Shape mismatch: {} -> {}".format(ndarray.shape,
                                                           new_shape))
    # Pair each new dimension d with its block size c//d
    compression_pairs = [(d, c//d) for d, c in zip(new_shape,
                                                   ndarray.shape)]
    # Interleave them: (d0, c0//d0, d1, c1//d1, ...)
    flattened = [l for p in compression_pairs for l in p]
    ndarray = ndarray.reshape(flattened)
    # Reduce the block-size axes one at a time, starting from the last
    for i in range(len(new_shape)):
        op = getattr(ndarray, operation)
        ndarray = op(-1*(i+1))
    return ndarray
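
The same idea can be written as a single reduction over all block axes. The following is a sketch rather than a replacement for the function above: the name bin_ndarray_mean is hypothetical, it only averages, it adds an explicit divisibility check, and it assumes a NumPy version whose mean accepts a tuple of axes.

```python
import numpy as np

def bin_ndarray_mean(arr, new_shape):
    """Hypothetical variant: block-average an N-d array in a single mean call."""
    if arr.ndim != len(new_shape):
        raise ValueError("Shape mismatch: {} -> {}".format(arr.shape, new_shape))
    if any(c % d for d, c in zip(new_shape, arr.shape)):
        raise ValueError("Each new dimension must divide the old one.")
    # Interleave (new size, block size) per axis: (d0, c0//d0, d1, c1//d1, ...)
    split = [n for d, c in zip(new_shape, arr.shape) for n in (d, c // d)]
    # The block-size axes sit at the odd positions 1, 3, 5, ...
    block_axes = tuple(range(1, 2 * arr.ndim, 2))
    return arr.reshape(split).mean(axis=block_axes)

# Usage: average 2x2x2 blocks of a small 3-D array.
cube = np.arange(4 * 4 * 2).reshape((4, 4, 2))
small = bin_ndarray_mean(cube, (2, 2, 1))
```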
1

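I was trying to downscale a raster: take a raster of roughly 6000 by 2000 cells and turn it into an arbitrarily sized smaller raster whose values are properly averaged across the previous bin sizes. I found a solution using SciPy, but I couldn't get SciPy to install on the shared hosting service I was using, so I wrote this function instead. There is probably a better way to do this that doesn't involve looping through the rows and columns, but this does seem to work.

The nice part about this is that the old numbers of rows and columns don't have to be divisible by the new numbers of rows and columns.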

import numpy as np

def resize_array(a, new_rows, new_cols):
    '''
    This function takes a 2D numpy array a and produces a smaller array
    of size new_rows, new_cols. new_rows and new_cols must be less than
    or equal to the number of rows and columns in a.
    '''
    rows = len(a)
    cols = len(a[0])
    yscale = float(rows) / new_rows
    xscale = float(cols) / new_cols

    # first average across the cols to shorten rows
    new_a = np.zeros((rows, new_cols))
    for j in range(new_cols):
        # get the indices of the original array we are going to average across
        the_x_range = (j*xscale, (j+1)*xscale)
        firstx = int(the_x_range[0])
        lastx = int(the_x_range[1])
        # figure out the portion of the first and last index that overlap
        # with the new index, and thus the portion of those cells that
        # we need to include in our average
        x0_scale = 1 - (the_x_range[0] - int(the_x_range[0]))
        xEnd_scale = (the_x_range[1] - int(the_x_range[1]))
        # scale_line is a 1d array that corresponds to the portion of each old
        # index in the_x_range that should be included in the new average
        scale_line = np.ones((lastx - firstx + 1))
        scale_line[0] = x0_scale
        scale_line[-1] = xEnd_scale
        # Make sure you don't screw up and include an index that is too large
        # for the array. This isn't great, as there could be some floating
        # point errors that mess up this comparison.
        if scale_line[-1] == 0:
            scale_line = scale_line[:-1]
            lastx = lastx - 1
        # Now it's linear algebra time. Take the dot product of a slice of
        # the original array and the scale_line
        new_a[:, j] = np.dot(a[:, firstx:lastx+1], scale_line) / scale_line.sum()

    # Then average across the rows to shorten the cols. Same method as above.
    # It is probably possible to simplify this code, as this is more or less
    # the same procedure as the block of code above, but transposed.
    # Here I'm reusing the variable a. Sorry if that's confusing.
    a = np.zeros((new_rows, new_cols))
    for i in range(new_rows):
        the_y_range = (i*yscale, (i+1)*yscale)
        firsty = int(the_y_range[0])
        lasty = int(the_y_range[1])
        y0_scale = 1 - (the_y_range[0] - int(the_y_range[0]))
        yEnd_scale = (the_y_range[1] - int(the_y_range[1]))
        scale_line = np.ones((lasty - firsty + 1))
        scale_line[0] = y0_scale
        scale_line[-1] = yEnd_scale
        if scale_line[-1] == 0:
            scale_line = scale_line[:-1]
            lasty = lasty - 1
        # Assign only row i (the original a[i:,] wrote to every row from i on)
        a[i, :] = np.dot(scale_line, new_a[firsty:lasty+1, :]) / scale_line.sum()

    return a
2

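Here is a way to do what you're asking using matrix multiplication; it doesn't require the new array dimensions to divide the old ones.

First we generate a row compressor matrix and a column compressor matrix (I'm sure there is a cleaner way of doing this, maybe even using NumPy operations alone):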

def get_row_compressor(old_dimension, new_dimension):
    dim_compressor = np.zeros((new_dimension, old_dimension))
    bin_size = float(old_dimension) / new_dimension
    next_bin_break = bin_size
    which_row = 0
    which_column = 0
    while which_row < dim_compressor.shape[0] and which_column < dim_compressor.shape[1]:
        if round(next_bin_break - which_column, 10) >= 1:
            dim_compressor[which_row, which_column] = 1
            which_column += 1
        elif next_bin_break == which_column:
            which_row += 1
            next_bin_break += bin_size
        else:
            partial_credit = next_bin_break - which_column
            dim_compressor[which_row, which_column] = partial_credit
            which_row += 1
            dim_compressor[which_row, which_column] = 1 - partial_credit
            which_column += 1
            next_bin_break += bin_size
    dim_compressor /= bin_size
    return dim_compressor


def get_column_compressor(old_dimension, new_dimension):
    return get_row_compressor(old_dimension, new_dimension).transpose()

...so, for instance, get_row_compressor(5, 3) gives you:

[[ 0.6  0.4  0.   0.   0. ]
 [ 0.   0.2  0.6  0.2  0. ]
 [ 0.   0.   0.   0.4  0.6]]

and get_column_compressor(3, 2) gives you:

[[ 0.66666667  0.        ]
 [ 0.33333333  0.33333333]
 [ 0.          0.66666667]]

Then simply premultiply by the row compressor and postmultiply by the column compressor to get the compressed matrix:

def compress_and_average(array, new_shape):
    # Note: new shape should be smaller in both dimensions than old shape
    # Plain ndarrays and the @ operator are used here instead of np.mat,
    # which newer NumPy releases no longer provide.
    return (get_row_compressor(array.shape[0], new_shape[0])
            @ np.asarray(array)
            @ get_column_compressor(array.shape[1], new_shape[1]))

Using this technique,

compress_and_average(np.array([[50, 7, 2, 0, 1],
                               [0, 0, 2, 8, 4],
                               [4, 1, 1, 0, 0]]), (2, 3))

yields:

[[ 21.86666667   2.66666667   2.26666667]
 [  1.86666667   1.46666667   1.86666667]]
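
The answer wonders whether the compressor matrix could be built with NumPy operations alone. One possible sketch follows, under the assumption that each weight is simply the length of the overlap between a new bin and an old cell, divided by the bin size. The names overlap_compressor and compress_with_overlap are hypothetical, and the code assumes Python 3 division.

```python
import numpy as np

def overlap_compressor(old_dimension, new_dimension):
    """Hypothetical vectorized builder for the row compressor matrix."""
    bin_size = old_dimension / new_dimension
    # Edges of the new bins and of the old cells, measured in old-cell units.
    bin_lo = np.arange(new_dimension)[:, None] * bin_size
    bin_hi = bin_lo + bin_size
    cell_lo = np.arange(old_dimension)[None, :]
    cell_hi = cell_lo + 1
    # Length of the overlap between each new bin and each old cell
    # (negative where they do not touch, hence the clip to zero).
    overlap = np.minimum(bin_hi, cell_hi) - np.maximum(bin_lo, cell_lo)
    return np.clip(overlap, 0, None) / bin_size

def compress_with_overlap(array, new_shape):
    rows = overlap_compressor(array.shape[0], new_shape[0])
    cols = overlap_compressor(array.shape[1], new_shape[1]).T
    return rows @ np.asarray(array) @ cols

result = compress_with_overlap(np.array([[50, 7, 2, 0, 1],
                                         [0, 0, 2, 8, 4],
                                         [4, 1, 1, 0, 0]]), (2, 3))
```

Broadcasting a column of bin edges against a row of cell edges produces the whole weight matrix at once, which avoids the explicit while loop and its exact floating-point comparison.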