2017-02-06 180 views

Indexing a numpy array using a numpy array of slices

(Edit: I wrote a solution based on hpaulj's answer; see the code at the bottom of the post.)

I wrote a function that subdivides an n-dimensional array into smaller arrays such that each sub-part has at most max_chunk_size elements in total.

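Since I need to subdivide many arrays of the same shape and then perform operations on the corresponding chunks, the function does not operate on the data itself. Instead it creates an array of "indexers", i.e. a set of (slice(x1, x2), slice(y1, y2), ...) objects (see the code below). With these indexers I can retrieve a subdivision by calling the_array[indexer[i]] (see the example below). The array of indexers also has the same number of dimensions as the input, and the divisions are aligned along the corresponding axes, i.e. the blocks the_array[indexer[i,j,k]] and the_array[indexer[i+1,j,k]] are adjacent along axis 0, and so on.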
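
To make the intended use concrete, here is a minimal sketch of applying an operation chunk by chunk to two arrays of the same shape. The indexers are typed out by hand purely for illustration (in the post they would come from the subdivision function), and the names a, b and out are hypothetical.

```python
import numpy as np

# Hand-written indexers for a (6, 15) array; assumed for illustration only.
indexers = [
    (slice(0, 3), slice(0, 8)),
    (slice(0, 3), slice(8, 15)),
    (slice(3, 6), slice(0, 8)),
    (slice(3, 6), slice(8, 15)),
]

a = np.arange(90).reshape(6, 15)
b = np.ones((6, 15), dtype=int)
out = np.empty_like(a)

# The same tuple of slices selects the corresponding block in every array.
for idx in indexers:
    out[idx] = a[idx] + b[idx]
```

Because each indexer is a plain tuple of slices, a[idx] is a view rather than a copy, so the loop never duplicates the full arrays.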

I expected that I could also concatenate these blocks by calling the_array[indexer[i:i+2,j,k]], and that the_array[indexer] would simply return the_array. However, such calls raise an error:

IndexError: arrays used as indices must be of integer (or boolean) type

Is there a simple way to work around this error?

Here is the code:

import itertools
from functools import reduce  # reduce is not a builtin in Python 3

import numpy as np

def subdivide(shape, max_chunk_size=500000):
    shape = np.array(shape).astype(float)
    total_size = shape.prod()

    # calculate maximum slice shape:
    slice_shape = np.floor(shape * min(max_chunk_size/total_size, 1.0)**(1./len(shape))).astype(int)

    # create a list of slices for each dimension
    # (tolist() keeps the slice bounds as plain Python ints):
    slices = [[slice(left, min(right, n))
               for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
              for n, step_size in zip(shape.astype(int).tolist(), slice_shape.tolist())]

    # 1-D object array holding one tuple of slices per chunk
    # (the builtin object replaces np.object, which newer NumPy no longer provides)
    result = np.empty(reduce(lambda a, b: a*len(b), slices, 1), dtype=object)
    for i, el in enumerate(itertools.product(*slices)):
        result[i] = el

    # arrange the chunks on a grid with one axis per data dimension
    return result.reshape(tuple(np.ceil(shape/slice_shape).astype(int)))
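
The chunk-shape formula can be checked by hand. The sketch below repeats only the arithmetic from the function above for a (6, 15) array with max_chunk_size=16; the variable names scale and grid_shape are introduced here for illustration.

```python
import numpy as np

shape = np.array((6, 15), dtype=float)
max_chunk_size = 16

# Scale every axis by the same factor so the chunk volume stays within the limit.
scale = min(max_chunk_size / shape.prod(), 1.0) ** (1.0 / len(shape))
slice_shape = np.floor(shape * scale).astype(int)
grid_shape = np.ceil(shape / slice_shape).astype(int)

print(slice_shape)  # [2 6]  -> 12 elements per full chunk
print(grid_shape)   # [3 3]  -> 9 chunks in total
```

One thing to watch: when max_chunk_size is very small relative to the shape, the floor can produce 0 for an axis (for example max_chunk_size=1 here), and range() then fails because its step is zero.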

Here is an example of its usage:

>>> ar = np.arange(90).reshape(6,15) 
>>> ar 
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], 
     [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29], 
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], 
     [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59], 
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74], 
     [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]]) 

>>> slices = subdivide(ar.shape, 16) 
>>> slices 
array([[(slice(0, 2, None), slice(0, 6, None)), 
     (slice(0, 2, None), slice(6, 12, None)), 
     (slice(0, 2, None), slice(12, 15, None))], 
     [(slice(2, 4, None), slice(0, 6, None)), 
     (slice(2, 4, None), slice(6, 12, None)), 
     (slice(2, 4, None), slice(12, 15, None))], 
     [(slice(4, 6, None), slice(0, 6, None)), 
     (slice(4, 6, None), slice(6, 12, None)), 
     (slice(4, 6, None), slice(12, 15, None))]], dtype=object) 

>>> ar[slices[1,0]] 
array([[30, 31, 32, 33, 34, 35], 
     [45, 46, 47, 48, 49, 50]]) 
>>> ar[slices[0,2]] 
array([[12, 13, 14], 
     [27, 28, 29]]) 
>>> ar[slices[2,1]] 
array([[66, 67, 68, 69, 70, 71], 
     [81, 82, 83, 84, 85, 86]]) 

>>> ar[slices[:2,1:3]] 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
IndexError: arrays used as indices must be of integer (or boolean) type 

Here is the solution based on hpaulj's answer:

import numpy as np
import itertools

class Subdivision():
    def __init__(self, shape, max_chunk_size=500000):
        shape = np.array(shape).astype(float)
        total_size = shape.prod()

        # calculate maximum slice shape:
        slice_shape = np.floor(shape * min(max_chunk_size/total_size, 1.0)**(1./len(shape))).astype(int)

        # create a list of slices for each dimension:
        slices = [[slice(left, min(right, n))
                   for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
                  for n, step_size in zip(shape.astype(int), slice_shape)]

        # grid of chunks; the last axis holds one slice per data dimension
        # (the builtin object replaces np.object, which newer NumPy no longer provides)
        self.slices = \
            np.array(list(itertools.product(*slices)),
                     dtype=object).reshape(tuple(np.ceil(shape/slice_shape).astype(int)) + (len(shape),))

    def __getitem__(self, args):
        if type(args) != tuple: args = (args,)

        # turn integer index into equivalent slice
        args = tuple(slice(arg, arg + 1 if arg != -1 else None) if type(arg) == int else arg for arg in args)

        # select the slices
        # always select all elements from the last axis (which contains slices for each data dimension)
        slices = self.slices[args + ((slice(None),) if Ellipsis in args else (Ellipsis, slice(None)))]

        # for each data dimension, expand the selected slices into index values
        # with np.r_, then combine the per-axis index arrays with np.ix_
        return np.ix_(*tuple(np.r_[tuple(slices[tuple([0] * i + [slice(None)] +
                                                      [0] * (len(slices.shape) - 2 - i) + [i])])]
                             for i in range(len(slices.shape) - 1)))

Examples:

>>> ar = np.arange(90).reshape(6,15) 
>>> ar 
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], 
     [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29], 
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], 
     [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59], 
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74], 
     [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]]) 

>>> subdiv = Subdivision(ar.shape, 16) 
>>> ar[subdiv[...]] 
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], 
     [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29], 
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], 
     [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59], 
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74], 
     [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]]) 

>>> ar[subdiv[0]] 
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], 
     [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]]) 

>>> ar[subdiv[:2,1]] 
array([[ 6, 7, 8, 9, 10, 11], 
     [21, 22, 23, 24, 25, 26], 
     [36, 37, 38, 39, 40, 41], 
     [51, 52, 53, 54, 55, 56]]) 

>>> ar[subdiv[2,:3]] 
array([[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74], 
     [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]]) 

>>> ar[subdiv[...,:2]] 
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 
     [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26], 
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41], 
     [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56], 
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71], 
     [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]]) 

Answer

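Your slices produce 2x6 and 2x3 arrays. My numpy version expects me to turn each subslice into a tuple before indexing, as in ar[tuple(subslice[0,0])] (here subslice is presumably a block of the slice array such as slices[:2, 1:3], with the per-axis slices stored along a trailing axis). This is the same as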

ar[slice(0,2), slice(6,12)] 
ar[:2, 6:12] 

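That is just the basic syntax of indexing and slicing. ar is 2d, so ar[(i,j)] requires a 2-element tuple of slices, lists, arrays, or integers. It does not work with an array of slice objects.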
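
The difference between a tuple of slices and an object array of slices can be reproduced in isolation. This is a small sketch with hypothetical names (idx, idx_arr); it only uses the (6, 15) example array from the question.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# One indexer: a tuple holding one slice per axis. Equivalent to ar[:2, 6:12].
idx = (slice(0, 2), slice(6, 12))
same = np.array_equal(ar[idx], ar[:2, 6:12])

# The same two slices stored in an object array are treated as an index
# array, and object dtype is not a valid dtype for index arrays.
idx_arr = np.array([slice(0, 2), slice(6, 12)], dtype=object)
try:
    ar[idx_arr]
    failed = False
except IndexError:
    failed = True

# Converting back to a tuple restores ordinary slicing.
block = ar[tuple(idx_arr)]
```

This is why indexing with a single element of the indexer array works (each element is a tuple), while indexing with a sub-array of indexers raises the IndexError from the question.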

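However, it is possible to concatenate the results into a larger array. That can be done after indexing, or the slices can be converted into index lists.

np.bmat, for example, concatenates a 2D arrangement of arrays: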

In [42]: np.bmat([[ar[tuple(subslice[0,0])], ar[tuple(subslice[0,1])]], 
        [ar[tuple(subslice[1,0])],ar[tuple(subslice[1,1])]]]) 
Out[42]: 
matrix([[ 6, 7, 8, 9, 10, 11, 12, 13, 14], 
     [21, 22, 23, 24, 25, 26, 27, 28, 29], 
     [36, 37, 38, 39, 40, 41, 42, 43, 44], 
     [51, 52, 53, 54, 55, 56, 57, 58, 59]]) 

You can generalize this. It just uses hstack and vstack on the nested lists. The result is an np.matrix, but it can be converted back to an array.
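
As an alternative to np.bmat (not part of the answer itself), np.block performs the same nested concatenation and returns a regular ndarray, so no conversion from np.matrix is needed. The sketch below assumes the same 2x2 group of chunks; the lists rows and cols are written out by hand for illustration.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# Assumed chunk layout: two row slices and two column slices.
rows = [slice(0, 2), slice(2, 4)]
cols = [slice(6, 12), slice(12, 15)]

# Index each chunk separately, then let np.block stitch the nested list:
# inner lists are joined along the last axis, outer lists along the first.
blocks = [[ar[r, c] for c in cols] for r in rows]
joined = np.block(blocks)

print(joined.shape)  # (4, 9)
```

The nested-list comprehension is what generalizes: for more dimensions the nesting depth grows by one level per axis.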

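The other approach is to use tools such as np.arange, np.r_, and np.ix_ to construct index arrays. It takes a bit of experimenting to produce an example.

To combine the [0,0] and [0,1] subslices: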

In [64]: j = np.r_[subslice[0,0,1],subslice[0,1,1]] 
In [65]: i = np.r_[subslice[0,0,0]] 

In [66]: i,j 
Out[66]: (array([0, 1]), array([ 6, 7, 8, 9, 10, 11, 12, 13, 14])) 
In [68]: ix = np.ix_(i,j) 
In [69]: ix 
Out[69]: 
(array([[0], 
     [1]]), array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14]])) 

In [70]: ar[ix] 
Out[70]: 
array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14], 
     [21, 22, 23, 24, 25, 26, 27, 28, 29]]) 

Or, with i = np.r_[subslice[0,0,0], subslice[1,0,0]], ar[np.ix_(i,j)] produces a 4x9 array.
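
The 4x9 case can be written as a standalone sketch. The slices are typed out directly here instead of being read from subslice, which is an assumption made only to keep the example self-contained.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# np.r_ expands each slice into explicit index values and concatenates them.
i = np.r_[slice(0, 2), slice(2, 4)]      # rows 0..3
j = np.r_[slice(6, 12), slice(12, 15)]   # columns 6..14

# np.ix_ reshapes the 1-D index arrays so they broadcast into a 2-D selection.
sub = ar[np.ix_(i, j)]

print(sub.shape)  # (4, 9)
```

Note that indexing with index arrays is advanced indexing, so sub is a copy of the data, whereas indexing with a tuple of slices returns a view.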


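Thanks for your answer! I used your suggestion of 'np.r_' and 'np.ix_' to create a class and define its '__getitem__' method so that it returns the required index arrays (see the updated OP). – SiLiKhon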