numpy是如何实现多维广播的？

存储器（行优先顺序）：numpy是如何实现多维广播的？

[[A(0,0), A(0,1)] 
[A(1,0), A(1,1)]] 

has this memory layout: 
[A(0,0), A(0,1), A(1,0), A(1,1)]

我想在下列情况下，该算法的工作是这样的。

广播尺寸是最后一个维度：

[[0, 1, 2, 3]   [[1] 
        x 
[4, 5, 6, 7]]   [10]] 

    A (2 by 4)   B (2 by 1) 

Iterate 0th dimensions of A and B simultaneously { 
    Iterate last dimension of A{ 
     multiply; 
    } 
}

广播尺寸为第0尺寸：

[[0, 1, 2, 3] 
        x [[1,10,100,1000]] 
[4, 5, 6, 7]] 

    A (2 by 4)    B (1 by 4) 

Iterate 0th dimension of A{ 
    Iterate 1st dimensions of A and B simultaneously{ 
     multiply; 
    } 
}

问：

如何numpy的知道哪些乘法的顺序是最好的。（读存储器中，以便比读记忆所有的地方更好。但如何做到numpy的数字说出来？）
什么会numpy的做，如果阵列有两个以上的尺寸
什么会NumPy的做如果广播维度不是最后一个维度？

是怎么回事的第二个猜测：

#include <iostream> 
int main(void){ 
    const int nA = 12; 
    const int nB = 3; 
    int A[nA]; 
    int B[nB]; 
    for(int i = 0; i != nA; ++i) A[i] = i+1; 
    for(int i = 0; i != nB; ++i) B[i] = i+1; 
    //dimension 
    int dA[] = {2,3,2}; 
    int dB[] = {1,3,1}; 

    int* pA = A; 
    int* pB = B; 
    int* pA_end = A + nA; 
    //is it possible to make the compiler 
    //generate the iA and sA? 
    int iB = 0; 
    int iB_max = 2; 
    int sB[] = {1,0}; 

    while(pA != pA_end){ 
     std::cout << "*pA, *pB: " << *pA << ", " << *pB <<std::endl; 
     std::cout << "iB: " << iB <<std::endl; 
     *(pA) *= *(pB); 
     ++pA; 
     pB += sB[iB]; 
     ++iB; 
     if (iB == iB_max) {iB = 0; pB = B;} 
    } 

    for(pA = A; pA != pA_end; ++pA){ 
     std::cout << *(pA) << ", "; 
    } 
    std::cout << std::endl;  
}

来源

2016-09-21 rxu

第二猜测似乎是错误的。 numpy似乎使用多索引。 – rxu

真正进入广播细节你需要了解阵列的形状和进步。但是很多工作现在使用nditer在c代码中实现。你可以在http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html阅读。 np.nditer可让您访问Python级别的工具，但其实际价值与cython或您自己的c代码一起使用时会有效。

np.lib.stride_tricks的功能，让你玩大步。其功能之一有助于可视化数组如何一起播出。在实践中，工作与nditer完成，但此功能可以帮助了解行动：

In [629]: np.lib.stride_tricks.broadcast_arrays(np.arange(6).reshape(2,3), 
            np.array([[1],[2]])) 
Out[629]: 
[array([[0, 1, 2], 
     [3, 4, 5]]), 
array([[1, 1, 1], 
     [2, 2, 2]])]

需要注意的是，有效第二阵列已经被复制到匹配第一个的形状。但复制是用步幅技巧完成的，而不是实际的副本。

In [631]: A,B=np.lib.stride_tricks.broadcast_arrays(np.arange(6).reshape(2,3), 
             np.array([[1],[2]])) 
In [632]: A.shape 
Out[632]: (2, 3) 
In [633]: A.strides 
Out[633]: (12, 4) 
In [634]: B.shape 
Out[634]: (2, 3) 
In [635]: B.strides 
Out[635]: (4, 0)

这就是这(4,0)大步进行没有复制的复制。

=================

使用Python水平nditer，这里的广播时它做什么。

In [1]: A=np.arange(6).reshape(2,3) 
In [2]: B=np.array([[1],[2]])

的滑动nditer馈送元件的一组在一个时间 http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#using-an-external-loop

In [5]: it =np.nditer((A,B)) 
In [6]: for a,b in it: 
    ...:  print(a,b) 

0 1 
1 1 
2 1 
3 2 
4 2 
5 2

但是，当我打开extenal_loop上，它在组块，这里广播的阵列的各行迭代：

In [7]: it =np.nditer((A,B), flags=['external_loop']) 
In [8]: for a,b in it: 
    ...:  print(a,b) 

[0 1 2] [1 1 1] 
[3 4 5] [2 2 2]

随着更复杂的广播external_loop仍然产生1D阵列，允许简单c风格迭代：

In [13]: A1=np.arange(24).reshape(3,2,4) 
In [18]: it =np.nditer((A1,np.arange(3)[:,None,None]), flags=['external_loop']) 
In [19]: while not it.finished: 
    ...:  print(it[:]) 
    ...:  it.iternext() 
    ...:  
(array([0, 1, 2, 3, 4, 5, 6, 7]), array([0, 0, 0, 0, 0, 0, 0, 0])) 
(array([ 8, 9, 10, 11, 12, 13, 14, 15]), array([1, 1, 1, 1, 1, 1, 1, 1])) 
(array([16, 17, 18, 19, 20, 21, 22, 23]), array([2, 2, 2, 2, 2, 2, 2, 2]))

注意，尽管A1是（3,2,4），则nditer循环产率的3个步骤（第1轴）与2×4长度的元件。

我发现在另一个cython/nditer SO问题，第一种方法没有产生很大的速度提高，但第二个帮助了很多。在c或cythonexternal_loop的情况下会做简单的低级迭代。

===============

如果我在1和第3轴线广播，迭代器需要2个* 3个步骤（有效地平坦化第一2轴，并馈送在第3次）：

In [20]: it =np.nditer((A1,np.arange(2)[None,:,None]), flags=['external_loop']) 
In [21]: while not it.finished: 
    ...:  print(it[:]) 
    ...:  it.iternext() 
    ...:  
(array([0, 1, 2, 3]), array([0, 0, 0, 0])) 
(array([4, 5, 6, 7]), array([1, 1, 1, 1])) 
(array([ 8, 9, 10, 11]), array([0, 0, 0, 0])) 
(array([12, 13, 14, 15]), array([1, 1, 1, 1])) 
(array([16, 17, 18, 19]), array([0, 0, 0, 0])) 
(array([20, 21, 22, 23]), array([1, 1, 1, 1]))

但随着buffered，它一次迭代，饲养箱2一个维数组：

In [22]: it =np.nditer((A1,np.arange(2)[None,:,None]), flags=['external_loop','buffered']) 
In [23]: while not it.finished: 
    ...:  print(it[:]) 
    ...:  it.iternext() 
    ...:  
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]), 
array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]))

Does Cython offer any reasonably easy and efficient way to iterate Numpy arrays as if they were flat? 有一定速度测试中，显示出BU ffered外循环是最快的

cython转化为快速，简单，c重复这样的：

for xarr in it: 
    x = xarr 
    size = x.shape[0] 
    for i in range(size): 
     x[i] = x[i]+1.0

来源

2016-09-21 21:09:18 hpaulj

'np.lib.stride_tricks.broadcast_arrays'更好拼写['np.broadcast_arrays']（https://docs.scipy.org/doc/numpy/reference/generated/numpy.broadcast_arrays.html） – Eric

虽然整个' stride_tricks.py'文件是很好的阅读，如果你需要深入挖掘。 – hpaulj

我尝试阅读nditer c代码，但还不完全理解它。 nditer会牺牲性能吗？ niter是否将嵌套循环视为具有可变跨度的单循环？ – rxu

numpy是如何实现多维广播的？

回答

相关问题