Vectorizing NumPy's bincount

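I have a 2D NumPy array A. I want to apply np.bincount() to each column of A to produce another 2D array B, made up of the bincounts of each column of the original matrix A.

My problem is that np.bincount() is a function that takes a 1D array-like. It is not an array method like, for example, B = A.max(axis=1).

Is there a more pythonic/numpythic way to produce this B array, other than a nasty for loop?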

import numpy as np 

states = 4 
rows = 8 
cols = 4 

A = np.random.randint(0,states,(rows,cols)) 
B = np.zeros((states,cols)) 

for x in range(A.shape[1]): 
    B[:,x] = np.bincount(A[:,x])

Source

2016-11-14 user3556757

I would suggest using np.apply_along_axis, which lets you apply a 1D function (in this case np.bincount) to 1D slices of a higher-dimensional array:

import numpy as np 

states = 4 
rows = 8 
cols = 4 

A = np.random.randint(0,states,(rows,cols)) 
B = np.zeros((states,cols)) 

B = np.apply_along_axis(np.bincount, axis=0, arr=A)

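You have to be careful, though. This (as well as your suggested for loop) only works if the output of np.bincount has the right shape. If the maximum state is not present in one or more columns of A, the output for those columns will be shorter than expected, and the code will fail with a ValueError.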
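One way to sidestep the shape problem, sketched here on the assumption that the number of states is known in advance, is to pass minlength through to np.bincount; np.apply_along_axis forwards extra keyword arguments to the function it calls. The array A below is a made-up input in which state 3 never appears in the first column.

```python
import numpy as np

states = 4

# Hypothetical input: state 3 is missing from column 0,
# so a plain np.bincount on that column would return only 3 bins.
A = np.array([[0, 3],
              [1, 3],
              [2, 0],
              [2, 1]])

# minlength pads every per-column result to `states` bins,
# so all columns yield outputs of the same length.
B = np.apply_along_axis(np.bincount, 0, A, minlength=states)
print(B)
```

This still loops over the columns internally; it only removes the dependence on every column containing the maximum state.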

Source

2016-11-14 15:15:06 jotasi

Note that apply_along_axis is just syntactic sugar for a for loop and has the same performance characteristics.

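This solution, which uses the numpy_indexed package (disclaimer: I am its author), is fully vectorized, so it involves no Python loops behind the scenes. It also places no restrictions on the input: not every column needs to contain the same set of unique values.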

import numpy as np
import numpy_indexed as npi
rowidx, colidx = np.indices(A.shape) 
(bin, col), B = npi.count_table(A.flatten(), colidx.flatten())

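The following gives an alternative (sparse) representation of the same result, which may be more appropriate if the B array really does contain many zeros: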

(bin, col), count = npi.count((A.flatten(), colidx.flatten()))


Source

2016-11-14 16:10:21

Using the same idea as in this post, here is a vectorized approach:

m = A.shape[1]  
n = A.max()+1 
A1 = A + (n*np.arange(m)) 
out = np.bincount(A1.ravel(),minlength=n*m).reshape(m,-1).T
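To make the offset trick easier to follow, here is a commented sketch of the same steps wrapped in a function and checked against an explicit loop. The name bincount_offset and the random test input are illustrative assumptions, not part of the original answer. The idea is that shifting column j by j*n moves its values into a range of bins no other column uses, so a single flat np.bincount can count all columns at once.

```python
import numpy as np

def bincount_offset(A):
    """Per-column bincount of a 2-D array of non-negative ints (sketch)."""
    m = A.shape[1]          # number of columns
    n = A.max() + 1         # number of bins per column
    # Shift column j by j*n so its values land in the range [j*n, (j+1)*n).
    shifted = A + n * np.arange(m)
    # One flat bincount covers all columns; minlength pads trailing empty bins.
    flat = np.bincount(shifted.ravel(), minlength=n * m)
    # Row j of the reshaped array holds column j's counts; transpose to (n, m).
    return flat.reshape(m, n).T

rng = np.random.default_rng(0)
A = rng.integers(0, 4, size=(8, 4))

# Reference result from an explicit loop, padded with minlength.
n = A.max() + 1
expected = np.stack(
    [np.bincount(A[:, j], minlength=n) for j in range(A.shape[1])], axis=1
)
assert np.array_equal(bincount_offset(A), expected)
```

Because n is taken from A.max(), the number of rows in the result follows the data; if a fixed number of states is required, n would need to be set from that instead.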

Source

2016-11-14 16:26:46 Divakar

Another possibility:

import numpy as np 


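def bincount_columns(x, minlength=None):
    # Number of bins needed to cover every value in x.
    nbins = x.max() + 1
    if minlength is not None:
        nbins = max(nbins, minlength)
    ncols = x.shape[1]
    count = np.zeros((nbins, ncols), dtype=int)
    # Column index of each element, broadcast against x.
    colidx = np.arange(ncols)[None, :]
    # Unbuffered add: repeated (value, column) pairs are each counted.
    np.add.at(count, (x, colidx), 1)
    return count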

For example,

In [110]: x
Out[110]:
array([[4, 2, 2, 3],
       [4, 3, 4, 4],
       [4, 3, 4, 4],
       [0, 2, 4, 0],
       [4, 1, 2, 1],
       [4, 2, 4, 3]])

In [111]: bincount_columns(x)
Out[111]:
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2]])

In [112]: bincount_columns(x, minlength=7)
Out[112]:
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
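The function relies on np.add.at rather than an in-place fancy-index increment. As a small illustration of why that matters (the array names here are made up for the example), an ordinary `+=` with repeated indices applies the increment only once per distinct index, while np.add.at accumulates every occurrence:

```python
import numpy as np

idx = np.array([0, 0, 2])

# Buffered fancy-index increment: the repeated index 0 is counted only once.
buffered = np.zeros(3, dtype=int)
buffered[idx] += 1
print(buffered)      # [1 0 1]

# Unbuffered np.add.at: every occurrence of an index contributes.
unbuffered = np.zeros(3, dtype=int)
np.add.at(unbuffered, idx, 1)
print(unbuffered)    # [2 0 1]
```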

Source

2016-11-14 16:32:38
