Python and NumPy - create dynamic, arbitrary subsets of an ndarray

I'm looking for a general way to do this:

import itertools
import numpy as np

raw_data = np.array(somedata)  # somedata: the 2-D input data
filterColumn1 = raw_data[:,1] 
filterColumn2 = raw_data[:,3] 
cartesian_product = itertools.product(np.unique(filterColumn1), np.unique(filterColumn2)) 
for val1, val2 in cartesian_product: 
    fixed_mask = (filterColumn1 == val1) & (filterColumn2 == val2) 
    subset = raw_data[fixed_mask]

I'd like to be able to use any number of filterColumns. So what I want is:

filterColumns = [filterColumn1, filterColumn2, ...] 
uniqueValues = map(np.unique, filterColumns) 
cartesian_product = itertools.product(*uniqueValues) 
for combination in cartesian_product: 
    variable_mask = ???? 
    subset = raw_data[variable_mask]

Is there a simple syntax to do what I want? Or should I try a different approach?

Edit: this seems to work:

cartesian_product = itertools.product(*uniqueValues) 
for combination in cartesian_product: 
    # start from True ("select all rows"); the first &= turns it into a boolean array
    variable_mask = True 
    for idx, fc in enumerate(filterColumns): 
        variable_mask &= (fc == combination[idx]) 
    subset = raw_data[variable_mask]
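The inner loop can also be folded into a single call. The following is a minimal sketch, not the poster's code. It assumes `filterColumns` is a list of equal-length 1-D arrays and that `combination` is one tuple produced by `itertools.product`. It uses `np.logical_and.reduce` to AND all of the per-column comparisons together. The small `raw_data` array is made-up test data.

```python
import itertools

import numpy as np

# Made-up test data (assumption): 6 rows; we filter on columns 1 and 3
raw_data = np.array([
    [0, 1, 5, 0],
    [1, 1, 6, 1],
    [2, 0, 7, 0],
    [3, 1, 8, 0],
    [4, 0, 9, 1],
    [5, 1, 4, 1],
])
filterColumns = [raw_data[:, 1], raw_data[:, 3]]
uniqueValues = [np.unique(fc) for fc in filterColumns]

subsets = {}
for combination in itertools.product(*uniqueValues):
    # One boolean row mask per filter column, ANDed together into a single mask
    variable_mask = np.logical_and.reduce(
        [fc == val for fc, val in zip(filterColumns, combination)]
    )
    subsets[combination] = raw_data[variable_mask]
```

`np.logical_and.reduce` stacks the list of boolean arrays and reduces along the first axis, so the result has one entry per data row. This works for any number of filter columns.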

2014-10-03 Joe Bashe

You can do this with numpy.all and broadcasting:

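filter_matrix = np.array(filterColumns)  # shape (k, n_rows): one row per filter column
combination_array = np.array(combination)  # shape (k,)
# (k, n_rows) == (k, 1): each filter row is compared with its own value
bool_matrix = filter_matrix == combination_array[:, np.newaxis]
variable_mask = np.all(bool_matrix, axis=0)  # True where a row matches every filter
subset = raw_data[variable_mask]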
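Here is a self-contained sketch of the broadcasting idea inside the question's loop. It uses made-up random data. It assumes `filter_matrix` holds one row per filter column. Under that assumption the new axis belongs on the combination (`[:, np.newaxis]`), and `np.all(..., axis=0)` collapses the comparisons into one boolean per data row.

```python
import itertools

import numpy as np

# Made-up test data (assumption): 20 rows, 4 columns of small integers
rng = np.random.default_rng(0)
raw_data = rng.integers(0, 3, size=(20, 4))
filterColumns = [raw_data[:, 1], raw_data[:, 3]]

filter_matrix = np.array(filterColumns)  # shape (k, n_rows): one row per filter
uniqueValues = [np.unique(fc) for fc in filterColumns]
total_rows = 0
for combination in itertools.product(*uniqueValues):
    combination_array = np.array(combination)  # shape (k,)
    # (k, n_rows) == (k, 1) broadcasts, so each filter row is compared with its own value
    bool_matrix = filter_matrix == combination_array[:, np.newaxis]
    # Keep a data row only if it matches on every filter
    variable_mask = np.all(bool_matrix, axis=0)  # shape (n_rows,)
    subset = raw_data[variable_mask]
    total_rows += len(subset)
```

Every row falls into exactly one combination, so the subset sizes add up to the number of rows.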

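There is a simpler way to do the same thing if your filters are in a matrix, notably with numpy argsort and numpy rollaxis. First, roll the axes until the filters are the first columns. Then sort on them and slice the array vertically to get the rest of the matrix.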

In general, if a Python for loop can be replaced with vectorized NumPy operations, it is better to avoid it.

Update:

Here is the complete code without a for loop:

import numpy as np 

# select filtering indexes 
filter_indexes = [1, 3] 
# generate the test data 
raw_data = np.random.randint(0, 4, size=(50,5)) 


# extract the columns that we will use for indexing (grouping) 
index_columns = raw_data[:, filter_indexes] 

# get the row order that sorts lexicographically over all the indexing columns 
argsorts = np.lexsort(index_columns.T) 

# sort both the index and the data column 
sorted_index = index_columns[argsorts, :] 
sorted_data = raw_data[argsorts, :] 

# in each indexing column, find if number in row and row-1 are identical 
# then group to check if all numbers in corresponding positions in row and row-1 are identical 
autocorrelation = np.all(sorted_index[1:, :] == sorted_index[:-1, :], axis=1) 

# find out the breakpoints: these are the positions where row and row-1 are not identical 
breakpoints = np.nonzero(np.logical_not(autocorrelation))[0]+1 

# finally find the desired subsets 
subsets = np.split(sorted_data, breakpoints)
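`np.split` returns the groups without saying which filter values each one belongs to. Every row in a group shares the same filter values, so the label can be read from the first row of each group. The following is a minimal sketch that re-runs the pipeline above in condensed form on made-up data. The dictionary `keyed` is an illustrative name, not part of the answer. `np.any(... != ..., axis=1)` is equivalent to the answer's `np.logical_not(np.all(... == ..., axis=1))`.

```python
import numpy as np

# Made-up test data (assumption); the answer's pipeline, condensed
rng = np.random.default_rng(1)
raw_data = rng.integers(0, 4, size=(50, 5))
filter_indexes = [1, 3]

index_columns = raw_data[:, filter_indexes]
argsorts = np.lexsort(index_columns.T)
sorted_index = index_columns[argsorts]
sorted_data = raw_data[argsorts]
# Positions where a row differs from the previous one start a new group
breakpoints = np.flatnonzero(np.any(sorted_index[1:] != sorted_index[:-1], axis=1)) + 1
subsets = np.split(sorted_data, breakpoints)

# All rows in a group share their filter values, so read the label from row 0
keyed = {tuple(s[0, filter_indexes].tolist()): s for s in subsets}
```

Unlike the `itertools.product` loop in the question, this produces only the combinations that actually occur in the data, so no subset is empty.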

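Another way to implement this is to convert the index matrix, row by row, into a matrix of strings so that each row becomes a single key. Then find the unique values in this now-single index column and split as above.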
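Newer NumPy releases (1.13 and later) can find unique rows directly with `np.unique(..., axis=0)`, which avoids the string conversion. The sketch below uses that function in place of the string step, so it is an alternative technique rather than the answer's exact method. The data is made up. `return_inverse` gives each row's group number, and a stable argsort on that number brings each group together.

```python
import numpy as np

# Made-up test data (assumption)
rng = np.random.default_rng(2)
raw_data = rng.integers(0, 4, size=(50, 5))
filter_indexes = [1, 3]
index_columns = raw_data[:, filter_indexes]

# Unique rows of the index columns; inverse[i] is the group number of row i
keys, inverse, counts = np.unique(
    index_columns, axis=0, return_inverse=True, return_counts=True
)
# Defensive flatten: the shape of the inverse array changed in some NumPy 2.0 releases
inverse = inverse.reshape(-1)

# A stable sort on the group number keeps each group's rows in their original order
order = np.argsort(inverse, kind="stable")
subsets = np.split(raw_data[order], np.cumsum(counts)[:-1])
# subsets[i] holds exactly the rows whose index columns equal keys[i]
```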

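As a matter of convention, it may be nicer to first roll the index columns until they all sit at the start of the matrix, so that the sorting done above is clearer.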
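`np.rollaxis` reorders whole axes rather than individual columns. For moving a few chosen columns to the front, plain fancy indexing with a column list is a simpler substitute, and the sketch below uses it in place of rollaxis. The sketch uses made-up data and assumes `filter_indexes` lists the filter columns.

```python
import numpy as np

# Made-up test data (assumption): 6 rows x 5 columns, values 0..29
raw_data = np.arange(30).reshape(6, 5)
filter_indexes = [1, 3]

# Column order: filter columns first, then all remaining columns in their original order
other_indexes = [c for c in range(raw_data.shape[1]) if c not in filter_indexes]
reordered = raw_data[:, filter_indexes + other_indexes]

k = len(filter_indexes)
# Sort rows with the first key column as the primary key (lexsort treats the LAST key as primary)
order = np.lexsort(reordered[:, :k].T[::-1])
reordered = reordered[order]
# reordered[:, :k] are now the sorted keys and reordered[:, k:] the rest of the matrix
```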

2014-10-03 13:28:47 chiffa

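I'd love to accept your answer, but not everyone can rotate n-dimensional matrices in their head. ;) In other words, I'm not sure how to implement this solution for my problem. I dug into the argsort and rollaxis documentation, but how to apply them to get the subsets is beyond me. Luckily my data isn't too big, so the loop is fine, although I fully agree with you that loops should be avoided where possible. – 2014-10-03 20:35:25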

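See the update. Actually, it was lexsort I had in mind, not argsort. Both return an array of sort indices, but argsort sorts on single elements along one axis, while lexsort sorts on several keys at once :D – chiffa 2014-10-03 22:00:24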
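A tiny example with made-up values makes the difference concrete. `np.argsort` orders by one key. `np.lexsort` takes several keys and treats the last one as primary, so `np.lexsort(index_columns.T)` in the update sorts primarily on the last filter column. For grouping this makes no difference, because any lexicographic order places equal rows next to each other.

```python
import numpy as np

rows = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])

# argsort: a single key (column 0 only); a stable sort keeps ties in original order
by_col0 = np.argsort(rows[:, 0], kind="stable")  # [1, 3, 0, 2]

# lexsort: several keys; the LAST key passed is the primary one
by_col1_then_col0 = np.lexsort(rows.T)        # keys (col0, col1), col1 primary: [3, 0, 1, 2]
by_col0_then_col1 = np.lexsort(rows.T[::-1])  # keys (col1, col0), col0 primary: [3, 1, 0, 2]
```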

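Thanks a lot for the detailed update! I now follow your logic, and I've learned a better way of thinking about data manipulation in numpy. Is the method you used to get the autocorrelation and breakpoints fairly standard? Without the comments, I think it would be hard for a newbie to understand what your code is doing. – 2014-10-05 16:06:56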
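Comparing an array with itself shifted by one row (`a[1:]` against `a[:-1]`) is a common NumPy idiom for finding where runs of equal values start. This is what the "autocorrelation" step in the update does. Here is a worked example on a small, already-sorted, made-up index matrix:

```python
import numpy as np

# Already-sorted index columns (made-up example)
sorted_index = np.array([[0, 0], [0, 0], [0, 1], [1, 1], [1, 1], [1, 1]])

# True where a row equals the row before it (the answer's "autocorrelation")
same_as_previous = np.all(sorted_index[1:] == sorted_index[:-1], axis=1)
# -> [True, False, False, True, True]

# A new group starts wherever a row differs from its predecessor; the +1 converts
# an index into the shifted comparison back into an index into sorted_index
breakpoints = np.nonzero(~same_as_previous)[0] + 1  # -> [2, 3]

groups = np.split(sorted_index, breakpoints)  # 3 groups, of sizes 2, 1 and 3
```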

Something like this?

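# select all rows initially; dtype=bool is required, otherwise the mask inherits the
# column's dtype (e.g. int) and raw_data[mask] would do integer indexing instead
variable_mask = np.ones_like(filterColumns[0], dtype=bool)
for column, val in zip(filterColumns, combination): 
    variable_mask &= (column == val) 
subset = raw_data[variable_mask]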
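The loop-and-mask approach can be wrapped in a reusable function. The sketch below is one way to do that. The function name `subsets_by_filters` and its dictionary return value are hypothetical choices made for illustration. The function skips combinations from the Cartesian product that match no rows.

```python
import itertools

import numpy as np


def subsets_by_filters(raw_data, filter_indexes):
    """Hypothetical helper: map each combination of filter values to its matching rows.

    Uses the loop-and-mask approach; combinations with no matching rows are skipped.
    """
    filterColumns = [raw_data[:, i] for i in filter_indexes]
    result = {}
    for combination in itertools.product(*(np.unique(fc) for fc in filterColumns)):
        variable_mask = np.ones(len(raw_data), dtype=bool)  # select all rows initially
        for column, val in zip(filterColumns, combination):
            variable_mask &= (column == val)
        if variable_mask.any():
            # .item() turns NumPy scalars into plain Python values for the dict key
            result[tuple(v.item() for v in combination)] = raw_data[variable_mask]
    return result
```

For example, `subsets_by_filters(raw_data, [1, 3])` groups the rows by the values in columns 1 and 3.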

2014-10-03 13:23:48 r3m0t