我有一些代码用于基于2D圆形窗口中的相邻值来计算图像中的缺失值。它还使用来自相同位置处的一个或多个时间相邻图像的值(即,在第三维上移动的相同2D窗口)。蟒蛇 - 结合Argsort和蒙版以获得移动窗口中的最近值
对于缺失的每个位置,我需要计算不一定基于整个窗口中所有可用值的值,但只能计算具有值的空间最近的n个单元格(在两个图像/ Z-轴位置),其中n是小于2D窗口中的单元总数的某个值。
当时,计算窗口中的所有内容要快得多,因为我的分类方法是将数据最近的n个单元排序,这是函数最慢的部分,因为每次都必须重复根据窗口坐标的距离不会改变。我不确定这是否必要,并且感觉我必须能够获得一次排序的距离,然后在只选择可用单元的过程中屏蔽这些距离。
这里是我的代码,选择的数据差距小区位置的窗口中使用:
# radius will in reality be ~100
radius = 2
y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
dist = np.sqrt(x**2 + y**2)
circle_template = dist > radius
# this will in reality be a very large 3 dimensional array
# representing daily images with some gaps, indicated by 0s
dataStack = np.zeros((2,5,5))
dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)
testdata = dataStack[1]
alternatedata = dataStack[0]
random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
testdata[random_gap_locations] = 0
testdata[radius, radius] = 0
# in reality we will go through every gap (zero) location in the data
# for each image and for each gap use slicing to get a window of
# size (radius*2+1, radius*2+1) around it from each image, with the
# gap being at the centre i.e.
# testgaplocation = [radius, radius]
# and the variables testdata, alternatedata below will refer to these
# slices
locations_to_exclude = np.logical_or(circle_template, np.logical_or
(testdata==0, alternatedata==0))
# the places that are inside the circular mask and where both images
# have data
locations_to_include = ~locations_to_exclude
number_available = np.count_nonzero(locations_to_include)
# we only want to do the interpolation calculations from the nearest n
# locations that have data available, n will be ~100 in reality
number_required = 3
available_distances = dist[locations_to_include]
available_data = testdata[locations_to_include]
available_alternates = alternatedata[locations_to_include]
if number_available > number_required:
# In this case we need to find the closest number_required of elements, based
# on distances recorded in dist, from available_data and available_alternates
# Having to repeat this argsort for each gap cell calculation is slow and feels
# like it should be avoidable
sortedDistanceIndices = available_distances.argsort(kind = 'mergesort',axis=None)
requiredIndices = sortedDistanceIndices[0:number_required]
selected_data = np.take(available_data, requiredIndices)
selected_alternates = np.take(available_alternates , requiredIndices)
else:
# we just use available_data and available_alternates as they are...
# now do stuff with the selected data to calculate a value for the gap cell
这工作,但在功能的总时间的一半被采取蒙面的argsort空间距离数据。 (〜900uS总共1.4mS--而且这个功能将运行数百亿次,所以这是一个重要的区别!)
我相信我一定能够做这个argsort一次以外的函数,最初设置空间距离窗口,然后在掩码中包含这些排序索引,以获取第一个howManyToCalculate索引而不必重新进行排序。答案可能涉及将我们提取的各种数据放入一个记录数组中 - 但我无法弄清楚如何,如果是的话。任何人都可以看到我如何使这部分过程更有效率?
你的代码是真的很难看......您可能需要阅读[PEP8(http://legacy.python.org/dev/peps/pep-0008/),并按照它:它有利于与其他Python程序员共享代码。 – Jaime 2014-10-16 16:15:51
我同意Jaime的观点,这很难阅读,特别是代码,但是这些描述也留下了一些解释空间。所以我不会冒险提供一个答案,但这里有一些工具,我会检查,如果我(如果我至少含糊地理解你的问题正确)。 [sklearn.feature_extraction.image.extract_patches](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/image.py#L238)可以让你看到你的补丁,你可以掩盖。它会创建一个副本,所以要小心内存问题。 – eickenberg 2014-10-16 19:22:27
您可能还对看起来完全不相关的函数感兴趣,该函数使用扩张来补偿丢失的值。它不会给你你的确切结果,但可能是一个很好的代理:[nilearn.masking._extrapolate_out_img](https://github.com/nilearn/nilearn/blob/fd7e7a7186dca43d0ef5ebd19990b0751d476bda/nilearn/masking.py#L65) – eickenberg 2014-10-16 19:27:04