这是落实以各种理由一件棘手的事情。可以毫不费力地实现它,对输入数据没有任何影响,映射和访问模式的(时间)复杂性,但是首先有一个生成器的一个或多个优点消失了。在问题中给出的示例代码中,不必跟踪所有值的优点就会丢失。
如果我们允许纯粹的随机访问模式,那么至少所有的映射值都必须被缓存,这又会失去生成器的内存优势。
在两个假设
- 映射函数是昂贵的,并且仅称为稀疏
- 存取模式是随机的,使得所有的值必须缓存
示例代码中这个问题应该很好。这里有一些稍微不同的代码,它也可以处理发电机作为输入数据。有利的是,不要详尽地构建对象构造的传入数据的副本。
因此,如果传入的数据有__getitem__
方法(索引),那么使用self.elements
字典实现精简缓存封装。如果访问稀少,字典比列表更有效率。如果传入数据没有索引,我们只需要消耗和存储未决映射的传入数据。
代码:
class LazilyMapped:
def __init__(self, fn, iterable):
"""LazilyMapped lazily evaluates a mapping/transformation function on incoming values
Assumes mapping is expensive, and [idx] access is random and possibly sparse.
Still, this may defeat having a memory efficient generator on the incoming data,
because we must remember all read data for potential future access.
Lots of different optimizations could be done if there's more information on the
access pattern. For example, memory could be saved if we knew that the access idx
is monotonic increasing (then a list storage would be more efficient, because
forgetting data is then trivial), or if we'd knew that any index is only accessed
once (we could get rid of the infite cache), or if the access is not sparse, but
random we should be using a list instead of a dict for self.elements.
fn is a filter function, getting one element of iterable and returning a bool,
iterable may be a generator
"""
self.fn = fn
self.sparse_in = hasattr(iterable, '__getitem__')
if self.sparse_in:
self.original_values = iterable
else:
self.iter = iter(iterable)
self.original_idx = 0 # keep track of which index to do next in incoming data
self.original_values = [] # keep track of all incoming values
self.elements = {} # forever remember mapped data
def proceed_to(self, idx):
"""Consume incoming values and store for later mapping"""
if idx >= self.original_idx:
for _ in range(self.original_idx, idx + 1):
self.original_values.append(next(self.iter))
self.original_idx = idx + 1
def __getitem__(self, idx):
if idx not in self.elements:
if not self.sparse_in:
self.proceed_to(idx)
self.elements[idx] = mapped = self.fn(self.original_values[idx])
else:
mapped = self.elements[idx]
return mapped
if __name__ == '__main__':
test_list = [1,2,3,4,5]
dut = LazilyMapped(lambda v: v**2, test_list)
assert dut[0] == 1
assert dut[2] == 9
assert dut[1] == 4
dut = LazilyMapped(lambda v: v**2, (num for num in range(1, 7)))
assert dut[0] == 1
assert dut[2] == 9
assert dut[1] == 4
来源
2014-04-04 20:26:56
cfi
你看着['itertools.islice'(https://docs.python.org/3.4/library/itertools.html#itertools.islice)? – jonrsharpe
@jonrsharpe只是做了,但是'list(islice(map(f,data),6,7))'在所需索引之前的所有元素上调用'f'。因此没有收益。或者你会如何在这里使用'islice'? – Hyperboreus
所以你想随机访问,但也懒惰访问?我不知道有这样的工具。请注意,您的示例仅适用于'lst'作为列表,而不是任意的可迭代。您的需求组合似乎有点不寻常,所以您可能必须自己实现它。 – BrenBarn