2014-04-03 68 views
1

在python中,我知道两个懒惰的“容器”:生成器和<class 'map'>可下载的懒惰映射

两者都不是可以下标的。所以map(f, data)[1](f(x) for x in data)[1]将会失败。

在python中是否有一个支持下标的惰性映射类?

如果没有什么是最接近的匹配?

我一直在寻找functools无济于事(或我没有发现它)。

基本上我寻找的是这样的(但重新发明轮子应该是最后的选择):

class lazilymappedlist: 
    def __init__ (self, f, lst): 
     self.f = f 
     self.data = [ (False, e) for e in lst] 

    def __getitem__ (self, idx): 
     status, elem = self.data [idx] 
     if status: return elem 
     elem = self.f (elem) 
     self.data [idx] = (True, elem) 
     return elem 
+0

你看着['itertools.islice'(https://docs.python.org/3.4/library/itertools.html#itertools.islice)? – jonrsharpe

+0

@jonrsharpe只是做了,但是'list(islice(map(f,data),6,7))'在所需索引之前的所有元素上调用'f'。因此没有收益。或者你会如何在这里使用'islice'? – Hyperboreus

+0

所以你想随机访问,但也懒惰访问?我不知道有这样的工具。请注意,您的示例仅适用于'lst'作为列表,而不是任意的可迭代。您的需求组合似乎有点不寻常,所以您可能必须自己实现它。 – BrenBarn

回答

1

这是落实以各种理由一件棘手的事情。可以毫不费力地实现它,对输入数据没有任何影响,映射和访问模式的(时间)复杂性,但是首先有一个生成器的一个或多个优点消失了。在问题中给出的示例代码中,不必跟踪所有值的优点就会丢失。

如果我们允许纯粹的随机访问模式,那么至少所有的映射值都必须被缓存,这又会失去生成器的内存优势。

在两个假设

  • 映射函数是昂贵的,并且仅称为稀疏
  • 存取模式是随机的,使得所有的值必须缓存

示例代码中这个问题应该很好。这里有一些稍微不同的代码,它也可以处理发电机作为输入数据。有利的是,不要详尽地构建对象构造的传入数据的副本。

因此,如果传入的数据有__getitem__方法(索引),那么使用self.elements字典实现精简缓存封装。如果访问稀少,字典比列表更有效率。如果传入数据没有索引,我们只需要消耗和存储未决映射的传入数据。

代码:

class LazilyMapped: 
    def __init__(self, fn, iterable): 
     """LazilyMapped lazily evaluates a mapping/transformation function on incoming values 

     Assumes mapping is expensive, and [idx] access is random and possibly sparse. 
     Still, this may defeat having a memory efficient generator on the incoming data, 
     because we must remember all read data for potential future access. 

     Lots of different optimizations could be done if there's more information on the 
     access pattern. For example, memory could be saved if we knew that the access idx 
     is monotonic increasing (then a list storage would be more efficient, because 
     forgetting data is then trivial), or if we'd knew that any index is only accessed 
     once (we could get rid of the infite cache), or if the access is not sparse, but 
     random we should be using a list instead of a dict for self.elements. 

     fn is a filter function, getting one element of iterable and returning a bool, 
     iterable may be a generator 
     """ 
     self.fn = fn 
     self.sparse_in = hasattr(iterable, '__getitem__') 
     if self.sparse_in: 
      self.original_values = iterable 
     else: 
      self.iter = iter(iterable) 
      self.original_idx = 0  # keep track of which index to do next in incoming data 
      self.original_values = [] # keep track of all incoming values 
     self.elements = {}  # forever remember mapped data 
    def proceed_to(self, idx): 
     """Consume incoming values and store for later mapping""" 
     if idx >= self.original_idx: 
      for _ in range(self.original_idx, idx + 1): 
       self.original_values.append(next(self.iter)) 
      self.original_idx = idx + 1 
    def __getitem__(self, idx): 
     if idx not in self.elements: 
      if not self.sparse_in: 
       self.proceed_to(idx) 
      self.elements[idx] = mapped = self.fn(self.original_values[idx]) 
     else: 
      mapped = self.elements[idx] 
     return mapped 

if __name__ == '__main__': 
    test_list = [1,2,3,4,5] 
    dut = LazilyMapped(lambda v: v**2, test_list) 
    assert dut[0] == 1 
    assert dut[2] == 9 
    assert dut[1] == 4 

    dut = LazilyMapped(lambda v: v**2, (num for num in range(1, 7))) 
    assert dut[0] == 1 
    assert dut[2] == 9 
    assert dut[1] == 4 
+0

谢谢你的非常详细的答案。给我时间去消化它。 – Hyperboreus