记忆到磁盘 - 蟒蛇 - 永久记忆

我有一个函数

def getHtmlOfUrl(url): 
    ... # expensive computation

，很想做一些事情，如：

def getHtmlMemoized(url) = memoizeToFile(getHtmlOfUrl, "file.dat")

，然后调用getHtmlMemoized（URL），以便为每个URL做昂贵的计算只有一次。

来源

2013-05-09 seguso

只是pickle（或使用json）缓存字典。 – root 2013-05-09 14:02:34

谢谢，但我是一个蟒蛇新手（第二天）。我没有丝毫的想法，你的意思是... – seguso 2013-05-09 14:04:05

好，所以你作为一个新手做什么是在谷歌查找“pickle python”，并回来给我们，如果你有任何问题。 – 2013-05-09 14:04:55

Python提供了一个非常优雅的方式来做到这一点s - 装饰者。基本上，装饰器是一种函数，它包装另一个函数以提供附加功能而不更改函数源代码。你的装饰可以这样写：

import json 

def persist_to_file(file_name): 

    def decorator(original_func): 

     try: 
      cache = json.load(open(file_name, 'r')) 
     except (IOError, ValueError): 
      cache = {} 

     def new_func(param): 
      if param not in cache: 
       cache[param] = original_func(param) 
       json.dump(cache, open(file_name, 'w')) 
      return cache[param] 

     return new_func 

    return decorator

一旦你得到了，用@ -syntax“装饰”的功能，你准备好了。

@persist_to_file('cache.dat') 
def html_of_url(url): 
    your function code...

注意，该装饰被有意简化，可能不适用于所有情况，例如，当源函数接受或返回不能JSON序列化的数据。

更多关于装饰：How to make a chain of function decorators?

下面是如何使装饰保存缓存只有一次，在退出时间：

import json, atexit 

def persist_to_file(file_name): 

    try: 
     cache = json.load(open(file_name, 'r')) 
    except (IOError, ValueError): 
     cache = {} 

    atexit.register(lambda: json.dump(cache, open(file_name, 'w'))) 

    def decorator(func): 
     def new_func(param): 
      if param not in cache: 
       cache[param] = func(param) 
      return cache[param] 
     return new_func 

    return decorator

来源

2013-05-09 14:44:21 georg

每次更新缓存时都会写入一个新文件 - 取决于使用情况，这可能会（或可能不会）打败加速你从记忆中获得...... – root 2013-05-09 14:58:14

它*还*包含一个非常好的竞争条件，如果这个装饰器同时使用，或者（更可能）以可重入的方式使用。如果'a（）'和'b（）'都被记忆，并且'a（）'调用'b（）'，缓存可以被读取为'a（）'，然后再' ，第一个b的结果会被记忆，但是从调用过来的旧缓存会覆盖它，b对缓存的贡献将会丢失。 – SingleNegationElimination 2013-05-09 15:03:58

@root：当然，'atexit'可能是刷新缓存的好地方。另一方面，增加过早的优化可能会破坏此代码的教育目的。 – georg 2013-05-09 15:25:27

这样的事情应该做的事：

import json 

class Memoize(object): 
    def __init__(self, func): 
     self.func = func 
     self.memo = {} 

    def load_memo(filename): 
     with open(filename) as f: 
      self.memo.update(json.load(f)) 

    def save_memo(filename): 
     with open(filename, 'w') as f: 
      json.dump(self.memo, f) 

    def __call__(self, *args): 
     if not args in self.memo: 
      self.memo[args] = self.func(*args) 
     return self.memo[args]

基本用法：

your_mem_func = Memoize(your_func) 
your_mem_func.load_memo('yourdata.json') 
# do your stuff with your_mem_func

如果你想使用它后写你“缓存”到一个文件 - 需要重新加载在未来：

your_mem_func.save_memo('yournewdata.json')

来源

2013-05-09 14:12:51 root

它看起来不错。但如何使用这个类？对不起... – seguso 2013-05-09 14:21:24

@seguso - 更新了使用情况。更多关于memoization：http://stackoverflow.com/questions/1988804/what-is-memoization-and-how-can-i-use-it-in-python – root 2013-05-09 14:31:15

-1使用不需要的类时 – Merlin 2013-05-09 14:42:07

退房joblib.Memory。这是一个完全做到这一点的图书馆。

来源

2014-05-07 19:33:13 Will

只需三行代码！ :) – Andrew 2017-10-05 03:52:08

哇，这是一个伟大的图书馆！我无法相信我没有joblib这么多年。这应该是IMO的正确答案。 – foobarbecue 2017-10-29 04:31:55

Artemis library有一个这个模块。（你需要pip install artemis-ml）

你装饰你的函数：

from artemis.fileman.disk_memoize import memoize_to_disk 

@memoize_to_disk 
def fcn(a, b, c = None): 
    results = ... 
    return results

在内部，它通过这个哈希使哈希出的输入参数和保存备忘文件。

来源

2015-06-29 12:33:07 Peter

假设你的数据JSON序列化，这个代码应工作

import os, json 

def json_file(fname): 
    def decorator(function): 
     def wrapper(*args, **kwargs): 
      if os.path.isfile(fname): 
       with open(fname, 'r') as f: 
        ret = json.load(f) 
      else: 
       with open(fname, 'w') as f: 
        ret = function(*args, **kwargs) 
        json.dump(ret, f) 
      return ret 
     return wrapper 
    return decorator

装饰getHtmlOfUrl，然后简单地调用它，如果它之前已经运行，你会得到你的缓存数据。

经过与Python 2.x和蟒3.x的

来源

2017-03-06 12:02:57

一个清洁溶液搭载Python的搁置模块。优点是缓存实时更新，这是例外证明。

import shelve 
def shelve_it(file_name): 
    d = shelve.open(file_name) 

    def decorator(func): 
     def new_func(param): 
      if param not in d: 
       d[param] = func(param) 
      return d[param] 

     return new_func 

    return decorator 

@shelve_it('cache.shelve') 
def expensive_funcion(param): 
    pass

这将有助于函数被计算一次。接下来的后续调用相同的参数将返回存储的结果。

来源

2017-11-20 06:07:39 nehemiah

记忆到磁盘 - 蟒蛇 - 永久记忆

回答

相关问题