Python multiprocessing pool with shared data

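I'm trying to speed up a multivariate fixed-point iteration algorithm using multiprocessing, but I'm running into problems with shared data. My solution vector is actually a dictionary with named keys rather than a vector of numbers, and each element of the vector is computed with a different formula. At a high level, the algorithm looks like this: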

current_estimate = previous_estimate
while True:
    for state in all_states:
        current_estimate[state] = state.getValue(previous_estimate)
    if norm(current_estimate, previous_estimate) < tolerance:
        break
    else:
        previous_estimate, current_estimate = current_estimate, previous_estimate
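For reference, a minimal serial sketch of this kind of iteration might look like the following. It is written for Python 3 and is only an illustration: the `fixed_point` helper, the `updates` mapping of per-name formulas, the max-absolute-difference norm, and the two toy update rules are all assumptions, not part of the original code. It keeps the previous and current estimates in separate dicts so that each pass reads only the previous values.

```python
def fixed_point(updates, initial, tolerance=1e-9, max_iter=1000):
    """Iterate named update formulas until successive estimates agree.

    `updates` maps each name to a function of the previous estimate (hypothetical API).
    """
    previous = dict(initial)
    for _ in range(max_iter):
        # Every element is computed from the previous estimate only.
        current = {name: formula(previous) for name, formula in updates.items()}
        # Assumed norm: largest absolute change across all elements.
        if max(abs(current[name] - previous[name]) for name in current) < tolerance:
            return current
        previous = current
    raise RuntimeError("fixed-point iteration did not converge")


# Toy system (an assumption for illustration): x = y/2 + 1, y = x/2 + 1.
updates = {
    "x": lambda est: 0.5 * est["y"] + 1.0,
    "y": lambda est: 0.5 * est["x"] + 1.0,
}
result = fixed_point(updates, {"x": 0.0, "y": 0.0})
```

For this toy system the iteration settles near x = y = 2, since that is the only point where both formulas reproduce their inputs.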

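I'd like to parallelize the for-loop part with multiprocessing. The previous_estimate variable is read-only, and each process only needs to write to one element of current_estimate. My current attempt at rewriting the for loop is as follows: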

# Imports (Python 2)
import itertools
from multiprocessing import Manager, Pool

# Class and function definitions
class A(object):
    def __init__(self, val):
        self.val = val

    # representative getValue function
    def getValue(self, est):
        return est[self] + self.val

def worker(state, in_est, out_est):
    out_est[state] = state.getValue(in_est)

def worker_star(a_b_c):
    """ Allow multiple arguments for a pool
        Taken from http://stackoverflow.com/a/5443941/3865495
    """
    return worker(*a_b_c)

# Initialize test environment 
manager = Manager() 
estimates = manager.dict() 
all_states = [] 
for i in range(5): 
    a = A(i) 
    all_states.append(a) 
    estimates[a] = 0 

pool = Pool(processes = 2)
prev_est = estimates
curr_est = estimates
pool.map(worker_star, itertools.izip(all_states, itertools.repeat(prev_est), itertools.repeat(curr_est)))

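The problem I'm currently running into is that the elements added to the all_states array are not the same objects as those added to the manager.dict(). When I try to access elements of the dictionary using elements of the array, I always get a KeyError. While debugging, I found that none of the elements are the same: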

print map(id, estimates.keys()) 
>>> [19558864, 19558928, 19558992, 19559056, 19559120] 
print map(id, all_states) 
>>> [19416144, 19416208, 19416272, 19416336, 19416400]

2016-09-22 CoconutBandit

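This happens because the objects you put into the estimates DictProxy aren't actually the same objects as the ones that live in the regular dict. The manager.dict() call returns a DictProxy, which proxies access to a dict that actually lives in a completely separate manager process. When you insert things into it, they are copied and sent to that remote process, which means they end up with a different identity.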
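The effect can be reproduced without starting a manager at all. The sketch below is written for Python 3 and uses a plain pickle round trip as a stand-in for the copy that happens when an object is sent to another process. The `State` class is a hypothetical substitute for `A` that keeps Python's default identity-based `__eq__` and `__hash__`.

```python
import pickle


class State:
    """Hypothetical stand-in for A, using default identity-based equality."""

    def __init__(self, val):
        self.val = val


original = State(1)
# Serializing and deserializing produces a new object, much as sending it
# to another process would.
round_tripped = pickle.loads(pickle.dumps(original))

same_object = round_tripped is original          # the copy is a distinct object
same_value = round_tripped.val == original.val   # but it carries the same data
# A dict keyed by the copy cannot be looked up with the original, because
# default equality compares identity rather than contents.
found = original in {round_tripped: 0}
```

Here `same_object` and `found` are both `False` while `same_value` is `True`, which is the same mismatch that produces the KeyError in the question.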

To work around this, you can define your own __eq__ and __hash__ functions on A, as described in this question:

class A(object):
    def __init__(self, val):
        self.val = val

    # representative getValue function
    def getValue(self, est):
        return est[self] + self.val

    def __hash__(self):
        return hash(self.__key())

    def __key(self):
        return (self.val,)

    def __eq__(x, y):
        return x.__key() == y.__key()

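This means that key lookups for items in estimates will use only the value of the val attribute to establish identity and equality, rather than the id assigned by Python.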
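Putting the pieces together, a Python 3 version of the whole flow might look roughly like the sketch below. It is an illustration under stated assumptions rather than the original code: the `run` helper, the snake_case method names, the use of `Pool.starmap` in place of the `worker_star` wrapper, and the choice to give `prev_est` and `curr_est` separate managed dicts are all changes made for this example.

```python
import itertools
from multiprocessing import Manager, Pool


class A:
    def __init__(self, val):
        self.val = val

    # Representative update formula, as in the question.
    def get_value(self, est):
        return est[self] + self.val

    def _key(self):
        return (self.val,)

    # Value-based hashing and equality, so copies of a state that were
    # pickled into another process still match the original as dict keys.
    def __hash__(self):
        return hash(self._key())

    def __eq__(self, other):
        return isinstance(other, A) and self._key() == other._key()


def worker(state, in_est, out_est):
    # Each task writes exactly one element of the output estimate.
    out_est[state] = state.get_value(in_est)


def run(n_states=5, processes=2):
    """Hypothetical driver: one parallel pass over all states."""
    with Manager() as manager:
        prev_est = manager.dict()
        curr_est = manager.dict()
        all_states = [A(i) for i in range(n_states)]
        for state in all_states:
            prev_est[state] = 0
        with Pool(processes=processes) as pool:
            # starmap unpacks each (state, prev_est, curr_est) tuple into worker().
            pool.starmap(
                worker,
                zip(all_states, itertools.repeat(prev_est), itertools.repeat(curr_est)),
            )
        # Copy the results out before the manager shuts down.
        return {state.val: curr_est[state] for state in all_states}
```

Call `run()` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. With every previous estimate set to 0, each new estimate is simply that state's `val`.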

2016-09-22 13:47:31 dano
