2011-03-18
1

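Python 2.6: process-local storage while using multiprocessing.Pool

I'm trying to build a Python script that uses a pool of worker processes (multiprocessing.Pool) to work through a large data set.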

I want each process to have its own unique object that it can reuse across the many calls it executes.

Pseudocode:

def work(data):
    # connection should be unique per process
    connection.put(data)
    print 'work done with connection:', connection

if __name__ == '__main__':
    pPool = Pool(4)  # pool of 4 processes
    datas = range(1, 1001)
    for process in pPool:
        # this is the part I'm asking about: how do I really do this?
        process.connection = Connection(conargs)
    for data in datas:
        pPool.apply_async(work, (data,))

Answers

1

I think something like this should work (untested):

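from multiprocessing import Pool

def init(*args):
    # Runs once in each worker process when the pool starts it
    global connection
    connection = Connection(*args)

pPool = Pool(initializer=init, initargs=conargs)  # conargs must be a tuple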
+0

Thanks, this was the key. – 2011-03-18 19:00:53

+0

Can you mark it as the answer? – 2011-03-18 19:39:11

-1

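You want objects that reside in shared memory, right?

Python has some support for this in the standard library (multiprocessing.Value and multiprocessing.Array), but it is fairly limited. As far as I remember, only integers and a few other basic types can be stored.

Try POSH (Python Object Sharing): http://poshmodule.sourceforge.net/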
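
For reference, the standard-library support mentioned above looks roughly like this. It is a minimal sketch written for Python 3; the function names add_to_shared and run and the amounts added are made up for illustration. It shows that multiprocessing.Value and multiprocessing.Array hold plain C types (here an int and doubles), not arbitrary objects such as a connection.

```python
import multiprocessing as mp


def add_to_shared(counter, totals, index, amount):
    # Value and Array carry their own lock; hold it for read-modify-write.
    with counter.get_lock():
        counter.value += 1
    with totals.get_lock():
        totals[index] += amount


def run(n_procs=4):
    counter = mp.Value('i', 0)                # one shared C int
    totals = mp.Array('d', [0.0] * n_procs)   # shared array of C doubles
    procs = [mp.Process(target=add_to_shared,
                        args=(counter, totals, i, i * 1.5))
             for i in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value, list(totals)


if __name__ == '__main__':
    print(run())
```

With the default of four processes, run() returns (4, [0.0, 1.5, 3.0, 4.5]): every process incremented the same counter and wrote into its own slot of the shared array.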

1

It may be simplest to create the mp.Process instances directly (without mp.Pool):

import multiprocessing as mp
import time

class Connection(object):
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return self.name

def work(inqueue, conn):
    # Each worker keeps the single Connection it was started with.
    name = mp.current_process().name
    while True:
        data = inqueue.get()
        time.sleep(.5)
        print('{n}: work done with connection {c} on data {d}'.format(
            n=name, c=conn, d=data))
        inqueue.task_done()

if __name__ == '__main__':
    N = 4
    inqueue = mp.JoinableQueue()
    for i in range(N):
        # One Connection per process, created before the process starts.
        conn = Connection(name='Conn-' + str(i))
        proc = mp.Process(target=work, name='Proc-' + str(i), args=(inqueue, conn))
        proc.daemon = True  # workers loop forever; let them exit with the main process
        proc.start()

    datas = range(1, 11)
    for data in datas:
        inqueue.put(data)
    inqueue.join()  # block until every item has been marked task_done()

This produces:

Proc-0: work done with connection Conn-0 on data 1 
Proc-1: work done with connection Conn-1 on data 2 
Proc-3: work done with connection Conn-3 on data 3 
Proc-2: work done with connection Conn-2 on data 4 
Proc-0: work done with connection Conn-0 on data 5 
Proc-1: work done with connection Conn-1 on data 6 
Proc-3: work done with connection Conn-3 on data 7 
Proc-2: work done with connection Conn-2 on data 8 
Proc-0: work done with connection Conn-0 on data 9 
Proc-1: work done with connection Conn-1 on data 10 

Note that each Proc number is always paired with the same Conn number.