2014-11-01 323 views

I am using the Python multiprocessing Pool module to create a pool of processes and assign jobs to it.

I created 4 processes and assigned 2 jobs, but when I try to display their process IDs I only see one, "6952". Shouldn't it print 2 process IDs?

from multiprocessing import Pool 
from time import sleep 

def f(x): 
    import os 
    print "process id = " , os.getpid() 
    return x*x 

if __name__ == '__main__': 
    pool = Pool(processes=4)    # start 4 worker processes 

    result = pool.map_async(f, (11,)) #Start job 1 
    result1 = pool.map_async(f, (10,)) #Start job 2 
    print "result = ", result.get(timeout=1) 
    print "result1 = ", result1.get(timeout=1) 

Result:

result = process id = 6952 
process id = 6952 
[121] 
result1 = [100] 

Are you using Windows? – dano 2014-11-01 22:28:25


@dano yes – user1050619 2014-11-01 22:29:14

Answers


It's just timing. On Windows, the Pool has to spawn its 4 worker processes, which then need to start up, initialize, and get ready to consume from the task Queue. On Windows this requires each child process to re-import the __main__ module, and it requires the Queue instances used internally by the Pool to be unpickled in each child. All of that takes a non-trivial amount of time. Long enough, in fact, that both of your map_async() calls execute before all the processes in the Pool are even up and running. You can see this if you add some tracing to the function each worker in the Pool runs:

while maxtasks is None or (maxtasks and completed < maxtasks): 
    try: 
        print("getting {}".format(current_process())) 
        task = get()  # the worker pulling the next task from the parent process 
        print("got {}".format(current_process())) 

Output:

getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)> 
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)> 
process id = 5145 
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)> 
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)> 
process id = 5145 
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)> 
result = [121] 
result1 = [100] 
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)> 
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)> 
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)> 
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)> 

As you can see, Worker-1 starts up and consumes both tasks before Workers 2-4 ever try to consume from the Queue. If you add a sleep call in the main process after you instantiate the Pool, but before calling map_async, you'll see a different process handle each request:

getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)> 
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)> 
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)> 
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)> 
# <sleeping here> 
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)> 
process id = 5183 
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)> 
process id = 5184 
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)> 
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)> 
result = [121] 
result1 = [100] 
got <ForkServerProcess(ForkServerPoolWorker-3, started daemon)> 
got <ForkServerProcess(ForkServerPoolWorker-4, started daemon)> 
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)> 
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)> 

(Note that the extra "getting"/"got" statements you see at the end are the sentinels being sent to each process to shut them down gracefully.)

On Linux with Python 3.x, I can reproduce this behavior using the 'spawn' and 'forkserver' contexts, but not with 'fork'. Presumably that's because forking the child processes is much faster than spawning them and re-importing __main__.
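The sleep-after-Pool fix described above can be sketched as follows. This is a minimal sketch, not code from the original answer; it assumes Python 3 (where multiprocessing.get_context() is available) and uses the 'spawn' start method to reproduce the Windows-style startup delay even on Linux:

```python
from multiprocessing import get_context
from time import sleep
import os

def f(x):
    # Report which worker handled this task, then return the square.
    print("process id =", os.getpid())
    return x * x

if __name__ == '__main__':
    # 'spawn' makes each worker re-import __main__, so workers take
    # noticeably longer to become ready, as on Windows.
    ctx = get_context('spawn')
    pool = ctx.Pool(processes=4)
    sleep(2)  # give all 4 workers time to finish initializing
    result = pool.map_async(f, (11,))   # job 1
    result1 = pool.map_async(f, (10,))  # job 2
    print("result =", result.get(timeout=3))
    print("result1 =", result1.get(timeout=3))
    pool.close()
    pool.join()
```

With the workers already idle when the jobs arrive, each map_async call is usually picked up by a different worker, though which idle worker wins the race is still up to the OS scheduler.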


It does print 2 process IDs:

result = process id = 6952 <=== process id = 6952 
process id = 6952 <=== process id = 6952 
[121] 
result1 = [100] 

That's because your worker process finished the first job quickly and was ready to handle another request.

result = pool.map_async(f, (11,)) #Start job 1 
result1 = pool.map_async(f, (10,)) #Start job 2 

In the code above, your worker finished job 1, returned to the pool, and was ready to take job 2. The other workers never got a chance, for any of several possible reasons; most commonly they were busy, or simply not ready yet.

Here is an example in which we have 4 workers, but only one of them is ready immediately. That way we know in advance which one is going to get all the work.

# https://gist.github.com/dnozay/b2462798ca89fbbf0bf4 

from multiprocessing import Pool,Queue 
from time import sleep 

def f(x): 
    import os 
    print "process id = " , os.getpid() 
    return x*x 

# Queue that will hold amount of time to sleep 
# for each worker in the initialization 
sleeptimes = Queue() 
for times in [2,3,0,2]: 
    sleeptimes.put(times) 

# each worker will do the following init. 
# before they are handed any task. 
# in our case the 3rd worker won't sleep 
# and get all the work. 
def slowstart(q): 
    import os 
    num = q.get() 
    print "slowstart: process id = {0} (sleep({1}))".format(os.getpid(),num) 
    sleep(num) 

if __name__ == '__main__': 
    pool = Pool(processes=4,initializer=slowstart,initargs=(sleeptimes,)) # start 4 worker processes 
    result = pool.map_async(f, (11,)) #Start job 1 
    result1 = pool.map_async(f, (10,)) #Start job 2 
    print "result = ", result.get(timeout=3) 
    print "result1 = ", result1.get(timeout=3) 

Example:

$ python main.py 
slowstart: process id = 97687 (sleep(2)) 
slowstart: process id = 97688 (sleep(3)) 
slowstart: process id = 97689 (sleep(0)) 
slowstart: process id = 97690 (sleep(2)) 
process id = 97689 
process id = 97689 
result = [121] 
result1 = [100] 
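The converse of the slowstart trick is also possible: instead of delaying the workers, slow down the task itself so the busy worker cannot also grab the second job, and the pool hands it to another worker. A minimal sketch, not part of the original answer; slow_square is a made-up helper, and on a freshly started pool the two PIDs will usually, though not provably always, differ:

```python
from multiprocessing import Pool
from time import sleep
import os

def slow_square(x):
    # Keep this worker busy long enough that it cannot also
    # take the next task off the pool's queue.
    sleep(1)
    return (os.getpid(), x * x)

if __name__ == '__main__':
    pool = Pool(processes=4)
    r1 = pool.map_async(slow_square, (11,))  # job 1
    r2 = pool.map_async(slow_square, (10,))  # job 2
    (pid1, sq1), = r1.get(timeout=5)
    (pid2, sq2), = r2.get(timeout=5)
    print("job 1 ran in process", pid1, "->", sq1)
    print("job 2 ran in process", pid2, "->", sq2)
    pool.close()
    pool.join()
```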

gist at https://gist.github.com/dnozay/b2462798ca89fbbf0bf4 – dnozay 2014-11-01 23:53:38