2016-03-09

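Python multiprocessing pool interacts with the namespace at creation

We know that a multiprocessing.Pool must be initialized after the functions to be run on it have been defined. But I find the code below inscrutable: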

import os 
from multiprocessing import Pool 

def func(i): print('first') 

pool1 = Pool(2) 
pool1.map(func, range(2))   #map-1 

def func(i): print('second') 
func2 = func 

print('------') 
pool1.map(func, range(2))  #map-2 
pool1.map(func2, range(2))  #map-3 

pool2 = Pool(2) 
print('------') 
pool2.map(func, range(2))  #map-4 
pool2.map(func2, range(2))  #map-5 

The output (Python 2.7 and Python 3.4 on Linux) is:

first   #map-1 
first 
------ 
first   #map-2 
first 
first   #map-3 
first 
------ 
second  #map-4 
second 
second  #map-5 
second 

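map-2 prints 'first', as we expect. But how does map-3 find the name func2? I mean, pool1 was initialized before the first occurrence of func2. So func2 = func is indeed executed, while def func(i): print('second') is not. Why?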

If I define func2 directly with

def func2(i): print('second')

then map-3 won't find the name func2, as mentioned by many posts, e.g. this one. What's the difference between the two cases?

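As I understand it, the arguments are passed to the slave processes by pickling, but how does the pool pass the called function to the other processes? Or how do the subprocesses find the called function?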

Answer


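TL;DR: in map-3 the first func is called where one would expect the second, because Pool.map() serializes the function with pickle by name. Even though you pass the reference bound to func2, what gets pickled is func.__name__ (that is, 'func'). That name is sent to the child process, which looks up func in its own namespace.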
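A minimal sketch of that by-name serialization, independent of multiprocessing. It uses json.dumps from the standard library as a stand-in function, and `alias` is a hypothetical name chosen only for this illustration. The pickle payload should contain the function's own name and never the variable name it was passed under:

```python
import json
import pickle

# `alias` is a second name for the same function object (hypothetical name).
alias = json.dumps

# Functions are pickled by reference: module name plus the function's own name.
payload = pickle.dumps(alias)

print(b"json" in payload)    # module name is stored
print(b"dumps" in payload)   # the function's own name is stored
print(b"alias" in payload)   # the variable name is not stored

# Unpickling simply looks the name up again in the named module.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

The receiving side only ever sees "json" and "dumps", so whatever object it finds under that name is what it will call.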



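OK, so I count four different questions, listed below. I assume you have already had the lecture about namespaces and forking processes, so let's dive straight into the fun part of your question ☺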

① But how does map-3 find the name func2?

② So func2 = func is indeed executed, while def func(i): print('second') is not. Why?

③ Then map-3 won't find name func2 as mentioned by many posts, eg. this one. What's the difference between two cases?

④ As I understand the arguments are passed to the slave processes by pickling, but how does pool pass the called function to other processes? Or how do sub-processes find the called function?

So I added some more code to show more of the internals:

import os 
from multiprocessing import Pool 

print(os.getpid(), 'parent') 

def func(i):
    print(os.getpid(), 'first', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

print('------ map-1') 
pool1 = Pool(2) 
pool1.map(func, range(2))   #map-1 

def func(i):
    print(os.getpid(), 'second', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")
func2 = func 

print('------ map-2') 
pool1.map(func, range(2))  #map-2 
print('------ map-3') 
pool1.map(func2, range(2))  #map-3 

pool2 = Pool(2) 
print('------ map-4') 
pool2.map(func, range(2))  #map-4 
print('------ map-5') 
pool2.map(func2, range(2))  #map-5 

On my system, this outputs:

21512 parent 
------ map-1 
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals 
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals 
------ map-2 
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals 
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals 
------ map-3 
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals 
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals 
------ map-4 
21518 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8> 
21519 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8> 
------ map-5 
21518 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8> 
21519 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8> 

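So we can see that for pool1, func2 is never added to the namespace. Something fishy is definitely going on, and it is too late for me right now to go through the multiprocessing sources with a debugger to understand what is happening.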
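The snapshot effect seen for pool1 can be sketched with a plain global variable. This is an illustration only, and it assumes a POSIX system where the "fork" start method is available; `marker` and `read_marker` are hypothetical names. The workers are forked when the pool is created, so they keep the namespace as it was at that moment:

```python
import multiprocessing as mp

# Assumption: a POSIX system where the "fork" start method is available.
ctx = mp.get_context("fork")

marker = "before"

def read_marker(_):
    # Looks up the global in whichever process runs this function.
    return marker

pool = ctx.Pool(1)        # the worker is forked here and copies the namespace
marker = "after"          # changes the parent's namespace only

seen = pool.map(read_marker, range(1))
pool.close()
pool.join()

print(seen)
```

With fork, the worker should still report the value from before the pool was created, which matches "no func2 in globals" in the pool1 output above.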

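So if I had to guess an answer to ①: the pickle module somehow finds that func2 resolves to 0x7f62d531bed8, which already exists under the label func, so it pickles the known "label" func for the child side, where it resolves to 0x7f62d67f7cf8. That is: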

func2 → 0x7f62d531bed8 → func → [PICKLE] → globals()['func'] → 0x7f62d67f7cf8 

To test my theory, I changed your code a bit by renaming the second func() to func2(), and here is what I got:

------ map-3 
Process PoolWorker-1: 
Process PoolWorker-2: 
Traceback (most recent call last): 
Traceback (most recent call last): 
    File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap 
    File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap 
    self.run() 
    self.run() 
    File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run 
    File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run 
    self._target(*self._args, **self._kwargs) 
    self._target(*self._args, **self._kwargs) 
    File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker 
    File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker 
    task = get() 
    task = get() 
    File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get 
    File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get 
    return recv() 
    return recv() 
AttributeError: 'module' object has no attribute 'func2' 
AttributeError: 'module' object has no attribute 'func2' 

and then, after also changing func2 = func into func = func2:

------ map-2 
Process PoolWorker-1: 
Traceback (most recent call last): 
    File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap 
Process PoolWorker-2: 
Traceback (most recent call last): 
    File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap 
    self.run() 
    self.run() 
    File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run 
    File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run 
    self._target(*self._args, **self._kwargs) 
    self._target(*self._args, **self._kwargs) 
    File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker 
    File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker 
    task = get() 
    task = get() 
    File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get 
    File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get 
    return recv() 
    return recv() 
AttributeError: 'module' object has no attribute 'func2' 
AttributeError: 'module' object has no attribute 'func2' 

So I believe I am starting to make a point. It also shows where to read the code to understand what is happening on the child process side.
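The tracebacks end in the worker's recv(), which is where the task is unpickled. The same failure can be sketched without a pool by unpickling a hand-written payload. The payloads below are assumptions made for illustration: they use the protocol 0 GLOBAL opcode ("c", module line, name line, "." to stop) and the json module as a stand-in for the script's module:

```python
import pickle

# Hand-written protocol 0 payloads: GLOBAL opcode, module, name, STOP.
good = b"cjson\ndumps\n."          # refers to json.dumps, which exists
bad = b"cjson\nno_such_name\n."    # refers to a name the module lacks

found = pickle.loads(good)
print(found.__name__)

try:
    pickle.loads(bad)
    outcome = "loaded"
except AttributeError as exc:
    # Same kind of error the pool workers hit for the missing func2.
    outcome = "AttributeError"
    print(exc)

print(outcome)
```

The receiver never gets any function code; it only gets a module and a name to look up, and the lookup fails if that name was never defined on its side.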

So here are more clues to answer ② and ③!

To get further, I added a print statement in pool.py at line 114:

job, i, func, args, kwds = task
print("XXX", os.getpid(), job, i, func, args, kwds)

which shows what is happening. We can see that func resolves to 0x7f2d0238fcf8, which is the same address as the function in the parent:

23432 parent 
------ map-1 
('XXX', 23433, 0, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {}) 
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals 
('XXX', 23434, 0, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {}) 
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals 
------ map-2 
('XXX', 23433, 1, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {}) 
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals 
('XXX', 23434, 1, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {}) 
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals 
------ map-3 
('XXX', 23433, 2, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {}) 
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals 
('XXX', 23434, 2, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {}) 
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals 
------ map-4 
('XXX', 23438, 3, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (0,)),), {}) 
23438 second | <function func at 0x1092e60> | <function func at 0x1092e60> 
('XXX', 23439, 3, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (1,)),), {}) 
23439 second | <function func at 0x1092e60> | <function func at 0x1092e60> 
------ map-5 
('XXX', 23438, 4, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (0,)),), {}) 
('XXX', 23439, 4, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (1,)),), {}) 
23438 second | <function func at 0x1092e60> | <function func at 0x1092e60> 
23439 second | <function func at 0x1092e60> | <function func at 0x1092e60> 

So to answer ④, we would need to dig further into the multiprocessing sources, and maybe even into the pickle sources.

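But I think my hunch about the resolution is probably right... The only remaining question is then why it resolves the label to an address and back to a label again before pushing it to the child processes!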


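Edit: I think I know why! As I was getting ready for bed, the reason came to mind, so I went back to my keyboard:

When pickling a function, pickle takes the argument that holds the function and gets the name from the function object itself. So even if you create a new function object, which gets a different address in memory: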

>>> print(func) 
<function func at 0x7fc6174e3ed8> 

pickle does not care, because if the function is not already accessible to the child, it never will be. So pickle only resolves func.__name__:

>>> print("func.__name__:", func.__name__) 
func.__name__: func 
>>> print("func2.__name__:", func2.__name__) 
func2.__name__: func 

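So even if you change the function body in the parent process and make a new reference to that function, what really gets pickled is the function's internal name, which is fixed when the function is defined (a lambda always gets the name '<lambda>').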

This explains why you get the old func function when you give func2 to pool1 at the map-3 stage.

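So, to conclude: for ①, map-3 does not find the name func2; it finds the name func inside the function that func2 refers to. That also answers ② and ③, since it is the original func function that gets executed in the child. And this mechanism, where func.__name__ is used to pickle the function and resolve its name between the two processes, answers ④.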
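The conclusion can be checked with a condensed variant of the question's script that returns values instead of printing them. This is a sketch that assumes a POSIX system where the "fork" start method is available; with another start method (for example "spawn") the workers re-import the main module and the results may differ:

```python
import multiprocessing as mp

# Assumption: a POSIX system where the "fork" start method is available.
ctx = mp.get_context("fork")

def func(i):
    return "first"

pool1 = ctx.Pool(2)       # workers forked here know only the first func

def func(i):              # rebinds the name func in the parent only
    return "second"

func2 = func              # alias: func2.__name__ is still "func"

map3 = pool1.map(func2, range(2))   # sent by name, resolved in old workers

pool2 = ctx.Pool(2)       # forked after the redefinition
map5 = pool2.map(func2, range(2))

for pool in (pool1, pool2):
    pool.close()
    pool.join()

print(func2.__name__, map3, map5)
```

Under fork, map3 should come back from the first definition and map5 from the second, even though the parent passed the very same object both times.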


Last update, from you:

In pickle._Pickler.save_global, it will use

if name is None: name = getattr(obj, '__qualname__', None)

and then again

if name is None: name = obj.__name__

So if obj has no __qualname__, then __name__ is used.
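A short sketch of how the two attributes relate, for readers who have not met __qualname__ before. `Holder` is a hypothetical class added only to show a case where the two names differ:

```python
def func(i):
    return i

func2 = func              # an alias does not change either attribute

class Holder:             # hypothetical class, only for illustration
    def method(self):
        return None

names = {
    "func2.__name__": func2.__name__,
    "func2.__qualname__": func2.__qualname__,
    "Holder.method.__name__": Holder.method.__name__,
    "Holder.method.__qualname__": Holder.method.__qualname__,
}

for label, value in names.items():
    print(label, "=", value)
```

For a module-level function the two are identical, which is why the earlier explanation in terms of __name__ gives the same result on Python 3.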

However, it will check whether the object passed is the same as the one in the subprocess:

if obj2 is not obj: raise PicklingError(...)

where obj2, parent = _getattribute(module, name)

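Yes, but remember that the object being passed is only the (internal) name of the function, not the function itself. The subprocess has no way to find out whether its func() is the same as the func() in the parent's memory.


Edit from @SyrtisMajor: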

OK, let's change the first code above:

import os 
from multiprocessing import Pool 

print(os.getpid(), 'parent') 

def func(i):
    print(os.getpid(), 'first', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

print('------ map-1') 
pool1 = Pool(2) 
pool1.map(func, range(2))   #map-1 

def func2(i):
    print(os.getpid(), 'second', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

func2.__qualname__ = func.__qualname__ 

func = func2 

print('------ map-2') 
pool1.map(func, range(2))  #map-2 
print('------ map-3') 
pool1.map(func2, range(2))  #map-3 

pool2 = Pool(2) 
print('------ map-4') 
pool2.map(func, range(2))  #map-4 
print('------ map-5') 
pool2.map(func2, range(2))  #map-5 

The output is as follows:

38130 parent 
------ map-1 
38131 first | <function func at 0x101856f28> | no func2 in globals 
38132 first | <function func at 0x101856f28> | no func2 in globals 
------ map-2 
38131 first | <function func at 0x101856f28> | no func2 in globals 
38132 first | <function func at 0x101856f28> | no func2 in globals 
------ map-3 
38131 first | <function func at 0x101856f28> | no func2 in globals 
38132 first | <function func at 0x101856f28> | no func2 in globals 
------ map-4 
38133 second | <function func at 0x10339b510> | <function func at 0x10339b510> 
38134 second | <function func at 0x10339b510> | <function func at 0x10339b510> 
------ map-5 
38133 second | <function func at 0x10339b510> | <function func at 0x10339b510> 
38134 second | <function func at 0x10339b510> | <function func at 0x10339b510> 

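This is exactly the same as our first output. Note that the func = func2 after the definition of func2 is the key, because pickling checks whether func2 (whose name is now func) is the same object as __main__.func. If it is not, pickling fails.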
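That identity check can be sketched on its own, without a pool. `stale` is a hypothetical name for a reference to a function object whose module-level name has since been rebound to something else:

```python
import pickle

def func(i):
    return "first"

stale = func              # keep a reference to the first function object

def func(i):              # the module-level name func now points elsewhere
    return "second"

try:
    pickle.dumps(stale)   # name lookup no longer returns this object
    outcome = "pickled"
except pickle.PicklingError as exc:
    outcome = "PicklingError"
    print(exc)

print(outcome)
```

The check runs in the process that does the pickling: the name recorded in the function object must lead back to that same object, otherwise pickle refuses to serialize it.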


N.B.: it is almost 5 a.m. now, and I am too tired to keep digging into this very interesting question! – zmo


Good job! That makes sense and explains all my questions. Thanks, I learned a lot from this. I will (try to) check the multiprocessing code later. –


Aha, very interesting. It seems to be '__qualname__'. I defined a new 'func2' and assigned 'func.__qualname__' to it. Then the traceback says '_pickle.PicklingError: Can't pickle ... at 0x1234abcde>