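TL;DR: The map-3 issue, where the first func is called when one would expect the second func, is due to the fact that Pool.map() pickles only func.__name__. Even though the func2 reference resolves to the second function, what gets sent to the child process is the name func, and the child process looks that name up locally.
OK, so I can count four different questions, listed below. I assume you have already had the talk about namespaces and forking processes, so let's go straight into the fun of your problem ☺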
① But how does map-3 find the name func2?
② So func2 = func is indeed executed, while def func(i): print('second') is not. Why?
③ Then map-3 won't find the name func2, as mentioned in many posts, e.g. this one. What's the difference between the two cases?
④ As I understand the arguments are passed to the slave processes by pickling, but how does pool pass the called function to other processes? Or how do sub-processes find the called function?
So I added more code to show more of the internals:
import os
from multiprocessing import Pool

print(os.getpid(), 'parent')

def func(i):
    print(os.getpid(), 'first', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

print('------ map-1')
pool1 = Pool(2)
pool1.map(func, range(2)) #map-1

def func(i):
    print(os.getpid(), 'second', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

func2 = func

print('------ map-2')
pool1.map(func, range(2)) #map-2
print('------ map-3')
pool1.map(func2, range(2)) #map-3

pool2 = Pool(2)
print('------ map-4')
pool2.map(func, range(2)) #map-4
print('------ map-5')
pool2.map(func2, range(2)) #map-5
On my system this outputs:
21512 parent
------ map-1
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
------ map-2
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
------ map-3
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
------ map-4
21518 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
21519 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
------ map-5
21518 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
21519 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
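So, as we can see, for pool1 func2 is never added to the workers' namespace. So there is definitely something fishy going on here, and it is too late for me to dig thoroughly through the multiprocessing source with a debugger to understand what is happening.
So if I had to guess an answer to ①, the pickle module somehow finds out that func2 resolves to 0x7f62d531bed8, which already exists under the label func, so it pickles the known "label" func for the child side, where it resolves to 0x7f62d67f7cf8. That is: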
func2 → 0x7f62d531bed8 → func → [PICKLE] → globals()['func'] → 0x7f62d67f7cf8
To test my theory, I changed your code a bit by renaming the second func() to func2(), and here is what I got:
------ map-3
Process PoolWorker-1:
Process PoolWorker-2:
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
task = get()
task = get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
return recv()
return recv()
AttributeError: 'module' object has no attribute 'func2'
AttributeError: 'module' object has no attribute 'func2'
and then, after also changing func = func2 to func2 = func:
------ map-2
Process PoolWorker-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process PoolWorker-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
task = get()
task = get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
return recv()
return recv()
AttributeError: 'module' object has no attribute 'func2'
AttributeError: 'module' object has no attribute 'func2'
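So I believe I am starting to have a point. It also shows where to read the code to understand what happens on the child process side.
That gives more clues for answering ② and ③!
To go further, I added a print statement inside pool.py at line 114: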
job, i, func, args, kwds = task
print("XXX", os.getpid(), job, i, func, args, kwds)
to show what is going on. And we can see that func resolves to 0x7f2d0238fcf8, which is the same address as in the parent process:
23432 parent
------ map-1
('XXX', 23433, 0, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {})
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
('XXX', 23434, 0, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {})
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
------ map-2
('XXX', 23433, 1, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {})
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
('XXX', 23434, 1, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {})
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
------ map-3
('XXX', 23433, 2, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {})
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
('XXX', 23434, 2, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {})
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
------ map-4
('XXX', 23438, 3, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (0,)),), {})
23438 second | <function func at 0x1092e60> | <function func at 0x1092e60>
('XXX', 23439, 3, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (1,)),), {})
23439 second | <function func at 0x1092e60> | <function func at 0x1092e60>
------ map-5
('XXX', 23438, 4, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (0,)),), {})
('XXX', 23439, 4, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (1,)),), {})
23438 second | <function func at 0x1092e60> | <function func at 0x1092e60>
23439 second | <function func at 0x1092e60> | <function func at 0x1092e60>
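So to answer ④, we would need to dig further into the multiprocessing source, and maybe even into the pickle source.
But I think my feeling about the resolution is likely to be right... Then the only question left is why it resolves the label to an address and back to a label again before pushing it to the child process!
Edit: I think I know why! As I was getting ready for bed, the reason popped into my mind, so I came back to my keyboard:
When pickling a function, pickle takes the argument that holds the function and gets its name from the function object itself:
So even if you create a new function object, for which you get a different address in memory: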
>>> print(func)
<function func at 0x7fc6174e3ed8>
pickle does not care, because if the function is not already accessible to the child, it never will be. So pickle only resolves func.__name__:
>>> print("func.__name__:", func.__name__)
func.__name__: func
>>> print("func2.__name__:", func2.__name__)
func2.__name__: func
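So even if you change the function body in the parent process and make a new reference to that function, what really gets pickled is the internal name of the function, which is given when the lambda is assigned or the function is defined.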
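A minimal sketch of that point, assuming a throwaway module called name_demo (a hypothetical name, registered in sys.modules only so that pickle can import it by name). It suggests that the pickled bytes carry the name stored on the function object, not the label the caller happened to use:

```python
import pickle
import sys
import types

# Hypothetical stand-in module, so the example does not depend on how it is run.
demo = types.ModuleType("name_demo")
sys.modules[demo.__name__] = demo

# Define func, then bind a second label to the same function object.
exec(
    "def func(i):\n"
    "    return 'first'\n"
    "func2 = func\n",
    demo.__dict__,
)

payload = pickle.dumps(demo.func2)

print(demo.func2.__name__)   # func  (the name stored on the object)
print(b"func" in payload)    # True  (that name travels in the pickle)
print(b"func2" in payload)   # False (the caller's label does not)
```

No bytecode is shipped either: the payload only holds a module name and a function name.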
That explains why you get the old func function when you hand func2 to pool1 at the map-3 stage.
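So, as a conclusion: for ①, map-3 does not find the name func2; it finds the name func stored inside the function that func2 refers to. That also answers ② and ③, because the func found in the worker is the original func function. And the mechanism, namely that func.__name__ is used to pickle and resolve function names between the two processes, answers ④.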
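The whole chain can be imitated inside a single process. This is only a sketch under stated assumptions: the module name fork_demo is hypothetical, and the "child" is faked by rebinding names in that module rather than by forking a real worker:

```python
import pickle
import sys
import types

# Hypothetical stand-in module that plays the role of __main__.
demo = types.ModuleType("fork_demo")
sys.modules[demo.__name__] = demo

# "Parent" state at map-3: func is the second definition, func2 points at it.
exec(
    "def func(i):\n"
    "    return 'second'\n"
    "func2 = func\n",
    demo.__dict__,
)
payload = pickle.dumps(demo.func2)   # records the module name and "func"

# "Child" state: the worker was forked earlier, so it still holds the
# first func and has never seen func2.
exec(
    "def func(i):\n"
    "    return 'first'\n",
    demo.__dict__,
)
del demo.func2

restored = pickle.loads(payload)     # looked up by name on the "child" side
result = restored(0)
print(result)                        # first
```

Unpickling succeeds even though func2 does not exist on the receiving side, because the name func2 was never part of the payload.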
Last update, from you:
In pickle._Pickler.save_global, it first uses
if name is None: name = getattr(obj, '__qualname__', None)
and then again
if name is None: name = obj.__name__.
So if obj has no __qualname__, then __name__ will be used.
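A small sketch of that lookup order, assuming a throwaway module called qualname_demo and an illustrative name alias (both hypothetical). When __qualname__ and __name__ disagree, the payload appears to carry the __qualname__:

```python
import pickle
import sys
import types

# Hypothetical stand-in module, registered so pickle can import it by name.
demo = types.ModuleType("qualname_demo")
sys.modules[demo.__name__] = demo

exec(
    "def func2(i):\n"
    "    return 'second'\n",
    demo.__dict__,
)

func2 = demo.func2
func2.__qualname__ = "alias"   # illustrative name, differs from __name__
demo.alias = func2             # lookup by that name must find the same object

payload = pickle.dumps(func2)

print(func2.__name__, func2.__qualname__)   # func2 alias
print(b"alias" in payload)                  # True
print(b"func2" in payload)                  # False
```

Binding demo.alias is needed here; without it the pickler has nothing to find under the name it is about to record.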
However, it will check whether the object passed is the same as the one found under that name:
if obj2 is not obj: raise PicklingError(...)
where obj2, parent = _getattribute(module, name).
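Yes, but remember that what gets passed is only the (internal) name of the function, not the function itself. The child process has no way to find out whether its func() is the same as the func() in the parent's memory.
Edit from @SyrtisMajor:
OK, let's change the first code above: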
import os
from multiprocessing import Pool

print(os.getpid(), 'parent')

def func(i):
    print(os.getpid(), 'first', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

print('------ map-1')
pool1 = Pool(2)
pool1.map(func, range(2)) #map-1

def func2(i):
    print(os.getpid(), 'second', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

func2.__qualname__ = func.__qualname__
func = func2

print('------ map-2')
pool1.map(func, range(2)) #map-2
print('------ map-3')
pool1.map(func2, range(2)) #map-3

pool2 = Pool(2)
print('------ map-4')
pool2.map(func, range(2)) #map-4
print('------ map-5')
pool2.map(func2, range(2)) #map-5
The output is as follows:
38130 parent
------ map-1
38131 first | <function func at 0x101856f28> | no func2 in globals
38132 first | <function func at 0x101856f28> | no func2 in globals
------ map-2
38131 first | <function func at 0x101856f28> | no func2 in globals
38132 first | <function func at 0x101856f28> | no func2 in globals
------ map-3
38131 first | <function func at 0x101856f28> | no func2 in globals
38132 first | <function func at 0x101856f28> | no func2 in globals
------ map-4
38133 second | <function func at 0x10339b510> | <function func at 0x10339b510>
38134 second | <function func at 0x10339b510> | <function func at 0x10339b510>
------ map-5
38133 second | <function func at 0x10339b510> | <function func at 0x10339b510>
38134 second | <function func at 0x10339b510> | <function func at 0x10339b510>
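This is exactly the same as our first output. Note that the func = func2 after the definition of func2 is critical, because pickling checks whether func2 (with the name func) is the same object as __main__.func. If it is not, pickling fails.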
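That identity check can be reproduced without any pool. The sketch below assumes a throwaway module called identity_demo and a small helper try_pickle (both hypothetical, introduced only for illustration):

```python
import pickle
import sys
import types

# Hypothetical stand-in module that plays the role of __main__.
demo = types.ModuleType("identity_demo")
sys.modules[demo.__name__] = demo

exec(
    "def func(i):\n"
    "    return 'first'\n"
    "def func2(i):\n"
    "    return 'second'\n",
    demo.__dict__,
)

# func2 now claims the name "func", as in the edited script.
demo.func2.__qualname__ = demo.func.__qualname__

def try_pickle(obj):
    """Illustrative helper: report whether obj can be pickled."""
    try:
        pickle.dumps(obj)
        return "ok"
    except pickle.PicklingError:
        return "PicklingError"

before = try_pickle(demo.func2)   # demo.func is still the first function
demo.func = demo.func2            # the func = func2 step
after = try_pickle(demo.func2)    # now the name func leads back to func2

print(before)   # PicklingError
print(after)    # ok
```

The check runs on the pickling side only; nothing equivalent can happen on the receiving side, which just resolves the name it was given.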
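N.B.: it is almost 5am here, and I am too tired to keep digging into this very interesting problem! – zmo
Good job! That makes sense and explains all my questions. Thanks, I learned a lot from this. I will (try to) check the multiprocessing code later. –
Aha, quite interesting. It seems to be '__qualname__'. I defined a new 'func2' and assigned 'func.__qualname__' to it. Then the traceback says '_pickle.PicklingError: Can't pickle <... at 0x1234abcde>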