Is there a memory limit on items shared with multiprocessing?

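I'm testing some code (trying to make it faster, but also trying to understand the differences). I have a loop that builds a table in memory. I then tried to run it with multiprocessing, but when I do, the memory usage looks strange. When I run it on its own, the table grows and grows until it takes up all the memory on the system, but when I use multiprocessing, memory usage stays low the whole time, which makes me wonder what it is actually doing. I tried to quickly recreate the non-multiprocessing code. Is there a memory limit on items shared with multiprocessing?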

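Here's some code (just add or remove items in the data variable to make it run faster or slower and watch how the system handles it; the multiprocessing version is at the top and the non-multiprocessing version is at the bottom):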

from multiprocessing import Pool 
from multiprocessing.managers import BaseManager, DictProxy 
from collections import defaultdict 

class MyManager(BaseManager): 
    pass 

MyManager.register('defaultdict', defaultdict, DictProxy) 

def test(i,x, T): 
    target_sum = 1000 
    # T[x, i] is True if 'x' can be solved 
    # by a linear combination of data[:i+1] 
    #T = defaultdict(bool)   # all values are False by default 
    T[0, 0] = True    # base case 

    for s in range(target_sum + 1): #set the range of one higher than sum to include sum itself 
      #print s 
      for c in range(s/x + 1): 
       if T[s - c * x, i]: 
        T[s, i + 1] = True 


data = [2,5,8,10,12,50]     
pool = Pool(processes=2) 
mgr = MyManager() 
mgr.start() 
T = mgr.defaultdict(bool) 
T[0, 0] = True 
for i, x in enumerate(data): # i is index, x is data[i] 
    pool.apply_async(test, (i,x, T)) 
pool.close() 
pool.join() 
pool.terminate() 


print 'size of Table(with multiprocesing) is:', len(T) 
count_of_true = [] 
for x in T.items(): 
    if T[x] == True: 
     count_of_true.append(x) 
print 'total number of true(with multiprocesing) is ', len(count_of_true) 


#now lets try without multiprocessing 
target_sum = 100 
# T[x, i] is True if 'x' can be solved 
# by a linear combination of data[:i+1] 
T1 = defaultdict(bool)   # all values are False by default 
T1[0, 0] = True    # base case 


for i, x in enumerate(data): # i is index, x is data[i] 
    for s in range(target_sum + 1): #set the range of one higher than sum to include sum itself 
      for c in range(s/x + 1): 
       if T1[s - c * x, i]: 
        T1[s, i + 1] = True 

print 'size of Table(without multiprocesing) is ', len(T1) 

count = [] 
for x in T1: 
    if T1[x] == True: 
     count.append(x) 

print 'total number of true(without multiprocessing) is ', len(count)

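As an experiment, I put the two pieces of code into two separate files and ran them side by side. The two multiprocessing processes take about 20% each and use only 0.5% of memory each. The single process (without multiprocessing) uses 75% of a core and up to 50% of memory.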


2012-02-21 Lostsoul

You wrote: "When I run it on its own..." Do you mean setting Pool(processes=1)? – itsafire 2012-02-21 15:21:29

Not exactly. In my code above there are two parts: one wrapped in a multiprocessing pool, and the other running on its own (without a pool). – Lostsoul 2012-02-21 15:23:03

If I understand your code, the real problem is that you can't build your lookup table with multiprocessing.

This:

for i, x in enumerate(data): 
    for s in range(target_sum + 1): 
     for c in range(s/x + 1): 
      if T1[s - c * x, i]: 
       T1[s, i + 1] = True

works because you're doing one step after the other.

Whereas this:

def test(i,x, T): 
    target_sum = 1000 
    T[0, 0] = True 
    for s in range(target_sum + 1): 
     for c in range(s/x + 1): 
      if T[s - c * x, i]: 
       T[s, i + 1] = True 

# [...] 

for i, x in enumerate(data): 
    pool.apply_async(test, (i,x, T))

won't do the same thing, because you need the previous results in order to build the new ones, just as in RecursivelyListAllThatWork().
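
To illustrate the ordering problem, here is a minimal single-process sketch. It assumes Python 3 (the code above is Python 2, hence // instead of /), and the helper names run_stage and count_true are made up for this example. Running the stages out of order stands in for what can happen when pool workers are not synchronized: a stage that reads its input column before that column has been filled finds nothing to build on.

```python
from collections import defaultdict


def run_stage(T, i, x, target_sum):
    """One iteration of the outer loop: fill column i + 1 of T from column i."""
    for s in range(target_sum + 1):
        for c in range(s // x + 1):
            if T[s - c * x, i]:
                T[s, i + 1] = True


def count_true(T):
    """Count the entries of T whose value is truthy."""
    return sum(1 for value in T.values() if value)


data = [2, 5]
target_sum = 10

# In order: stage 0 fills column 1 before stage 1 reads it.
in_order = defaultdict(bool)
in_order[0, 0] = True
run_stage(in_order, 0, data[0], target_sum)
run_stage(in_order, 1, data[1], target_sum)

# Out of order: stage 1 reads column 1 while it is still empty,
# so it never marks anything in column 2.
out_of_order = defaultdict(bool)
out_of_order[0, 0] = True
run_stage(out_of_order, 1, data[1], target_sum)
run_stage(out_of_order, 0, data[0], target_sum)

in_order_true = count_true(in_order)
out_of_order_true = count_true(out_of_order)
print(in_order_true, out_of_order_true)
```

The out-of-order table ends up with fewer True entries than the in-order one. With apply_async nothing guarantees that the task for index i has finished before the task for index i + 1 starts reading its results.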

There's also a bug in your counting. This:

for x in T.items(): 
    if T[x] == True: 
     count_of_true.append(x)

should be:

for x in T: 
    if T[x] == True: 
     count_of_true.append(x)

And it's better to compare against True with is rather than ==, although in your case you don't need either:

for x in T: 
    if T[x]: 
     count_of_true.append(x)
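
As a small illustration of why the items() version miscounts, here is a sketch using a plain defaultdict instead of the manager proxy (an assumption made to keep it self-contained), written for Python 3. The list(...) call mimics the list snapshot that Python 2's items() returned. Each x is a (key, value) pair rather than a key, so the lookup misses and the defaultdict quietly inserts a new False entry, which also makes the table grow.

```python
from collections import defaultdict

T = defaultdict(bool)
T[0, 0] = True
T[1, 0] = False

size_before = len(T)

# Buggy pattern: x is a (key, value) pair, not a key.
# T[x] misses, so defaultdict inserts x with the default value False.
buggy_count = []
for x in list(T.items()):  # snapshot, as Python 2's items() returned a list
    if T[x]:
        buggy_count.append(x)

size_after_buggy = len(T)

# Fixed pattern: iterate over the keys only.
fixed_count = [key for key in list(T) if T[key]]

print(size_before, size_after_buggy, buggy_count, fixed_count)
```

The buggy loop counts nothing and leaves the table larger than it found it, while the fixed loop finds the single True key.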

Also note that you don't actually need a defaultdict, as I and others have already told you.
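
One possible way to drop the defaultdict, sketched here as an assumption rather than as the approach suggested earlier, is to keep a set of the (s, i) pairs that would be True. A membership test on a set never inserts anything, so the table only ever holds entries that are actually reachable. The function name reachable_sums is hypothetical, and the code is Python 3.

```python
def reachable_sums(data, target_sum):
    """Return the (s, i) pairs that the original table would mark True."""
    reachable = {(0, 0)}  # base case
    for i, x in enumerate(data):
        for s in range(target_sum + 1):
            for c in range(s // x + 1):
                if (s - c * x, i) in reachable:  # lookup does not insert
                    reachable.add((s, i + 1))
                    break  # one witness is enough for this s
    return reachable


table = reachable_sums([2, 5], 10)
print(len(table))
```

Because False entries are never stored, len(table) is the number of True entries directly and no separate counting loop is needed.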


2012-02-21 15:20:59

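But even with processes=1 it doesn't max out memory the way it does without multiprocessing at all. I noticed the same thing with another multiprocessing program I wrote (the single process keeps growing in memory, while the multiprocessing version doesn't seem to grow as the workload increases). – Lostsoul 2012-02-21 15:25:47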

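@Lostsoul: I don't think that matters. The problem is that you are walking through the whole table with for loops. I've run your code, and the multiprocessing table is almost 10 times bigger than the non-multiprocessing one. As for memory usage, it could be because you're using one shared table instead of several tables, but that's not the case here. – 2012-02-21 15:36:13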

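But shouldn't multiprocessing still have one shared table (as in, not several), or does each process work on its own table and then sync it back to the main one or something? Sorry, I just don't understand the memory difference. Also, I read your latest answer, thanks for the tips (you're helping me get better). – Lostsoul 2012-02-21 15:52:29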
