Multiprocessing for calculating eigenvalues

I generated 100 random int matrices of size 1000x1000. I am using the multiprocessing module to calculate the eigenvalues of the 100 matrices.

The code is as follows:

import timeit 
import numpy as np 
import multiprocessing as mp 

def calEigen():
    S, U = np.linalg.eigh(a)

def multiprocess(processes):
    pool = mp.Pool(processes=processes)
    #Start timing here as I don't want to include time taken to initialize the processes
    start = timeit.default_timer()
    results = [pool.apply_async(calEigen, args=())]
    stop = timeit.default_timer()
    print ("Process:", processes, stop - start)

    results = [p.get() for p in results]
    results.sort() # to sort the results


if __name__ == "__main__":

    global a
    a=[]

    for i in range(0,100):
        a.append(np.random.randint(1,100,size=(1000,1000)))

    #Print execution time without multiprocessing
    start = timeit.default_timer()
    calEigen()
    stop = timeit.default_timer()
    print stop - start

    #With 1 process
    multiprocess(1)

    #With 2 processes
    multiprocess(2)

    #With 3 processes
    multiprocess(3)

    #With 4 processes
    multiprocess(4)

The output is:

0.510247945786 
('Process:', 1, 5.1021575927734375e-05) 
('Process:', 2, 5.698204040527344e-05) 
('Process:', 3, 8.320808410644531e-05) 
('Process:', 4, 7.200241088867188e-05)

Another run showed this output:

69.7296020985 
('Process:', 1, 0.0009050369262695312) 
('Process:', 2, 0.023727893829345703) 
('Process:', 3, 0.0003509521484375) 
('Process:', 4, 0.057518959045410156)

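My questions are:

Why doesn't the execution time decrease as the number of processes increases? Am I using the multiprocessing module correctly?
Am I calculating the execution time correctly?

I have edited the code given in the comments below. I want the serial and multiprocessing functions to find the eigenvalues of the same list of 100 matrices. The edited code is: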

import numpy as np 
import time 
from multiprocessing import Pool 

a=[]

for i in range(0,100):
    a.append(np.random.randint(1,100,size=(1000,1000)))

def serial(z):
    result = []
    start_time = time.time()
    for i in range(0,100):
        result.append(np.linalg.eigh(z[i])) #calculate eigenvalues and append to result list
    end_time = time.time()
    print("Single process took :", end_time - start_time, "seconds")


def caleigen(c):
    result = []
    result.append(np.linalg.eigh(c)) #calculate eigenvalues and append to result list
    return result

def mp(x, z):
    start_time = time.time()
    with Pool(processes=x) as pool: # start a pool of x workers
        result = pool.map_async(caleigen, z) # distribute work to workers
        result = result.get() # collect result from MapResult object
    end_time = time.time()
    print("Multiprocessing took:", end_time - start_time, "seconds")

if __name__ == "__main__":

    serial(a)
    mp(1,a)
    mp(2,a)
    mp(3,a)
    mp(4,a)

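The time does not decrease as the number of processes increases. Where am I going wrong? Does multiprocessing divide the list into chunks for the processes, or do I have to do the division myself?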

Source

2015-11-01 Misha

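You are not dividing the work and distributing it to your processes, so there is no cooperation. Instead, each process does the full computation on its own, and with several processes doing the same thing at once the CPU load is higher than when a single process does the full computation. If you divide the work and distribute it to your workers, it should be faster. – dopstar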
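To illustrate what dividing the work could look like, here is a minimal sketch (not the original poster's code) that submits one task per matrix with apply_async and stops the clock only after every get() has returned, since apply_async itself returns immediately. The helper names eigenvalues, make_symmetric_matrices and timed_pool_run are made up for this example. The matrices are symmetrized and kept small, which is an assumption added here because np.linalg.eigvalsh expects symmetric input.

```python
import time
from multiprocessing import Pool

import numpy as np


def eigenvalues(matrix):
    """Worker: return the eigenvalues of one symmetric matrix."""
    return np.linalg.eigvalsh(matrix)


def make_symmetric_matrices(count, size, seed=0):
    """Hypothetical helper: build `count` random symmetric integer matrices."""
    rng = np.random.default_rng(seed)
    matrices = []
    for _ in range(count):
        m = rng.integers(1, 100, size=(size, size))
        matrices.append(m + m.T)  # symmetrize (assumption, see lead-in)
    return matrices


def timed_pool_run(matrices, processes):
    """Submit one task per matrix and time until every result has arrived."""
    with Pool(processes=processes) as pool:
        start = time.perf_counter()
        # apply_async only queues the task and returns an AsyncResult at once
        pending = [pool.apply_async(eigenvalues, (m,)) for m in matrices]
        # get() blocks until that task is done, so the timing covers the work
        results = [p.get() for p in pending]
        elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    mats = make_symmetric_matrices(count=8, size=200)
    for n in (1, 2):
        _, seconds = timed_pool_run(mats, n)
        print(n, "process(es):", seconds, "seconds")
```

Timing only the apply_async call, as in the question, measures how long it takes to queue a task, which is why the reported times were in the microsecond range.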

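You are not using the multiprocessing module correctly. As @dopstar pointed out, you are not dividing your task. The process pool has only one task, so no matter how many workers you assign, only one of them gets the job. As for your second question, I didn't use timeit to measure the processing time precisely. I just use the time module to get a rough sense of how fast things are. It serves the purpose most of the time, though. If I understand correctly what you are trying to do, this should be the single-process version of your code: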

import numpy as np 
import time 

result = [] 
start_time = time.time() 
for i in range(100): 
    a = np.random.randint(1, 100, size=(1000,1000)) #generate random matrix 
    result.append(np.linalg.eigh(a))     #calculate eigen values and append to result list 
end_time = time.time() 
print("Single process took :", end_time - start_time, "seconds")

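The single-process version took 15.27 seconds on my computer. Below is the multiprocess version, which took only 0.46 seconds on my computer. I also included the single-process version for comparison. (The single-process version has to be inside the if block as well, placed after the multiprocess version.) Because you want to repeat the calculation 100 times, it is much easier to create a pool of workers and let them pick up unfinished tasks automatically than to start each process manually and specify what each one should do. In my code, the argument passed to caleigen is only there to keep track of how many times the task has been executed. Finally, map_async is generally faster than apply_async; its downsides are slightly higher memory use and that the function call takes only one argument. I used map_async followed by get() here; note that this returns the results in input order and blocks until they are all ready, just as map does, so the two should perform about the same.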

from multiprocessing import Pool 
import numpy as np 
import time 

def caleigen(x):  # define work for each worker 
    a = np.random.randint(1,100,size=(1000,1000)) 
    S, U = np.linalg.eigh(a)       
    return S, U 


if __name__ == "__main__":
    start_time = time.time()
    with Pool(processes=4) as pool:  # start a pool of 4 workers
        result = pool.map_async(caleigen, range(100)) # distribute work to workers
        result = result.get()  # collect result from MapResult object
    end_time = time.time()
    print("Multiprocessing took:", end_time - start_time, "seconds")

    # Run the single process version for comparison. This has to be within the if block as well.
    result = []
    start_time = time.time()
    for i in range(100):
        a = np.random.randint(1, 100, size=(1000,1000)) #generate random matrix
        result.append(np.linalg.eigh(a))     #calculate eigenvalues and append to result list
    end_time = time.time()
    print("Single process took :", end_time - start_time, "seconds")
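For the follow-up requirement of running the serial and the pooled version on the same list of matrices, a sketch along these lines may help. Pool.map splits the list into chunks on its own (the optional chunksize argument sets their approximate size) and returns the results in input order, so no manual division is needed. The names eig_decompose, run_serial, run_pool and random_symmetric are hypothetical, and the matrices are symmetrized and kept small here, which is an assumption added because np.linalg.eigh expects symmetric input.

```python
import time
from multiprocessing import Pool

import numpy as np


def eig_decompose(matrix):
    """Worker: eigenvalues and eigenvectors of one symmetric matrix."""
    w, v = np.linalg.eigh(matrix)
    return w, v  # plain tuple, easy to send back from a worker process


def run_serial(matrices):
    """Decompose every matrix in the current process."""
    return [eig_decompose(m) for m in matrices]


def run_pool(matrices, processes, chunksize=None):
    """Decompose the same matrices with a pool; map does the chunking."""
    with Pool(processes=processes) as pool:
        return pool.map(eig_decompose, matrices, chunksize=chunksize)


def random_symmetric(count, size, seed=0):
    """Hypothetical helper: random symmetric integer test matrices."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(count):
        m = rng.integers(1, 100, size=(size, size))
        out.append(m + m.T)  # symmetrize (assumption, see lead-in)
    return out


if __name__ == "__main__":
    mats = random_symmetric(count=8, size=200)

    t0 = time.perf_counter()
    serial = run_serial(mats)
    t1 = time.perf_counter()
    pooled = run_pool(mats, processes=2)
    t2 = time.perf_counter()

    # Both runs saw the same input, so the eigenvalues should agree
    same = all(np.allclose(s[0], p[0]) for s, p in zip(serial, pooled))
    print("Serial took:", t1 - t0, "seconds")
    print("Pool took:", t2 - t1, "seconds; results match:", same)
```

Whether the pooled run comes out faster depends on the matrix size, the number of cores, and whether the BLAS library behind NumPy already uses several threads, so any timing from this sketch is specific to the machine it runs on.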

Source

2015-11-03 01:18:55 user3667217

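Nice answer; however, multiprocessing doesn't work on my multiprocessor machine. It takes 0.s) Does the code work on your computer? – jankos

Yes, it works on my computer. That's how I got those time measurements ;) It looks like the if __name__ == "__main__": block isn't running on your machine. That is most likely because you aren't running it as the main process (which is exactly the condition this line tests). How did you test the code? You should save the text above in a .py file and then run that file. – user3667217

@user3667217 I edited your code because I want the serial and multiprocessing versions to find the eigenvalues of the same 100 matrices. – Misha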
