2012-09-16 22 views
0

要了解更多关于python中的进程,我决定使用多个进程来实现quicksort。它似乎在少于21K元素的列表上工作得很好,但它只是挂起了一些更大的元素(比如25K元素)。python中的多进程quicksort挂起大量元素

我试过了import sys; sys.setrecursionlimit(2**30)诀窍无济于事。这是因为python不喜欢真正深度的递归吗?有趣的是,相同的算法对大约50k个元素的单个处理效果很好。

import random, multiprocessing, time 

DESIRED_PROC_DEPTH = 2 
proc_depth = 1 #start at one to count the parent process that kicks this all off 

def main(): 
    import sys; sys.setrecursionlimit(2**30) 
    num_elements = 20000 
    list_of_numbers = [random.randint(0,num_elements) for num in range(num_elements)] 
    # print list_of_numbers 
    simple_start = time.time() 
    simple_sorted = simple_qs(list_of_numbers) 
    simple_total_time = time.time() - simple_start 
    print "Using a single thread we sorted " + str(num_elements) + " elements in " + str(simple_total_time) + " seconds" 
    the_q = multiprocessing.Queue() 
    sorter = MTQuickSort(list_of_numbers, the_q) 
    start = time.time() 
    sorter.start() 
    sorter.join() 
    mt_total_time = time.time() - start 
    sorted_list = the_q.get() 
    # print sorted_list 
    print "Sorted " + str(num_elements) + " elements in " + str(mt_total_time) + " seconds" 


class MTQuickSort(multiprocessing.Process): 

    def __init__(self, list_to_sort, queue): 
     super(MTQuickSort, self).__init__() 
     print self.name 
     self.queue = queue 
     self.list = list_to_sort 

    def run(self): 
     self.queue.put(self.quicksort(self.list)) 

    def quicksort(self, list): 
     global proc_depth 
     global DESIRED_PROC_DEPTH 
     if len(list) is 0: 
      # base case is that list is empty 
      return [] 
     else: 
      pivot = list[0] 
      less = [x for x in list[1:] if x <= pivot] 
      greater = [x for x in list[1:] if x > pivot] 
      if proc_depth < DESIRED_PROC_DEPTH: 
       proc_depth += 1 
       # We create threads in blocks of two since we partition the list into two parts 
       lessor_q = multiprocessing.Queue() 
       greater_q = multiprocessing.Queue() 
       qs_process1 = MTQuickSort(less, lessor_q) 
       qs_process2 = MTQuickSort(greater, greater_q) 
       qs_process1.start() 
       qs_process2.start() 
       qs_process1.join() 
       qs_process2.join() 
       return lessor_q.get() + [pivot] + greater_q.get() 
      else: 
       less_than_pivot = self.quicksort(less) 
       greater_than_pivot = self.quicksort(greater) 
       return less_than_pivot + [pivot] + greater_than_pivot 

def simple_qs(list): 
    if len(list) is 0: 
     # base case is that list is empty 
     return [] 
    else: 
     pivot = list[0] 
     return simple_qs([x for x in list[1:] if x <= pivot]) + [pivot] + simple_qs([x for x in list[1:] if x > pivot]) 



if __name__ == '__main__': 
    main() 
+0

已确认(在Linux上):它通常适用于21900个元素并挂起21920个元素。为什么这么精确的数字它与sys.setrecursionlimit()无关,因为该算法仅递归10-20次。 –

回答

2

这是由于您在从队列中获取()结果之前连接()子过程所导致的。显然,队列并不急于从子进程读取数据。所以当你调用join()时,调用者的进程会等待,直到子进程结束,但是子进程本身正在等待尝试将数据发送到管道到其父进程。

如果我移动相应的get()之后的各种join(),它将起作用。

+0

+1的解释。谢谢! – AndrewJesaitis