0

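Returning Counter objects from a multiprocessing/map function

I have a running Python script that launches the same function in multiple processes. Each call creates and fills 2 counters (c1 and c2). The c1 counters returned by all the forked processes should be merged together. The same goes for the c2 counters returned by the different forks.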

My (pseudo)code looks like this:

from collections import Counter
import multiprocessing

def countIt(cfg):
    c1 = Counter()
    c2 = Counter()
    # do some things and fill the counters by counting words in a text, like
    # c1 = Counter({'apple': 3, 'banana': 0})
    # c2 = Counter({'blue': 3, 'green': 0})

    return c1, c2

if __name__ == '__main__':
    cP1 = Counter()
    cP2 = Counter()
    cfg = "myConfig"
    p = multiprocessing.Pool(4)  # creating 4 forks
    c1, c2 = p.map(countIt, cfg)[:2]
    # 1.) This only works with [:2], which seems to be no good idea
    # 2.) at this point c1 and c2 are lists, not counters anymore,
    #     so the following will not work:
    cP1 + c1
    cP2 + c2

Following the example above, I need a result like: cP1 = Counter({'apple': 25, 'banana': 247, 'orange': 24}) and cP2 = Counter({'red': 11, 'blue': 56, 'green': 3})

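So my question: how can I count things inside the forked processes so that each counter (all the c1 and all the c2) is summed up in the parent process?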

+0

@mattm That doesn't work, because `sum()` won't return a Counter? The following error occurs: `TypeError: unsupported operand type(s) for +: 'int' and 'Counter'` –

+1

At least this line is definitely a bug: `c1, c2 = p.map(countIt, cfg)[:2]`. You can see how to handle the results in swenzel's answer. – KobeJohn
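On the `TypeError` quoted in the comments: `sum()` starts from the integer `0` unless told otherwise, so its first step is `0 + Counter`, which fails. A minimal sketch, assuming the per-worker counters are already collected in a list (the sample data below is made up for illustration), passes an empty `Counter()` as the start value:

```python
from collections import Counter

# Hypothetical per-worker results, standing in for what the pool returns.
partial_counts = [
    Counter({'apple': 3, 'banana': 1}),
    Counter({'apple': 2, 'orange': 4}),
]

# The second argument replaces the default start value 0 with an empty Counter,
# so every addition is Counter + Counter.
total = sum(partial_counts, Counter())
print(total)
```

Each `+` builds a new Counter, so for many large counters an in-place loop with `+=` or `update()` is probably cheaper.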

Answers

2

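You need to "unpack" your results, for example with a for-each loop. You will receive a list of tuples, where each tuple is a pair of counters: (c1, c2).
Your current solution actually mixes them up. You assign [(c1a, c2a), (c1b, c2b)] to c1, c2, which means c1 contains (c1a, c2a) and c2 contains (c1b, c2b).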
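The mix-up can be reproduced without starting any processes. A minimal sketch, assuming a hand-written `results` list as a stand-in for what `p.map(countIt, ...)` would return (the counter contents are invented for illustration), uses `zip(*results)` to regroup the pairs into "all c1" and "all c2":

```python
from collections import Counter

# Hypothetical stand-in for the list returned by p.map(countIt, configs):
# one (c1, c2) tuple per worker call.
results = [
    (Counter({'apple': 3}), Counter({'blue': 3})),
    (Counter({'apple': 1, 'banana': 2}), Counter({'green': 5})),
]

# zip(*results) transposes the list of pairs into two tuples:
# every c1 counter in the first, every c2 counter in the second.
all_c1, all_c2 = zip(*results)

cP1 = Counter()
cP2 = Counter()
for c1 in all_c1:
    cP1.update(c1)  # update() adds the counts in place
for c2 in all_c2:
    cP2.update(c2)

print(cP1)
print(cP2)
```

One design note: `update()` keeps keys whose count is zero (such as 'banana': 0 in the question), whereas `+` and `+=` drop entries with non-positive counts.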

Try this:

if __name__ == '__main__':
    from contextlib import closing

    cP1 = Counter()
    cP2 = Counter()

    # I hope you have an actual list of configs here, otherwise map
    # will call `countIt` with the single characters of the string 'myConfig'
    cfg = "myConfig"

    # `contextlib.closing` makes sure the pool is closed after we're done.
    # In python3, Pool is itself a contextmanager and you don't need to
    # surround it with `closing` in order to be able to use it in the `with`
    # construct.
    # This approach, however, is compatible with both python2 and python3.
    with closing(multiprocessing.Pool(4)) as p:
        # Just counting, no need to order the results.
        # This might actually be a bit faster.
        for c1, c2 in p.imap_unordered(countIt, cfg):
            cP1 += c1
            cP2 += c2
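The warning about `cfg` in the code comments can be checked in isolation. A minimal sketch, using the built-in `map` as a stand-in for the pool's `map` (both walk over their iterable item by item) and a hypothetical `show` worker that simply echoes its argument:

```python
# Hypothetical worker that just returns whatever it was called with.
def show(cfg):
    return cfg

# A plain string is iterated character by character,
# so the worker is called once per letter.
as_string = list(map(show, "myConfig"))

# A list of config names (invented here) yields one call per config.
as_list = list(map(show, ["configA", "configB"]))

print(as_string)
print(as_list)
```

Wrapping a single config in a list, e.g. `["myConfig"]`, would therefore give exactly one worker call.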
+0

Not the OP, but thanks for using the closing context manager to improve the code. I hadn't seen it before, maybe because I haven't used mp in py3 yet. – KobeJohn

+0

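Thanks, this works. A few minutes earlier I had found out that Python builds a list of all the fork results, like `[(counter(), counter()), (counter(), counter()), ....]`. So your answer fits that exactly. Thanks. Using `closing` is completely new to me, but interesting! :-) –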