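Trouble using a lock with multiprocessing: pickling error

I'm building a Python module to extract tags from a large corpus of text. Its results are high quality, but it runs very slowly. I'm trying to speed it up with multiprocessing, and that was working until I tried to introduce a lock so that only one process connects to our database at a time. I can't for the life of me figure out how to make this work. Despite a lot of searching and tweaking, I still get PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed. Here is the offending code. It worked fine until I tried to pass a lock object as an argument to f.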
from functools import partial
from multiprocessing import Manager

# get_more_tags is defined elsewhere in the module (not shown in the question)

def make_network(initial_tag, max_tags=2, max_iter=3):
    manager = Manager()
    lock = manager.Lock()
    pool = manager.Pool(8)

    # this is a very expensive function that I would like to parallelize
    # over a list of tags. It involves a (relatively cheap) call to an external
    # database, which needs a lock to avoid simultaneous queries. It takes a list
    # of strings (tags) as its sole argument, and returns a list of sets with entries
    # corresponding to the input list.
    f = partial(get_more_tags, max_tags=max_tags, lock=lock)

    def _recursively_find_more_tags(tags, level):
        if level >= max_iter:
            raise StopIteration
        new_tags = pool.map(f, tags)
        to_search = []
        for i, s in zip(tags, new_tags):
            for t in s:
                joined = ' '.join(t)
                print i + "|" + joined
                to_search.append(joined)
        try:
            return _recursively_find_more_tags(to_search, level + 1)
        except StopIteration:
            return None

    _recursively_find_more_tags([initial_tag], 0)
Are you running on Windows or Linux? – Jonathan
I'm on Linux, sorry, I forgot to add that! – sbrother
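
For reference, one commonly used pattern avoids pickling a lock as a task argument: the lock is handed to each worker once through the pool's initializer and kept in a module-level global. The sketch below is not the code from the question and is not a confirmed diagnosis of the error above. It is written for Python 3, and the names _db_lock, _init_worker, get_more_tags_stub and find_tags_parallel are hypothetical stand-ins, with the database call replaced by a dummy string operation. The "fork" start method is requested explicitly on the assumption that this runs on Linux.

```python
import multiprocessing
from functools import partial

# Hypothetical names throughout: none of these exist in the original module.
_db_lock = None  # set once per worker process by _init_worker


def _init_worker(lock):
    """Run once in each worker; keep the shared lock in a module global."""
    global _db_lock
    _db_lock = lock


def get_more_tags_stub(tag, max_tags=2):
    """Placeholder for the expensive function; only the 'database' step is locked."""
    with _db_lock:
        # Assumed stand-in for the (relatively cheap) external database query.
        related = ["%s-%d" % (tag, i) for i in range(max_tags)]
    # The expensive, lock-free processing would go here.
    return set(related)


def find_tags_parallel(tags, max_tags=2, processes=4):
    """Map the stub over tags; the lock reaches workers via the initializer."""
    # Assumption: Linux, so the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    lock = ctx.Lock()
    # The lock is deliberately NOT part of the partial, so it is never
    # pickled together with the tasks sent to the workers.
    f = partial(get_more_tags_stub, max_tags=max_tags)
    pool = ctx.Pool(processes, initializer=_init_worker, initargs=(lock,))
    try:
        return pool.map(f, tags)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(find_tags_parallel(["python", "linux"]))
```

Because the lock is held only around the database step, the expensive part of each call can still run in parallel across workers. Whether this fits the real get_more_tags depends on how it talks to the database, which the question does not show.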