dispy sample program hangs

TL;DR: I can't get even the most basic dispy example code to run correctly. Why not?
Details:

I'm trying to get into distributed processing in Python, and the dispy library sounded interesting because of its comprehensive feature set. However, I've been trying to follow dispy's basic canonical example program, and I can't get it to work.
- I installed dispy (python -m pip install dispy).
- I went to another machine on the same subnet and ran python dispynode.py. It appears to be working, because I get the following output:

2016-06-14 10:33:38 dispynode - dispynode version 4.6.14
2016-06-14 10:33:38 asyncoro - version 4.1 with epoll I/O notifier
2016-06-14 10:33:38 dispynode - serving 8 cpus at 10.0.48.54:51348

Enter "quit" or "exit" to terminate dispynode, "stop" to stop
service, "start" to restart service, "cpus" to change CPUs used,
anything else to get status:

- Back on my client machine, I ran the sample code downloaded from http://dispy.sourceforge.net/_downloads/sample.py, copied here:
# function 'compute' is distributed and executed with arguments
# supplied with 'cluster.submit' below
def compute(n):
    import time, socket
    time.sleep(n)
    host = socket.gethostname()
    return (host, n)

if __name__ == '__main__':
    # executed on client only; variables created below, including modules imported,
    # are not available in job computations
    import dispy, random
    # distribute 'compute' to nodes; 'compute' does not have any dependencies (needed from client)
    cluster = dispy.JobCluster(compute)
    # run 'compute' with 20 random numbers on available CPUs
    jobs = []
    for i in range(20):
        job = cluster.submit(random.randint(5,20))
        job.id = i # associate an ID to identify jobs (if needed later)
        jobs.append(job)
    # cluster.wait() # waits until all jobs finish
    for job in jobs:
        host, n = job() # waits for job to finish and returns results
        print('%s executed job %s at %s with %s' % (host, job.id, job.start_time, n))
        # other fields of 'job' that may be useful:
        # job.stdout, job.stderr, job.exception, job.ip_addr, job.end_time
    cluster.print_status() # shows which nodes executed how many jobs etc.
When I run this (python sample.py), it just hangs. Stepping through with pdb, I found that it ultimately hangs at dispy/__init__.py(117)__call__(). That line reads self.finish.wait(). finish is just a Python thread, whose wait() then descends into lib/python3.5/threading.py(531)wait(). It hangs as soon as it starts waiting.
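As an aside, a hang like this can be located without stepping through pdb: the standard faulthandler module can dump every thread's current stack. Below is a generic debugging sketch (the blocking Event here just stands in for dispy's self.finish; it is not dispy code):

```python
import faulthandler
import sys
import threading
import time

def worker(evt):
    evt.wait()  # blocks indefinitely, like dispy's self.finish.wait()

evt = threading.Event()
t = threading.Thread(target=worker, args=(evt,), daemon=True)
t.start()
time.sleep(0.1)  # give the worker time to reach wait()

# Dump the current stack of every thread to stderr; the frame stuck
# inside wait() shows exactly where the program is hanging.
faulthandler.dump_traceback(file=sys.stderr)

evt.set()  # release the worker so the script can exit
t.join()
```

Run against the real hanging script, the dump would show the client thread parked in threading.py's wait(), matching what pdb reported.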
I tried running dispynode on the client machine as well and got the same result. I've tried many variants of passing nodes into the cluster creation, e.g.:
cluster = dispy.JobCluster(compute, nodes=['localhost'])
cluster = dispy.JobCluster(compute, nodes=['*'])
cluster = dispy.JobCluster(compute, nodes=[<hostname of the remote node running the client>])
I tried running with the cluster.wait() line uncommented and got the same result.
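When a client and node silently fail to find each other like this, plain network reachability is worth ruling out first, since firewalls commonly block the ports involved. This is a hypothetical standard-library check; the address and port would be the ones dispynode reported above:

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. can_connect('10.0.48.54', 51348)  # address/port from the node's log
```

A True here only proves TCP reachability of that one port; dispy's node discovery also involves UDP, which this sketch does not cover.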
When I turn on logging (cluster = dispy.JobCluster(compute, loglevel = 10)), I get the following output on the client:
2016-06-14 10:27:01 asyncoro - version 4.1 with epoll I/O notifier
2016-06-14 10:27:01 dispy - dispy client at :51347
2016-06-14 10:27:01 dispy - Storing fault recovery information in "_dispy_20160614102701"
2016-06-14 10:27:01 dispy - Pending jobs: 0
2016-06-14 10:27:01 dispy - Pending jobs: 1
2016-06-14 10:27:01 dispy - Pending jobs: 2
2016-06-14 10:27:01 dispy - Pending jobs: 3
2016-06-14 10:27:01 dispy - Pending jobs: 4
2016-06-14 10:27:01 dispy - Pending jobs: 5
2016-06-14 10:27:01 dispy - Pending jobs: 6
2016-06-14 10:27:01 dispy - Pending jobs: 7
2016-06-14 10:27:01 dispy - Pending jobs: 8
2016-06-14 10:27:01 dispy - Pending jobs: 9
2016-06-14 10:27:01 dispy - Pending jobs: 10
That doesn't look unexpected, but it doesn't help me figure out why the jobs never run.
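One way to at least fail fast instead of hanging forever is to wrap the blocking job() call in a timeout. The helper below is generic Python, not a dispy API; using it on the sample's job() is an assumption about where the block occurs:

```python
import threading

def call_with_timeout(fn, timeout, *args, **kwargs):
    """Run fn(*args, **kwargs) in a worker thread; return its result,
    or raise TimeoutError if it does not finish within `timeout` seconds."""
    result, error = [], []
    def runner():
        try:
            result.append(fn(*args, **kwargs))
        except Exception as exc:
            error.append(exc)
    t = threading.Thread(target=runner, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        raise TimeoutError('call did not finish within %ss' % timeout)
    if error:
        raise error[0]
    return result[0]

# Hypothetical use in the sample's loop:
#   host, n = call_with_timeout(job, 30)
```

With this wrapper the script would raise after 30 seconds instead of hanging indefinitely, which makes the failure much easier to iterate on.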
For what it's worth, here's _dispy_20160614102701.bak:
'_cluster', (0, 207)
'compute_1465918021755', (512, 85)
Similarly, _dispy_20160614102701.dir:
'_cluster', (0, 207)
'compute_1465918021755', (512, 85)
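Those .bak/.dir files look like a Python shelve (dbm.dumb) database, which appears to be what dispy uses for its fault-recovery file. If that assumption holds, the keys can be listed with the standard shelve module; a sketch, with the path taken from the log above:

```python
import shelve

def list_keys(path):
    """Open a shelve database read-only and return its sorted keys."""
    with shelve.open(path, flag='r') as db:
        return sorted(db.keys())

# e.g. list_keys('_dispy_20160614102701')
```

The '_cluster' and 'compute_...' entries seen in the raw files should come back as the keys, confirming the client at least persisted the cluster state.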
I'm left guessing, unless I'm somehow using an unstable version.
I'm having this same kind of problem. I wonder whether there's a solution for it? – avstenit
I haven't found one. In fact, I gave up, so I never even awarded a bounty for this. I also tried [scoop](https://github.com/soravux/scoop), which on the surface fits my needs exactly, but it has a very strange [arbitrary limit on the maximum number of processors I can usefully add](https://groups.google.com/forum/#!topic/scoop-users/WlmqPzlsdec). I gave up on that too and decided to use basic popen over ssh and write my own scheduler. –
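For what that last alternative might look like, here is a minimal sketch of a popen-based dispatcher, with an ssh prefix as the (hypothetical) remote extension; hostnames are placeholders:

```python
import subprocess

def dispatch(commands):
    """Start each (tag, argv) pair as a subprocess, then collect stdout.
    The processes run concurrently; results come back as {tag: stdout}."""
    procs = {tag: subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
             for tag, argv in commands}
    return {tag: p.communicate()[0] for tag, p in procs.items()}

# For remote execution, prefix each argv with ssh (hostname hypothetical):
#   dispatch([('node1', ['ssh', 'node1', 'hostname'])])
```

This gives none of dispy's features (no result objects, no fault recovery), but every failure mode is an ordinary subprocess error rather than a silent hang.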
@ThomasGuenet You suggested an edit that I'm going to reject. The edit is inappropriate because it changes what I actually said: I did run 'python dispy.py', not 'dispy.py'. There is a difference in how they run, because your way runs it as a module, and that difference could be the reason the program hangs. So it doesn't work as an edit, but it might make a good answer. Write it up as an answer explaining how running 'dispy.py' rather than 'python dispy.py' solves the problem; if you demonstrate that convincingly, you'll have answered the question. –