2014-03-14 30 views

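Passing a class to IPython parallel

I want to pass a class into IPython parallel execution. The code below does run, but it loads "Timezone" on every call. Each load of this class takes about 10 s, so the overhead is unacceptable unless it happens only once, or once per core. I am very new to parallelization, and I'm wondering how to move the import out of the function. At least I think that is the right thing to do.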

from IPython import parallel 
clients = parallel.Client() 
lview = clients.load_balanced_view() 

lview.block = True 

lats = [32.21, 34.98] 
lons = [109.45, -102.4] 
times = ['2014-03-12T16:20:44.000000000Z', '2014-03-12T15:48:52.000000000Z'] 

@lview.parallel() 
def f(lats, lons, times): 
    import sys,os 
    sys.path.append("../utils/") # For grabbing 'Timezone' 

    import Timezone as Timezone 
    tz = Timezone.Timezone() 

    # Use tz to compute local time 
    a = tz.compute_local_time(lats, lons, times) 

    return a 

%time f.map(lats, lons, times) 

Result (about 22 seconds):

in sync results <function __call__ at 0x105d2db18> 
CPU times: user 700 ms, sys: 232 ms, total: 932 ms 
Wall time: 11.6 s 
Out[15]: 
[('Asia/Chongqing', '2014-03-13 00:20:44'), 
('America/Chicago', '2014-03-12 10:48:52')] 

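The time doubles if I double the length of the input data. How can I pass tz in so that each core calls the Timezone method without reloading it?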

Answer


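I figured it out. Here is how I did it.
First, I used a direct view and loaded the module onto every engine, then used scatter/gather to split up the input, and finally used map to go over the array/list inputs.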

from IPython import parallel as p

rc = p.Client()
dview = rc[:]  # A DirectView of all engines
dview.block = True  # Wait for each call, so setup errors surface here

# One-time setup on every engine: import Timezone and build one tz per engine
dview.execute('import sys,os')
dview.execute('sys.path.append("../utils/")')  # For grabbing 'Timezone'
dview.execute('import Timezone as Timezone; tz = Timezone.Timezone()')

In the next cell:

def f(v, lats, lons, times): 
    v.scatter('lat', lats) 
    v.scatter('lon', lons) 
    v.scatter('time', times) 
    v.execute("D = list(map(tz.compute_local_time, lat, lon, time))")  # list() so D can also be gathered on Python 3
    return v.gather('D', block=True) 

lats = [32.21] 
lons = [109.45] 
times = ['2014-03-12T16:20:44.000000000Z'] 

%time r = f(dview, lats, lons, times) 

This gives me the output I want, about twice as fast as just using:

map(tz.compute_local_time, lat, lon, time)
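For anyone wondering what the scatter / execute / gather steps in f amount to, here is a rough plain-Python sketch that runs without a cluster. It is only an illustration: scatter_chunks, gather_chunks, fake_compute_local_time and run are hypothetical names made up for this sketch, the placeholder function stands in for tz.compute_local_time, and the contiguous near-equal split is my assumption about how the default scatter partitions a list, not something taken from the IPython source.

```python
def scatter_chunks(seq, n_workers):
    """Split seq into n_workers contiguous chunks of near-equal size.

    Assumption: this mimics what scatter does by default; some chunks
    are empty when there are fewer items than workers.
    """
    base, extra = divmod(len(seq), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        chunks.append(seq[start:start + size])
        start += size
    return chunks


def gather_chunks(chunks):
    """Concatenate per-worker results back into one list, in worker order."""
    return [item for chunk in chunks for item in chunk]


def fake_compute_local_time(lat, lon, time):
    # Hypothetical stand-in for tz.compute_local_time; it just echoes its inputs.
    return (lat, lon, time)


def run(lats, lons, times, n_workers=2):
    # "scatter": every worker receives its own slice of each input list
    lat_parts = scatter_chunks(lats, n_workers)
    lon_parts = scatter_chunks(lons, n_workers)
    time_parts = scatter_chunks(times, n_workers)
    # "execute": each worker maps the function over its local slices
    # (done serially here; on a cluster the engines do this at the same time)
    partial = [
        list(map(fake_compute_local_time, la, lo, ti))
        for la, lo, ti in zip(lat_parts, lon_parts, time_parts)
    ]
    # "gather": stitch the per-worker results back together
    return gather_chunks(partial)
```

The point of the design is that the expensive Timezone() construction happens once per engine during setup, and each later call only ships the small input slices and results.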