2017-10-10 69 views

Luigi framework crash

When I run Luigi tasks, the framework sometimes crashes, which causes all subsequent tasks to fail. Here is the error log:

2017-10-05 22:02:02,564 luigi-interface WARNING Failed pinging scheduler 
2017-10-05 22:02:03,129 requests.packages.urllib3.connectionpool INFO  Starting new HTTP connection (126): localhost 
2017-10-05 22:02:03,130 luigi-interface ERROR Failed connecting to remote scheduler 'http://localhost:8082' 
Traceback (most recent call last): 
    ... 
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 585, in send 
    r = adapter.send(request, **kwargs) 
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/requests/adapters.py", line 467, in send 
    raise ConnectionError(e, request=request) 
    ConnectionError: HTTPConnectionPool(host='localhost', port=8082): Max retries exceeded with url: /api/add_worker (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f15128cb3d0>: Failed to establish a new connection: [Errno 111] Connection refused',)) 
2017-10-05 22:02:03,180 luigi-interface INFO  Worker Worker(salt=150908931, workers=3, host=etl2, username=develop, pid=18019) was stopped. Shutting down Keep-Alive thread 
Traceback (most recent call last): 
    File "app_metadata.py", line 1567, in <module> 
    luigi.run() 
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/interface.py", line 210, in run 
    return _run(*args, **kwargs)['success'] 
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/interface.py", line 238, in _run 
    return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory) 
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/interface.py", line 197, in _schedule_and_run 
    success &= worker.run() 
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/worker.py", line 867, in run 
    self._add_worker() 
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/worker.py", line 652, in _add_worker 
    self._scheduler.add_worker(self._id, self._worker_info) 
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/rpc.py", line 219, in add_worker 
    return self._request('/api/add_worker', {'worker': worker, 'info': info}) 
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/rpc.py", line 146, in _request 
    page = self._fetch(url, body, log_exceptions, attempts) 
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/rpc.py", line 138, in _fetch 
    last_exception 
    luigi.rpc.RPCError: Errors (3 attempts) when connecting to remote scheduler 'http://localhost:8082' 

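It looks like the worker tries to ping the central scheduler, the ping fails, and the worker then crashes. Subsequent tasks are blocked and cannot run successfully.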

Someone else ran into a similar error, but their resolution did not work for me: GitHub - Failed connecting to remote scheduler #1894

Answers


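If your central scheduler is under heavy load, I would try making the connection timeout a bit longer. You can also increase the number of retry attempts and the wait time between retries.

In luigi.cfg: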

[core]
# default is 10.0
rpc-connect-timeout=60.0
# default is 3
rpc-retry-attempts=10
# default is 30
rpc-retry-wait=60

You may also want to add a watchdog that automatically restarts the scheduler process if it crashes.
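A minimal sketch of such a watchdog, assuming the scheduler listens on localhost:8082 as in the log above. The function names (`scheduler_is_up`, `ensure_scheduler`, `watch`) and the start command are hypothetical and not part of Luigi; the check only tests whether the port accepts TCP connections, which is the same condition that produced the "Connection refused" error.

```python
import socket
import subprocess
import time


def scheduler_is_up(host="localhost", port=8082, timeout=5.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Covers "connection refused" and timeouts on Python 2 and 3.
        return False
    conn.close()
    return True


def ensure_scheduler(start_cmd, host="localhost", port=8082,
                     launcher=subprocess.Popen):
    """Launch start_cmd if the scheduler port is not answering.

    start_cmd is an assumption: pass whatever command starts luigid
    on your machine. Returns True if a restart was attempted.
    """
    if scheduler_is_up(host, port):
        return False
    launcher(start_cmd)
    return True


def watch(start_cmd, interval=30, **kwargs):
    """Check the scheduler every `interval` seconds, forever."""
    while True:
        ensure_scheduler(start_cmd, **kwargs)
        time.sleep(interval)
```

For example, a small script could call `watch(["luigid", "--port", "8082"])`, adjusting the command to match how the scheduler is normally started. This polls the port instead of parsing log files, which is usually simpler and less brittle.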


How can I automatically check the status of the scheduler process? By reading and parsing the log output?