2015-06-03 39 views

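MySQL keeps losing connection during Celery tasks

I'm trying to process an entire CSV file as quickly as possible, so I want to process each line in parallel as a Celery task. The cleanup, which is also a Celery task, must wait until every line has been processed. See the example below.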

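The problem is that I can't seem to get through a single file, because I keep running into MySQL connection errors. So far I've seen these two errors: 2013, 'Lost connection to MySQL server during query' and 2006, 'MySQL server has gone away'.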

from app.db.meta import Session 
from celery import chord, Celery 
from celery.signals import task_postrun 

celery = Celery() 
celery.config_from_object('config') 

@task_postrun.connect 
def close_session(*args, **kwargs): 
    Session.remove() 

def main(): 
    # process each line in parallel 
    header = [process_line.s(line) for line in csv_file] 
    # pass stats to cleanup after all lines are processed 
    callback = cleanup.s() 
    chord(header)(callback) 

@celery.task 
def process_line(line): 
    session = Session() 
    ... 
    # process line 
    ... 
    return stats 

@celery.task 
def cleanup(stats): 
    session = Session() 
    ... 
    # do cleanup and log stats 
    ... 

I'm using Celery 3.1.18 and SQLAlchemy 0.9.9. I'm also using a connection pool.

mysql> SHOW FULL PROCESSLIST;
+----+------+-----------+-----------------+---------+------+-------+-----------------------+
| Id | User | Host      | db              | Command | Time | State | Info                  |
+----+------+-----------+-----------------+---------+------+-------+-----------------------+
|  1 | root | localhost | ab__development | Sleep   | 4987 |       | NULL                  |
| 11 | root | localhost | ab__development | Sleep   | 1936 |       | NULL                  |
| 16 | root | localhost | ab__development | Sleep   |  143 |       | NULL                  |
| 17 | root | localhost | ab__development | Sleep   | 1045 |       | NULL                  |
| 18 | root | localhost | NULL            | Query   |    0 | init  | SHOW FULL PROCESSLIST |
| 21 | root | localhost | ab__development | Sleep   |    7 |       | NULL                  |
+----+------+-----------+-----------------+---------+------+-------+-----------------------+
6 rows in set (0.01 sec)

There's no value set for 'max_connection', so I'm assuming 100 – BDuelz


'max_connections default = 151' – BDuelz


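I can't copy the entire output of 'show processlist'. However, I see 6 rows, all with the same user and host. 5 of them have the same db (the application database), while the other is NULL. 5 of them show Sleep as the command, while the other shows Query. 5 of them have large Time values, while the other has 0. – BDuelz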

Answer


Read the answer. In short, you must either disable SQLAlchemy's connection pooling or try pinging the MySQL server each time a connection is checked out of the pool:

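from flask_sqlalchemy import SQLAlchemy  # formerly flask.ext.sqlalchemy
from sqlalchemy import event, exc


def instance(app):
    """:rtype: SQLAlchemy"""
    db = SQLAlchemy(app)

    if app.testing:
        return db

    # Ping the server every time a connection is checked out of the pool.
    @event.listens_for(db.engine, 'checkout')
    def checkout(dbapi_con, con_record, con_proxy):
        try:
            try:
                # Drivers whose ping() accepts a reconnect flag: do not reconnect silently.
                dbapi_con.ping(False)
            except TypeError:
                # This driver's ping() takes no argument.
                app.logger.debug('MySQL connection died. Restoring...')
                dbapi_con.ping()
        except dbapi_con.OperationalError as e:
            app.logger.warning(e)
            # MySQL client error codes treated here as a lost connection.
            if e.args[0] in (2006, 2013, 2014, 2045, 2055):
                # Makes the pool discard this connection and try a fresh one.
                raise exc.DisconnectionError()
            else:
                raise

    return db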
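
The other option mentioned, disabling the pool, might look roughly like the sketch below. It is only an illustration under assumptions: the question does not show how `app.db.meta` builds its engine, so `DATABASE_URL`, `engine` and this `Session` are hypothetical stand-ins, and the in-memory SQLite URL is there only so the snippet runs without a MySQL server.

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.pool import NullPool

# Hypothetical URL for illustration only; replace it with the
# application's real MySQL URL.
DATABASE_URL = 'sqlite://'

# NullPool opens a new DBAPI connection on every checkout and closes it
# on release, so a stale pooled connection is never handed to a task.
engine = create_engine(DATABASE_URL, poolclass=NullPool)

# Same scoped-session pattern as the Session used in the question
# (assumed; the original app.db.meta module is not shown).
Session = scoped_session(sessionmaker(bind=engine))
```

The trade-off is a new connection per checkout, which costs some latency. If pooling has to stay, passing `pool_recycle` (in seconds) to `create_engine` with a value below the server's `wait_timeout` is another commonly used setting, though whether it helps here depends on what is actually closing the connections.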