MySQL keeps losing its connection during Celery tasks

I'm trying to process an entire CSV file as quickly as possible, so I'm looking to process each line in parallel as a Celery task. Cleanup is also a Celery task and has to wait until every line has been processed. See the example below.
The problem is that I can't seem to get through a single file, because I keep running into MySQL connection errors. So far I've seen these two:

2013, 'Lost connection to MySQL server during query'
2006, 'MySQL server has gone away'
from app.db.meta import Session
from celery import chord, Celery
from celery.signals import task_postrun

celery = Celery()
celery.config_from_object('config')

@task_postrun.connect
def close_session(*args, **kwargs):
    # return the scoped session's connection to the pool after each task
    Session.remove()

def main():
    # process each line in parallel
    header = [process_line.s(line) for line in csv_file]

    # pass stats to cleanup after all lines are processed
    callback = cleanup.s()
    chord(header)(callback)

@celery.task
def process_line(line):
    session = Session()
    ...
    # process line
    ...
    return stats

@celery.task
def cleanup(stats):
    session = Session()
    ...
    # do cleanup and log stats
    ...
I'm using Celery 3.1.18 and SQLAlchemy 0.9.9. I'm also using a connection pool.
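For context, Session here is a scoped session bound to a pooled engine, roughly along the lines of the sketch below. The module path app.db.meta comes from the import above, but the connection URL and pool settings shown are assumptions for illustration, not my actual config:

# app/db/meta.py -- minimal sketch; the URL and pool settings are assumed
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine(
    'mysql://root@localhost/ab__development',  # hypothetical URL
    pool_size=5,        # assumed pool size
    pool_recycle=3600,  # assumed recycle interval
)

# scoped_session gives each thread its own session;
# Session.remove() in the task_postrun handler above returns
# the underlying connection to the pool
Session = scoped_session(sessionmaker(bind=engine))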
mysql> SHOW FULL PROCESSLIST;
+----+------+-----------+-----------------+---------+------+-------+-----------------------+
| Id | User | Host      | db              | Command | Time | State | Info                  |
+----+------+-----------+-----------------+---------+------+-------+-----------------------+
|  1 | root | localhost | ab__development | Sleep   | 4987 |       | NULL                  |
| 11 | root | localhost | ab__development | Sleep   | 1936 |       | NULL                  |
| 16 | root | localhost | ab__development | Sleep   |  143 |       | NULL                  |
| 17 | root | localhost | ab__development | Sleep   | 1045 |       | NULL                  |
| 18 | root | localhost | NULL            | Query   |    0 | init  | SHOW FULL PROCESSLIST |
| 21 | root | localhost | ab__development | Sleep   |    7 |       | NULL                  |
+----+------+-----------+-----------------+---------+------+-------+-----------------------+
6 rows in set (0.01 sec)
There is no value set for 'max_connections', so I'm assuming it's 100. – BDuelz
max_connections defaults to 151. – BDuelz
I can't copy the entire output of 'show processlist'. But I see six rows, all with the same user and host. Five of them have the same db (the application database) while the other is NULL. Five show Sleep as the command, the other shows Query. Five have large Time values, while the other has 0. – BDuelz
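For reference, the server-side limit discussed in the comments above can be checked with the standard statement:

mysql> SHOW VARIABLES LIKE 'max_connections';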