Improve query performance

I need to read and join a lot of rows (~500k) from a PostgreSQL database and write them into a MySQL database.

My naive approach looks like this:

    entrys = Entry.query.yield_per(500)

    for entry in entrys:
        for location in entry.locations:
            mysql_location = MySQLLocation(entry.url)
            mysql_location.id = location.id
            mysql_location.entry_id = entry.id

            # [...]

            mysql_location.city = location.city.name
            mysql_location.county = location.county.name
            mysql_location.state = location.state.name
            mysql_location.country = location.country.name

            db.session.add(mysql_location)

    db.session.commit()

Each Entry has roughly 1 to 100 Locations.

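The script has now been running for about 20 hours and has already consumed more than 4 GB of memory, because everything is kept in memory until the session is committed.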

When I tried committing earlier, I ran into problems like this.

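How can I improve query performance? It needs to be much faster, because the number of rows will grow to around 2500k over the next few months.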


2013-08-02 dbanck

Why can't you use an [Extract, Transform, Load](http://en.wikipedia.org/wiki/Extract,_transform,_load) approach? – AndrewS

Basically 'pg_dump dbname | mysql dbname'. – JochenRitzel

@JochenRitzel, I am combining many rows from several tables into a single row. I don't see how 'pg_dump' would help. – dbanck

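Your naive approach is flawed for the reason you already know: what is eating your memory is the model objects sitting in memory while they wait to be flushed to MySQL.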

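The simplest approach is not to use the ORM for the conversion at all. Work with the SQLAlchemy table objects directly instead, since they are also faster.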
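A minimal sketch of the "no ORM objects" idea, under loud assumptions: stdlib `sqlite3` connections stand in for the PostgreSQL source and MySQL target (so placeholders are `?`; psycopg2 and MySQLdb use `%s`), and the table and column names (`entries`, `locations`, `cities`, `mysql_locations`) are hypothetical, with only the city lookup shown. One join query produces plain tuples, and `executemany` consumes the source cursor directly, so no model objects pile up in memory:

```python
import sqlite3

# Hypothetical flattened query: one join replaces the per-row
# location.city.name attribute loads performed by the ORM loop.
JOIN_SQL = """
    SELECT l.id, e.id, e.url, c.name
    FROM locations AS l
    JOIN entries AS e ON e.id = l.entry_id
    JOIN cities  AS c ON c.id = l.city_id
"""

# Hypothetical target table mirroring the MySQLLocation columns used above.
INSERT_SQL = (
    "INSERT INTO mysql_locations (id, entry_id, url, city) "
    "VALUES (?, ?, ?, ?)"
)


def copy_locations(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Copy joined rows from src to dst as plain tuples; return rows inserted."""
    # The source cursor is an iterator of tuples, so executemany pulls rows
    # from it one at a time instead of building a list or ORM objects first.
    cur = dst.executemany(INSERT_SQL, src.execute(JOIN_SQL))
    dst.commit()
    # For executemany, sqlite3 sums the modifications into rowcount.
    return cur.rowcount
```

With SQLAlchemy itself, the analogous step would be executing an `insert()` on the target table with a list of dicts per batch; the sketch above only illustrates the shape of the data flow.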

What you can also do is create two sessions and bind the two engines to separate sessions! Then you can commit the MySQL session after each batch.
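The two-session idea can be sketched at the DB-API level, assuming two separate `sqlite3` connections play the role of the PostgreSQL and MySQL sessions (the SQL strings and `batch_size` are illustrative). Only the target connection commits, once per batch, so the open result set on the source is left alone and at most `batch_size` rows sit in Python memory at a time:

```python
import sqlite3


def copy_in_batches(
    src: sqlite3.Connection,
    dst: sqlite3.Connection,
    select_sql: str,
    insert_sql: str,
    batch_size: int = 500,
) -> int:
    """Stream rows from src and commit them to dst one batch at a time.

    src and dst are separate connections (a stand-in for two sessions bound
    to two engines). Only dst commits, so the cursor open on src keeps
    streaming. Returns the number of batches committed.
    """
    cur = src.execute(select_sql)
    batches = 0
    while True:
        rows = cur.fetchmany(batch_size)  # at most batch_size tuples in memory
        if not rows:
            break
        dst.executemany(insert_sql, rows)
        dst.commit()  # flush this batch instead of holding everything until the end
        batches += 1
    return batches
```

In SQLAlchemy terms this would roughly correspond to one `sessionmaker` per engine, committing (and clearing) only the MySQL session per batch; that mapping is an assumption, not something the answer spells out.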


2013-08-02 10:49:27

+1 for two separate sessions, each cleaned up with [expunge_all()](http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#sqlalchemy.orm.session.Session.expunge_all). Also, the problem you (@dbanck) ran into can be solved by using range queries instead of yield_per. – van
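One reading of "range queries instead of yield_per" is keyset pagination on the primary key: each batch is its own short query (`id > last_seen_id ... LIMIT n`), so committing between batches cannot invalidate an open result set. A minimal sketch, assuming a hypothetical `entries(id, url)` table with positive integer ids and a `sqlite3` connection as a stand-in:

```python
import sqlite3
from typing import Iterator, List, Tuple

# Hypothetical range query: resume strictly after the last id already seen.
BATCH_SQL = "SELECT id, url FROM entries WHERE id > ? ORDER BY id LIMIT ?"


def iter_entry_batches(
    conn: sqlite3.Connection, batch_size: int = 500
) -> Iterator[List[Tuple[int, str]]]:
    """Yield successive batches of entries, one independent query per batch."""
    last_id = 0  # assumes positive integer ids, e.g. a serial primary key
    while True:
        # fetchall() drains this small query completely, so nothing stays
        # open while the caller processes and commits the batch.
        rows = conn.execute(BATCH_SQL, (last_id, batch_size)).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]
```

Gaps in the id sequence are harmless because the bound is "greater than the last id", not a fixed window. The ORM analogue would be along the lines of `Entry.query.filter(Entry.id > last_id).order_by(Entry.id).limit(500)`.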
