SQLAlchemy IntegrityError and bulk data imports

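I am inserting several tens of thousands of records into a database that enforces referential integrity rules. Unfortunately, some of the rows are duplicates (they already exist in the database). Checking whether each row exists before inserting it would be too expensive, so I intend to handle the IntegrityError exceptions raised by SQLAlchemy instead: log the error and keep going.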

My code would look something like this:

# establish connection to db etc.
from sqlalchemy.exc import IntegrityError

tbl = obtain_binding_to_sqlalchemy_orm()
datarows = load_rows_to_import()

try:
    conn.execute(tbl.insert(), datarows)
except IntegrityError as ie:
    pass  # eat error and keep going
except Exception as e:
    pass  # do something else

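The (implicit) assumption I am making above is that SQLAlchemy does not roll the multiple inserts into a single transaction. If that assumption is wrong, then an IntegrityError would abort the rest of the inserts. Can anyone confirm whether the pseudocode "pattern" above will work as expected - or will I end up losing data because of the IntegrityError exceptions being raised?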

Also, if anyone has a better idea for doing this, I would be interested to hear it.

Source

2012-05-14 Homunculus Reticulli

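It can work like this if you have not started a transaction beforehand, because in that case SQLAlchemy's autocommit feature kicks in. You should set it explicitly, though, as described in the link.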

Source

2012-05-14 15:46:38 mata

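I ran into this problem too, while parsing ASCII data files to import their contents into a table. Intuitively, I wanted SQLAlchemy to skip the duplicate rows while letting the unique ones through. It can also happen that a seemingly random error is raised for a single row because of the SQL engine in use (for example, unicode strings not being allowed).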

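However, this behavior is outside the scope of what the SQL interface defines. SQL APIs, and therefore SQLAlchemy, only understand transactions and commits, and do not account for this kind of selective behavior. Relying on the autocommit feature also sounds dangerous, because the insertion halts at the exception and the remaining rows are never inserted.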
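The "halts at the exception" behavior can be reproduced at the driver level. The following is a minimal sketch using Python's standard-library `sqlite3` module rather than SQLAlchemy; the `items` table and its rows are made up for illustration, and other drivers or backends may behave differently.

```python
import sqlite3

def current_ids(connection):
    """Return the ids currently visible on this connection."""
    return [row[0] for row in connection.execute("SELECT id FROM items ORDER BY id")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items VALUES (2, 'already there')")
conn.commit()

# Hypothetical batch: the second row collides with the existing id 2.
rows = [(1, "a"), (2, "duplicate"), (3, "c")]

try:
    # One batched call; the driver runs the INSERT once per parameter tuple.
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
except sqlite3.IntegrityError as exc:
    print("batch stopped:", exc)

# Row 1 is pending in the open transaction, row 3 was never attempted.
pending_ids = current_ids(conn)

# Rolling back discards the partial batch; committing would keep row 1 only.
conn.rollback()
ids_after_rollback = current_ids(conn)
```

In this sketch the row after the duplicate is lost either way, which is why the loop below handles each row separately.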

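My solution (I am not sure it is the most elegant one) is to process each row in a loop, catch and log any exception, and commit the changes at the very end.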

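Assume you have somehow obtained the data as a list of lists, i.e. a list of rows, each of which is a list of column values. You then go through the rows in a loop: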

# Python 3.5
from sqlalchemy import Table, create_engine
import logging

# Create the engine
# Create the table
# Parse the data file and save data in `rows`

conn = engine.connect()
trans = conn.begin()  # Disables autocommit

exceptions = {}  # exception class name -> indices of the rows that raised it
totalRows = 0
importedRows = 0

ins = table.insert()

for currentRowIdx, cols in enumerate(rows):
    try:
        conn.execute(ins.values(cols))  # try to insert the column values
        importedRows += 1

    except Exception as e:
        exc_name = type(e).__name__  # save the exception name
        if exc_name not in exceptions:
            exceptions[exc_name] = []
        exceptions[exc_name].append(currentRowIdx)

    totalRows += 1

for key, val in exceptions.items():
    logging.warning("%d out of %d lines were not imported due to %s.", len(val), totalRows, key)

logging.info("%d rows were imported.", importedRows)

trans.commit()  # Commit at the very end
conn.close()

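To maximize the speed of this operation, you should disable autocommit. I am using this code with SQLite, and it is still 3-5 times slower than my older version that used only sqlite3, even with autocommit disabled. (The reason I ported it to SQLAlchemy was to be able to use it with MySQL.)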
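The sqlite3-only version mentioned above is not shown here. As a rough sketch of what such a direct loop could look like, assuming a hypothetical `items` table and a helper named `import_rows` (both invented for illustration):

```python
import sqlite3

def import_rows(conn, rows):
    """Insert rows one at a time and commit once at the end.

    Returns (imported_count, failures), where failures maps an exception
    class name to the indices of the rows that raised it.
    """
    failures = {}
    imported = 0
    for idx, cols in enumerate(rows):
        try:
            conn.execute("INSERT INTO items (id, name) VALUES (?, ?)", cols)
            imported += 1
        except sqlite3.Error as exc:
            failures.setdefault(type(exc).__name__, []).append(idx)
    conn.commit()  # single commit for the whole import
    return imported, failures

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Hypothetical data: row 1 repeats a primary key, row 2 violates NOT NULL.
rows = [(1, "a"), (1, "dup"), (2, None), (3, "c")]
imported, failures = import_rows(conn, rows)
stored = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

This relies on SQLite keeping the transaction usable after a failed statement. Some backends, PostgreSQL for example, abort the whole transaction on an error, so the same loop there would need a savepoint around each insert.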

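This is not the most elegant solution, in the sense that it is not as fast as a direct interface to SQLite. If I profile the code and find the bottleneck in the near future, I will update this answer with the solution.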
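One alternative that was not part of the original answer is to let the database skip conflicting rows itself. The sketch below uses SQLite's `INSERT OR IGNORE` through the standard-library `sqlite3` module; the `items` table is hypothetical. It is SQLite-specific syntax rather than standard SQL (MySQL has a similar `INSERT IGNORE`), and as far as I know it does not skip foreign key violations, so treat it as an assumption to verify against your schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items VALUES (2, 'already there')")
conn.commit()

def count_rows(connection):
    return connection.execute("SELECT COUNT(*) FROM items").fetchone()[0]

# Hypothetical batch: the row with id 2 duplicates an existing key.
rows = [(1, "a"), (2, "duplicate"), (3, "c")]

before = count_rows(conn)
# Rows that hit a uniqueness conflict are skipped without raising.
conn.executemany("INSERT OR IGNORE INTO items VALUES (?, ?)", rows)
conn.commit()

inserted = count_rows(conn) - before
skipped = len(rows) - inserted
kept_name = conn.execute("SELECT name FROM items WHERE id = 2").fetchone()[0]
```

The trade-off is that skipped rows are dropped silently: only the count is recoverable here, not which rows were skipped or why.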

Source

2016-01-21 14:44:35 hosolmaz
