Optimizing a Python script that executes SQL

I have a script that parses information from a CSV file and executes SQL statements to create a table and insert the data. I need to parse a ~25 GB CSV file, but with my current script I estimate it could take up to 20 days, judging by the smaller files I have already parsed. Any suggestions on how to optimize my script so it runs faster? Since it is only called once, I have omitted the createTable function; insertRow() is the function I think really needs to be faster. Thanks in advance.
import sqlite3
import csv
import sys

# Builds an SQL INSERT statement and executes a sqlite3 call to insert the row
def insertRow(cols):
    first = True  # First value in the INSERT argument list doesn't need a comma in front of it.
    conn = sqlite3.connect('parsed_csv.sqlite')
    c = conn.cursor()
    print(cols)
    insert = "INSERT INTO test9 VALUES("
    for col in cols:
        col = col.replace("'", "")
        if first:
            insert += "'" + col + "'"
            first = False
        else:
            insert += ",'" + col + "' "
    insert += ")"
    print(insert)
    c.execute(insert)
    conn.commit()
def main():
    # Get rid of first argument (filename)
    cmdargs = sys.argv[1:]
    # Convert values to integers
    cmdargs = list(map(int, cmdargs))
    # Get headers
    with open(r'requests_fields.csv', 'rb') as source:
        rdr = csv.reader(source)
        for row in rdr:
            createTable(row[:], cmdargs[:])
    with open(r'test.csv', 'rb') as source:
        rdr = csv.reader(source)
        for row in rdr:
            # Clear contents of list
            outlist = []
            # Append the selected columns onto the list, then insert them as a row
            for index in cmdargs:
                outlist.append(row[index])
            insertRow(outlist[:])
Could the slowness I'm experiencing be because I create a connection to the database every time in insertRow()?
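Yes: opening a connection and committing a transaction for every single row is the dominant cost, and concatenating values into the SQL string also forces SQLite to re-parse the statement on each call (and silently mangles values containing quotes). Below is a minimal Python 3 sketch of the usual fix: one connection, a parameterized statement, and executemany() over batches. The load_csv wrapper and batch_size are illustrative names; test9, parsed_csv.sqlite, test.csv, and the cmdargs column selection come from the code above.

import sqlite3
import csv
from itertools import islice

def load_csv(path, cmdargs, batch_size=10000):
    conn = sqlite3.connect('parsed_csv.sqlite')
    c = conn.cursor()
    # One parameterized statement: sqlite3 does the quoting and compiles the SQL once.
    sql = "INSERT INTO test9 VALUES(%s)" % ",".join("?" * len(cmdargs))
    with open(path, newline='') as source:
        # Generator that yields only the selected columns of each row
        rows = ([row[i] for i in cmdargs] for row in csv.reader(source))
        while True:
            batch = list(islice(rows, batch_size))
            if not batch:
                break
            c.executemany(sql, batch)
            conn.commit()  # one commit per batch instead of per row
    conn.close()

Committing once per batch instead of once per row avoids an fsync per insert, which for a file this size is usually the difference between days and minutes; if some durability can be traded away, PRAGMA journal_mode and PRAGMA synchronous can buy a further constant factor.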
Aside from that: indent your code correctly –
I think you can import the CSV into SQLite directly; there's no need for a script like this: http://www.sqlite.org/cvstrac/wiki?p=ImportingFiles – YXD
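For reference, in recent versions of the sqlite3 command-line shell the import the link describes is two dot-commands (the column filtering that cmdargs does would have to happen before or after the import, since .import loads whole rows); test.csv and test9 are the names from the question:

    .mode csv
    .import test.csv test9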
Maybe you want to use a database other than SQLite for 25 GB of data? –