Reading the data file should not be the bottleneck. On my machine, the following code reads a 36 MB text file of 697,997 lines in about 0.2 seconds:
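import time

start = time.perf_counter()  # time.clock() was removed in Python 3.8
with open('procmail.log', 'r') as f:
    lines = f.readlines()  # read the whole file into a list of lines
end = time.perf_counter()
print('Readlines time:', end - start)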
It produces the following output:
Readlines time: 0.1953125
Note that this code builds the complete list of lines in one go.
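To keep track of where you have been, just write the number of lines you have processed to a file. Then, if you want to try again, read all the lines and skip the ones you have already done: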
import os

# Read the data file
with open('list.txt', 'r') as f:
    lines = f.readlines()

skip = 0
try:
    # Did we try earlier? If so, skip what has already been processed.
    with open('lineno.txt', 'r') as lf:
        skip = int(lf.read())  # this should only be one number.
    del lines[:skip]  # Remove already processed lines from the list.
except (IOError, ValueError):
    pass  # No usable progress file: start from the first line.

with open('lineno.txt', 'w+') as lf:
    for n, line in enumerate(lines):
        # Do your processing here.
        lf.seek(0)  # go to beginning of lf
        lf.write(str(n + skip + 1) + '\n')  # write the number of lines processed so far
        lf.flush()
        os.fsync(lf.fileno())  # flush and fsync make sure the lf file is written.
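To make it easier to try out the restart behaviour, the same idea can be wrapped in a function. This is only a minimal sketch: the names `process_resumable`, `checkpoint_path`, and `handle_line` are made up for illustration, and it assumes the data file does not change between runs.

```python
import os


def process_resumable(data_path, checkpoint_path, handle_line):
    """Call handle_line on every line not yet recorded as processed.

    Returns the total number of lines recorded as processed.
    """
    with open(data_path, 'r') as f:
        lines = f.readlines()

    # Number of lines finished in an earlier run (0 if there is no usable file).
    try:
        with open(checkpoint_path, 'r') as cf:
            done = int(cf.read())
    except (IOError, ValueError):
        done = 0

    with open(checkpoint_path, 'w') as cf:
        # Opening with 'w' truncates the file, so store the old count again first.
        cf.write(str(done) + '\n')
        for line in lines[done:]:
            handle_line(line)
            done += 1
            cf.seek(0)                 # overwrite the previous count
            cf.write(str(done) + '\n')
            cf.flush()
            os.fsync(cf.fileno())      # force the count onto disk
    return done
```

If `handle_line` raises an exception part way through, the checkpoint file holds the count of lines that completed, so calling the function again continues with the line that failed.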
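Are the numbers unique? – kalgasnik
Are you trying to write each number to a separate file? If so, why – root
You could try using Postgres and pl/pgsql to perform any computation in the database itself... – moooeeeep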