I would suggest using HDF5 (here through the PyTables package). It is very fast for IO. Here is how you can write your data:
import numpy as np
import tables
fname = 'myOutput.h5'
length = 100 # your data length
my_data_generator = range(length)  # your data comes here instead of the range (xrange in Python 2)
filters = tables.Filters(complib='blosc', complevel=5) # you could change these
h5file = tables.open_file(fname, mode='w', title='yourTitle', filters=filters)
group = h5file.create_group(h5file.root, 'MyData', 'MyData')
x_atom = tables.Float32Atom()
x = h5file.create_carray(group, 'X', atom=x_atom, title='myTitle',
                         shape=(length,), filters=filters)
# This is a basic example. In real code it will be faster to write in larger chunks,
# e.g. x[start1:end1] = elements[start2:end2]
for element_i, element in enumerate(my_data_generator):
    x[element_i] = element
h5file.flush()
h5file.close()
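The comment in the loop suggests writing in larger chunks. A minimal sketch of that idea, assuming the data arrives as an iterable of numbers: collect values in a NumPy buffer and write each full buffer with one slice assignment. The helper name write_in_chunks and the default chunk size are hypothetical; the helper only relies on target[start:stop] = array, which NumPy arrays support and which should also work on a PyTables CArray such as x.

```python
import numpy as np


def write_in_chunks(target, data_iter, chunk_size=1024, dtype=np.float32):
    """Copy values from data_iter into target[0:n] using slice assignment.

    target: anything supporting target[start:stop] = array, e.g. a NumPy
    array or (assumption) a PyTables CArray like x in the script above.
    Returns the number of elements written.
    """
    buffer = np.empty(chunk_size, dtype=dtype)
    filled = 0   # slots of buffer currently in use
    written = 0  # elements already flushed to target
    for value in data_iter:
        buffer[filled] = value
        filled += 1
        if filled == chunk_size:
            # one slice assignment per chunk instead of one per element;
            # the data is copied, so the buffer can be reused afterwards
            target[written:written + filled] = buffer
            written += filled
            filled = 0
    if filled:
        # flush the final, partial chunk
        target[written:written + filled] = buffer[:filled]
        written += filled
    return written


demo = np.zeros(10, dtype=np.float32)
n_written = write_in_chunks(demo, range(10), chunk_size=4)
```

In the script above, the element-by-element loop could then be replaced by write_in_chunks(x, my_data_generator), provided x was created with at least as many elements as the generator yields.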
For reading, use:
h5file = tables.open_file(fname, mode='r')
x = h5file.get_node('/MyData/X')
print(x[:10])
h5file.close()
Result:
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=float32)
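Slicing like x[:10] reads only the requested part of the dataset, so a large dataset can be processed block by block instead of loading it whole. A sketch under that assumption; iter_chunks and chunked_mean are hypothetical helpers, and a NumPy array stands in for the PyTables node here since both expose .shape and slicing.

```python
import numpy as np


def iter_chunks(dataset, chunk_size=1024):
    """Yield consecutive blocks dataset[start:stop] of a 1-D dataset.

    dataset: a NumPy array or (assumption) a PyTables node such as
    x = h5file.get_node('/MyData/X').
    """
    n = dataset.shape[0]
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)  # last block may be shorter
        yield dataset[start:stop]


def chunked_mean(dataset, chunk_size=1024):
    """Mean of a 1-D dataset, accumulated one block at a time."""
    total = 0.0
    count = 0
    for block in iter_chunks(dataset, chunk_size):
        # accumulate in float64 to limit rounding error from float32 data
        total += float(np.sum(block, dtype=np.float64))
        count += block.shape[0]
    return total / count if count else float("nan")


data = np.arange(10, dtype=np.float32)
blocks = list(iter_chunks(data, chunk_size=4))
mean_value = chunked_mean(data, chunk_size=4)
```

Only one block is held in memory at a time, so the chunk size trades memory use against the number of read calls.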
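Wouldn't Python's built-in file read/write do this? – lucasnadalutti
Yes, of course; I was wondering whether there is a more efficient numpy way. I'm editing my post now. –
@P-M Why is it so important to save them in the same file? – MZHm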
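On the comments asking for a plain numpy way and for keeping several arrays in the same file: np.savez stores several named arrays in a single .npz archive, and np.load reads them back. This is a sketch, not the answer's method; the helper names save_arrays and load_arrays and the array names X and Y are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_arrays(path, **arrays):
    """Save several named arrays into one uncompressed .npz file."""
    np.savez(path, **arrays)


def load_arrays(path):
    """Read every array stored in a .npz file into a plain dict."""
    # np.load returns an NpzFile for .npz archives; the with block closes it
    with np.load(path) as npz:
        return {name: npz[name] for name in npz.files}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myOutput.npz")
    save_arrays(path, X=np.arange(10, dtype=np.float32), Y=np.ones(3))
    loaded = load_arrays(path)  # arrays are in memory before tmp is removed
```

np.savez_compressed has the same call signature if file size matters more than speed. Unlike HDF5, an .npz archive is normally read whole per array, so it suits data that fits in memory.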