2013-10-11

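Appending to an HDF5 matrix in Python

Say we have a matrix (for example, a NumPy array we want to store) and we save it in an HDF5 file. Later we want to extend the matrix by appending some rows to the end of the original one. The original matrix may be very large (tens of GB), so it cannot be loaded into RAM.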

We also want to be able to read a few rows of the matrix starting at an arbitrary position (a "slice"?) without loading the whole matrix into RAM.
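Both requirements map onto a resizable HDF5 dataset. The sketch below uses h5py, which is an assumption: the question does not name a library. `append_rows` and `read_rows` are hypothetical helper names. Resizing grows the dataset on disk, and slicing a dataset reads only the requested rows.

```python
import os
import tempfile

import h5py
import numpy as np


def append_rows(path, name, rows):
    """Append rows to a 2-D dataset, creating it as resizable on first use."""
    rows = np.asarray(rows)
    with h5py.File(path, "a") as f:
        if name not in f:
            # maxshape=(None, ncols) leaves the row axis unlimited.
            # Resizable datasets must be chunked; chunks=True lets h5py pick the chunk shape.
            f.create_dataset(name, data=rows, maxshape=(None, rows.shape[1]), chunks=True)
            return
        dset = f[name]
        old = dset.shape[0]
        dset.resize(old + rows.shape[0], axis=0)  # grow the row axis on disk
        dset[old:] = rows  # write only the new rows


def read_rows(path, name, start, stop):
    """Read rows [start, stop) without loading the rest of the dataset."""
    with h5py.File(path, "r") as f:
        return f[name][start:stop]  # slicing a Dataset returns a NumPy array


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "matrix.h5")
    append_rows(path, "m", np.arange(12.0).reshape(4, 3))
    append_rows(path, "m", np.arange(12.0, 18.0).reshape(2, 3))
    print(read_rows(path, "m", 3, 5))  # last original row + first appended row
```

For tens of GB, you would append in batches of rows rather than one row at a time, because each `resize` call has some overhead.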

Can anyone give an example of how to do this in Python?

UPDATE:

I think another option is numpy.memmap, but it doesn't seem to support appending.
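`np.memmap` has no append method. However, when it is opened in mode `"r+"` or `"w+"` with a shape larger than the file, it enlarges the file. This minimal sketch relies on that behaviour. It assumes a headerless, row-major file of float64 values with a fixed, known number of columns. `memmap_append` and `memmap_rows` are hypothetical helper names.

```python
import os
import tempfile

import numpy as np


def memmap_append(path, rows, dtype=np.float64):
    """Append rows to a headerless row-major matrix file (hypothetical helper)."""
    rows = np.asarray(rows, dtype=dtype)
    ncols = rows.shape[1]
    row_bytes = ncols * np.dtype(dtype).itemsize
    old_rows = os.path.getsize(path) // row_bytes if os.path.exists(path) else 0
    # "w+" creates (or truncates) the file, so use it only when there is no data yet.
    # "r+" keeps existing data; np.memmap enlarges the file if the shape needs it.
    mode = "r+" if old_rows else "w+"
    mm = np.memmap(path, dtype=dtype, mode=mode, shape=(old_rows + len(rows), ncols))
    mm[old_rows:] = rows
    mm.flush()
    del mm  # drop our reference so the mapping can be closed


def memmap_rows(path, ncols, start, stop, dtype=np.float64):
    """Copy rows [start, stop) into RAM; the rest of the file stays on disk."""
    mm = np.memmap(path, dtype=dtype, mode="r").reshape(-1, ncols)
    return np.array(mm[start:stop])


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
    memmap_append(path, np.arange(6.0).reshape(2, 3))
    memmap_append(path, [[6.0, 7.0, 8.0]])
    print(memmap_rows(path, 3, 1, 3))
```

The file carries no shape or dtype metadata, so the column count and dtype must be tracked separately. HDF5 stores that information for you.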

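This also seems to be an option, but it works with raw binary data, whereas I want to access the data as a matrix. Also, I don't know how to append with it.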
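A raw binary file can still be read as a matrix if you know the row width. Appending is an ordinary write in `"ab"` mode, and reading a block of rows means seeking to the row offset. This is a minimal sketch under those assumptions (float32, headerless, row-major). `raw_append` and `raw_read_rows` are hypothetical names.

```python
import os
import tempfile

import numpy as np


def raw_append(path, rows, dtype=np.float32):
    """Append rows to a headerless binary file; nothing already on disk is read."""
    rows = np.ascontiguousarray(rows, dtype=dtype)
    with open(path, "ab") as fh:  # "ab" always writes at the end of the file
        fh.write(rows.tobytes())


def raw_read_rows(path, ncols, start, stop, dtype=np.float32):
    """Seek to row `start` and read rows [start, stop) as a 2-D array."""
    itemsize = np.dtype(dtype).itemsize
    with open(path, "rb") as fh:
        fh.seek(start * ncols * itemsize)
        flat = np.fromfile(fh, dtype=dtype, count=(stop - start) * ncols)
    return flat.reshape(-1, ncols)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "matrix.raw")
    raw_append(path, np.arange(8).reshape(2, 4))
    raw_append(path, np.arange(8, 12).reshape(1, 4))
    print(raw_read_rows(path, 4, 1, 3))
```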

Answer


If you are going to work with HDF5 files, I suggest using one of the existing libraries, such as PyTables. Here is a simplified version of their tutorial (http://pytables.github.io/usersguide/tutorials.html):

import tables
from tables import (IsDescription, StringCol, Int64Col, UInt16Col, UInt8Col,
                    Int32Col, Float32Col, Float64Col)

# Define a user record to characterize some kind of particles
class Particle(IsDescription):
    name = StringCol(16)      # 16-character string
    idnumber = Int64Col()     # signed 64-bit integer
    ADCcount = UInt16Col()    # unsigned short integer
    TDCcount = UInt8Col()     # unsigned byte
    grid_i = Int32Col()       # 32-bit integer
    grid_j = Int32Col()       # 32-bit integer
    pressure = Float32Col()   # float (single precision)
    energy = Float64Col()     # double (double precision)

filename = "test.h5"
# Open a file in "w"rite mode
h5file = tables.open_file(filename, mode="w", title="Test file")
# Create a new group under "/" (root)
group = h5file.create_group("/", 'detector', 'Detector information')
# Create one table in it
table = h5file.create_table(group, 'readout', Particle, "Readout example")
# Fill the table with 10 particles
particle = table.row
for i in range(10):
    particle['name'] = 'Particle: %6d' % i
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i * i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    # Insert a new particle record
    particle.append()
# Close (and flush) the file
h5file.close()

# Now reopen the file, take some slices, and then append more data to the table
f = tables.open_file(filename, mode="a")
print(f.root.detector)                     # the group
print(f.root.detector.readout)             # the table
print(f.root.detector.readout[1::3])       # every third row, starting at row 1
print(f.root.detector.readout.attrs.TITLE)
ro = f.root.detector.readout

# where() iterates over matching rows, so the table is not loaded into RAM at once
print([row['energy'] for row in ro.where('pressure > 10')])

# Append 5 more particles to the existing table
table = f.root.detector.readout
particle = table.row
for i in range(10, 15):
    particle['name'] = 'Particle: %6d' % i
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i * i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    particle.append()
table.flush()
f.close()
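The tutorial above stores records in a Table. For a plain numeric matrix, as in the question, PyTables also provides `EArray`, an array that can be extended along one dimension. This minimal sketch assumes float64 data with a fixed column count. The helper names and the node name `matrix` are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables


def create_matrix(path, ncols):
    """Create an empty EArray whose first dimension can grow."""
    with tables.open_file(path, mode="w") as h5:
        # A 0 in `shape` marks the enlargeable dimension.
        h5.create_earray(h5.root, "matrix", atom=tables.Float64Atom(), shape=(0, ncols))


def append_rows(path, rows):
    """Append a block of rows to the end of the EArray."""
    with tables.open_file(path, mode="a") as h5:
        h5.root.matrix.append(np.asarray(rows, dtype=np.float64))


def read_rows(path, start, stop):
    """Read rows [start, stop) only; returns a NumPy array."""
    with tables.open_file(path, mode="r") as h5:
        return h5.root.matrix[start:stop]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "earray.h5")
    create_matrix(path, 3)
    append_rows(path, np.arange(12.0).reshape(4, 3))
    append_rows(path, np.arange(12.0, 18.0).reshape(2, 3))
    print(read_rows(path, 3, 5))
```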