How to read part of a binary file with numpy?

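I'm converting a matlab script to numpy, but I'm having some problems reading data from a binary file. Is there an equivalent to fseek when using fromfile, so that I can skip the beginning of the file? This is the kind of extraction I need to do: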

fid = fopen(fname); 
fseek(fid, 8, 'bof'); 
second = fread(fid, 1, 'schar'); 
fseek(fid, 100, 'bof'); 
total_cycles = fread(fid, 1, 'uint32', 0, 'l'); 
start_cycle = fread(fid, 1, 'uint32', 0, 'l');

Thanks!

Source

2013-01-09 brorfred

You can seek with a file object in the normal way, and then pass that file object to fromfile. Here's a full example:

import numpy as np
import os

# np.int no longer exists in NumPy; int32 matches the 4-byte items in the output below
data = np.arange(100, dtype=np.int32)
data.tofile("temp")  # save the data

with open("temp", "rb") as f:  # reopen the file
    f.seek(256, os.SEEK_SET)  # skip the first 256 bytes (64 items of 4 bytes)
    x = np.fromfile(f, dtype=np.int32)  # read the rest of the file into numpy

print(x)
# [64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
# 89 90 91 92 93 94 95 96 97 98 99]
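Applied to the MATLAB snippet in the question, the same seek-then-fromfile pattern might look like the sketch below. The test file, its contents, and the name sample.bin are made up for illustration; only the offsets (8 and 100) and the types (signed char, little-endian uint32) come from the question. The count argument limits how many items fromfile reads.

```python
import os
import tempfile

import numpy as np

# Hypothetical 108-byte test file: a signed byte at offset 8,
# then two little-endian uint32 values starting at offset 100.
raw = bytearray(108)
raw[8] = 0xFE  # -2 when read as a signed char
raw[100:104] = (1000).to_bytes(4, "little")
raw[104:108] = (7).to_bytes(4, "little")

path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(raw)

with open(path, "rb") as f:
    f.seek(8, os.SEEK_SET)                           # fseek(fid, 8, 'bof')
    second = np.fromfile(f, dtype="i1", count=1)[0]  # fread(fid, 1, 'schar')
    f.seek(100, os.SEEK_SET)                         # fseek(fid, 100, 'bof')
    # The two uint32 reads are consecutive, so one call with count=2 covers both.
    total_cycles, start_cycle = np.fromfile(f, dtype="<u4", count=2)

print(second, total_cycles, start_cycle)
```

The "<u4" dtype spells out little-endian explicitly, which corresponds to the 'l' argument of fread.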

Source

2013-01-09 20:09:26 tom10

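Dang, a file object instead of a filename! That's exactly what I was looking for. This should be added to the documentation of fromfile... Thanks! – brorfred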

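There's probably a better answer... but when I faced this problem, I had a file whose different parts I already wanted to access separately, and that gave me an easy solution.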

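For example, say chunkyfoo.bin is a file consisting of a 6-byte header, a 1024-byte numpy array, and another 1024-byte numpy array. You can't just open the file and seek 6 bytes (because the first thing numpy.fromfile does is lseek back to 0). But you can mmap the file and use fromstring instead (np.frombuffer in current NumPy, where binary-mode fromstring is deprecated):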

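import mmap
from contextlib import closing
import numpy as np

with open('chunkyfoo.bin', 'rb') as f:
    with closing(mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ)) as m:
        # np.frombuffer replaces the deprecated np.fromstring; the default dtype is float64
        a1 = np.frombuffer(m[6:1030])
        a2 = np.frombuffer(m[1030:])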

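That sounds like exactly what you want to do. Except, of course, that in real life the offsets and lengths of a1 and a2 probably depend on the header, rather than being fixed constants.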

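The header is just m[:6], and you can parse it by pulling it apart explicitly, with the struct module, or with whatever else you would use once the data is read. But, if you prefer, you can explicitly seek and read from f before or after constructing m, or even make the same calls on m, and it will work without affecting a1 and a2.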
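As a rough sketch of the header-driven case, the example below parses a 6-byte header with struct and uses it to locate the two arrays inside the mapped file. The header layout (a uint16 version followed by a uint32 item count), the float64 payload, and the file contents are all assumptions made up for illustration; nothing above specifies what the header actually holds.

```python
import mmap
import os
import struct
import tempfile

import numpy as np

# Hypothetical 6-byte header: uint16 version, uint32 item count of the first array.
HEADER = struct.Struct("<HI")

first = np.arange(128, dtype="<f8")             # 1024 bytes
second = np.linspace(0.0, 1.0, 128).astype("<f8")  # 1024 bytes

path = os.path.join(tempfile.mkdtemp(), "chunkyfoo.bin")
with open(path, "wb") as f:
    f.write(HEADER.pack(1, first.size))
    f.write(first.tobytes())
    f.write(second.tobytes())

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as m:
        version, n_first = HEADER.unpack_from(m, 0)
        # offset is in bytes, count is in items; copy() detaches the arrays
        # from the mapping so it can be closed when the block exits.
        a1 = np.frombuffer(m, dtype="<f8", count=n_first, offset=HEADER.size).copy()
        a2 = np.frombuffer(m, dtype="<f8", offset=HEADER.size + a1.nbytes).copy()

print(version, a1.shape, a2.shape)
```

Without the copy() calls the arrays would keep pointing into the mapping, and closing the mmap while they are alive raises a BufferError.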

An alternative, which I've used in a different, non-numpy-related project, is to create a wrapper file object, like this:

class SeekedFileWrapper(object):
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.offset = fileobj.tell()  # treat the current position as the new "start of file"
    def seek(self, offset, whence=0):
        if whence == 0:  # absolute seeks are shifted by the starting offset
            offset += self.offset
        return self.fileobj.seek(offset, whence)
    # ... delegate everything else unchanged

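The way I did the "delegate everything else unchanged" part was to build a list of attributes at construction time and use it in __getattr__, but you probably want something less hacky. numpy only relies on a handful of methods of the file-like object, and I think they're properly documented, so just delegate those explicitly. But I think the mmap solution makes more sense here, unless you're trying to mechanically port a bunch of explicit seek-based code. (You'd think mmap would also give you the option of keeping the data as a numpy.memmap instead of a numpy.array, which gives numpy more control over, and feedback from, the paging, etc. But it's actually pretty tricky to get a numpy.memmap and an mmap to work together.)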
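For what it's worth, numpy.memmap takes an offset argument of its own, so when the offsets are known it may be possible to skip the separate mmap entirely. This is a minimal sketch of that idea, not the approach described above; the file contents and the float64 dtype are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

# Build a stand-in chunkyfoo.bin: 6-byte header plus two 1024-byte float64 arrays.
first = np.arange(128, dtype="<f8")
second = first * 2
path = os.path.join(tempfile.mkdtemp(), "chunkyfoo.bin")
with open(path, "wb") as f:
    f.write(b"HEADER")  # 6-byte placeholder header
    f.write(first.tobytes())
    f.write(second.tobytes())

# offset is given in bytes; shape is given in items of the chosen dtype.
a1 = np.memmap(path, dtype="<f8", mode="r", offset=6, shape=(128,))
a2 = np.memmap(path, dtype="<f8", mode="r", offset=6 + 1024, shape=(128,))

print(a1[:3], a2[:3])
```

Both arrays are backed by the file on disk, so nothing is read until the elements are actually accessed.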

Source

2013-01-09 19:56:13 abarnert

Whoever downvoted this, care to explain why? – abarnert

Old thread, but fromfile reads from the file's current position. A lot of this machinery isn't needed, and it will be slower than numpy.fromfile. – noobermin

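This is what I do when I have to read arbitrary heterogeneous binary files.
Numpy lets you interpret a bit pattern in arbitrary ways by changing the dtype of the array. The Matlab code in the question reads a char and two uints.

Read this paper (easy reading at the user level, not written for scientists) on what can be achieved by changing the dtype, strides, and dimensionality of an array.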

import numpy as np

# np.int no longer exists in NumPy; int32 matches the 40-byte file size below
data = np.arange(10, dtype=np.int32)
data.tofile('f')

x = np.fromfile('f', dtype='u1')  # read the whole file as raw bytes
print(x.size)
# 40

second = x[8]
print('second', second)
# second 2

total_cycles = x[8:12]
print('total_cycles', total_cycles)
total_cycles = total_cycles.view('u4')  # reinterpret the same 4 bytes as one uint32
print('total_cycles', total_cycles)
# total_cycles [2 0 0 0]  !endianness
# total_cycles [2]

start_cycle = x[12:16].view('u4')
print('start_cycle', start_cycle)
# start_cycle [3]

x = x.view('u4')  # reinterpret the whole buffer as uint32
print('x', x)
# x [0 1 2 3 4 5 6 7 8 9]

x[3] = 423  # the views share memory, so start_cycle changes too
print('start_cycle', start_cycle)
# start_cycle [423]
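For the offsets in the question, the same idea could be sketched as follows, assuming NumPy 1.17 or later, where fromfile accepts an offset argument in bytes. The test file and its values are invented for illustration; only the offsets and types come from the question.

```python
import os
import tempfile

import numpy as np

# Hypothetical test file: signed byte at offset 8, two little-endian uint32 at offset 100.
raw = bytearray(108)
raw[8] = 0xFE  # -2 when read as a signed char
raw[100:108] = np.array([1000, 7], dtype="<u4").tobytes()

path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(raw)

# Option 1: let fromfile skip the leading bytes itself.
second = np.fromfile(path, dtype="i1", count=1, offset=8)[0]
cycles = np.fromfile(path, dtype="<u4", count=2, offset=100)

# Option 2: read everything as bytes once, then reinterpret slices.
buf = np.fromfile(path, dtype="u1")
cycles_view = buf[100:108].view("<u4")

print(second, cycles, cycles_view)
```

Option 2 reads the whole file into memory, so it is mainly attractive when many fields have to be pulled out of a file that is not too large.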

Source

2013-01-09 21:20:14
