numpy tofile() with very large arrays saves all zeros

When I try to save a very large (20000 x 20000 element) array, I get all zeros back:

In [2]: shape = (20000,)*2

In [3]: r = np.random.randint(0, 10, shape) 

In [4]: r.tofile('r.data') 

In [5]: ls -lh r.data 
-rw-r--r-- 1 whg staff 3.0G 23 Jul 16:18 r.data 

In [6]: r[:6,:6] 
Out[6]: 
array([[6, 9, 8, 7, 4, 4],
       [5, 9, 5, 0, 9, 4],
       [6, 0, 9, 5, 7, 6],
       [4, 0, 8, 8, 4, 7],
       [8, 3, 3, 8, 7, 9],
       [5, 6, 1, 3, 1, 4]])

In [7]: r = np.fromfile('r.data', dtype=np.int64) 

In [8]: r = r.reshape(shape) 

In [9]: r[:6,:6] 
Out[9]: 
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

np.save() does something similarly strange.
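Rather than eyeballing r[:6, :6], both round trips (tofile()/fromfile() and np.save()/np.load()) can be checked mechanically. A minimal sketch; the helper name check_roundtrips and the file names are hypothetical. It reads the raw file back with arr.dtype instead of a hard-coded np.int64, because np.random.randint's default integer type depends on the platform (32-bit on Windows before NumPy 2.0). A small array passing proves nothing about the 3 GB case; the same call can then be run at full size on the affected machine.

```python
import os
import tempfile

import numpy as np


def check_roundtrips(arr, directory):
    """Save `arr` with tofile() and np.save(), reload both, compare with `arr`.

    Returns a dict like {"tofile": True, "save": True}.
    """
    raw_path = os.path.join(directory, "r.data")
    npy_path = os.path.join(directory, "r.npy")

    # Raw bytes have no header, so dtype and shape must be supplied on reload.
    # Use arr.dtype rather than a hard-coded np.int64.
    arr.tofile(raw_path)
    raw_back = np.fromfile(raw_path, dtype=arr.dtype).reshape(arr.shape)

    # The .npy format stores dtype and shape in its header.
    np.save(npy_path, arr)
    npy_back = np.load(npy_path)

    return {
        "tofile": bool(np.array_equal(arr, raw_back)),
        "save": bool(np.array_equal(arr, npy_back)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    small = rng.integers(0, 10, size=(200, 200))
    print(check_roundtrips(small, tempfile.mkdtemp()))
```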

After searching the web, I found that this is a known bug on OS X:

https://github.com/numpy/numpy/issues/2806

When I try to read tostring() data from a file with Python's read(), I get a MemoryError.

Is there a better way to do this? Can anyone recommend a pragmatic workaround for this problem?
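One pragmatic workaround is to stream the array to and from disk in fixed-size pieces, so that no single write or read call has to move several gigabytes. This is a sketch under an assumption that has not been checked against the linked issue: that the OS X failure is triggered by very large individual I/O calls. The helper names tofile_chunked/fromfile_chunked and the 128 MiB chunk size are hypothetical. Reading into a preallocated array also avoids the full-size intermediate bytes object that tostring() and read() each create.

```python
import os
import tempfile

import numpy as np

# Hypothetical chunk size: 128 MiB per call, far below the 2 GiB mark.
# This is an assumption for illustration, not a documented limit.
CHUNK_BYTES = 2 ** 27


def tofile_chunked(arr, path, chunk_bytes=CHUNK_BYTES):
    """Write `arr` to `path` as raw C-order bytes, one chunk per tofile() call."""
    flat = np.ascontiguousarray(arr).reshape(-1)
    step = max(1, chunk_bytes // flat.itemsize)
    with open(path, "wb") as f:
        for start in range(0, flat.size, step):
            # tofile() on an open file object writes at the current position.
            flat[start:start + step].tofile(f)


def fromfile_chunked(path, dtype, shape, chunk_bytes=CHUNK_BYTES):
    """Read raw bytes from `path` into a preallocated array of `shape`."""
    dtype = np.dtype(dtype)
    out = np.empty(shape, dtype=dtype)
    flat = out.reshape(-1)  # a view: `out` is freshly allocated and contiguous
    step = max(1, chunk_bytes // dtype.itemsize)
    with open(path, "rb") as f:
        for start in range(0, flat.size, step):
            n = min(step, flat.size - start)
            # fromfile() on an open file object continues from the current position.
            block = np.fromfile(f, dtype=dtype, count=n)
            if block.size != n:
                raise EOFError(f"expected {n} items at offset {start}, got {block.size}")
            flat[start:start + n] = block
    return out


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "r.data")
    a = np.random.default_rng(0).integers(0, 10, size=(300, 300))
    tofile_chunked(a, path, chunk_bytes=4096)
    b = fromfile_chunked(path, a.dtype, a.shape, chunk_bytes=4096)
    print(np.array_equal(a, b))
```

The resulting file is byte-for-byte what arr.tofile(path) writes for the same data in C order, so it stays readable with np.fromfile or with memory mapping.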

2013-07-23 whg

Use mmap to memory-map the file, and np.frombuffer to create an array that points into the mapped buffer. Tested on x86_64 Linux:

# `r.data` created as in the question
>>> import mmap
>>> import numpy as np
>>> shape = (20000, 20000)
>>> with open('r.data', 'rb') as f:
...     m = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
...
>>> r = np.frombuffer(m, dtype='int64')
>>> r = r.reshape(shape)
>>> r[:6, :6]
array([[7, 5, 9, 5, 3, 5],
       [2, 7, 2, 6, 7, 0],
       [9, 4, 8, 2, 5, 0],
       [7, 2, 4, 6, 6, 7],
       [2, 9, 2, 2, 2, 6],
       [5, 2, 2, 6, 1, 5]])

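Note that r here is a view of the memory-mapped data. This makes it more memory-efficient, but has the side effect of automatically picking up changes to the file contents. If you want it to point to a private copy of the data, as the array returned by np.fromfile does, add r = np.copy(r). (Also, as written, this won't run on Windows, which needs slightly different mmap flags.)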
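For the Windows caveat and the copy option, a minimal sketch: Python's mmap constructor accepts access=mmap.ACCESS_READ on both Unix and Windows, whereas the flags and prot arguments used in the answer exist only on Unix, and np.copy() detaches the result from the file. As an alternative that the answer does not use, NumPy's np.memmap wraps the same idea in an ndarray subclass. The helper names are hypothetical, and this has not been tested on OS X.

```python
import mmap
import os
import tempfile

import numpy as np


def load_mmap_view(path, dtype, shape):
    """Read-only ndarray view of a raw binary file (mmap + np.frombuffer)."""
    with open(path, "rb") as f:
        # access=ACCESS_READ is accepted on Unix and Windows alike, unlike
        # the Unix-only flags/prot arguments.
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The mapping remains usable after the file object is closed.
    return np.frombuffer(m, dtype=dtype).reshape(shape)


def load_private_copy(path, dtype, shape):
    """Like load_mmap_view(), but returns an independent, writable copy."""
    return np.copy(load_mmap_view(path, dtype, shape))


def load_np_memmap(path, dtype, shape):
    """Alternative: NumPy's own memory-mapped array class, opened read-only."""
    return np.memmap(path, dtype=dtype, mode="r", shape=shape)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "r.data")
    a = np.arange(24, dtype=np.int64).reshape(4, 6)
    a.tofile(path)

    view = load_mmap_view(path, np.int64, a.shape)
    copy = load_private_copy(path, np.int64, a.shape)
    mm = load_np_memmap(path, np.int64, a.shape)
    print(view.flags.writeable, copy.flags.writeable)  # False True
    print(np.array_equal(view, a), np.array_equal(mm, a))  # True True
```

The view from a read-only mapping is itself read-only, which is a quick way to tell it apart from the private copy.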

2013-07-23 16:28:13 user4815162342
