Sparse matrix slicing memory error

I have a sparse matrix csr:

<681881x58216 sparse matrix of type '<class 'numpy.int64'>' 
    with 2867209 stored elements in Compressed Sparse Row format>

and I want to create a new sparse matrix from a slice of csr: csr_2 = csr[1::2,:].

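Problem: with only the csr matrix loaded, my server's memory usage is 40 GB. When I run csr_2 = csr[1::2,:], the server's RAM fills up completely (all 128 GB) and the process dies with a MemoryError.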


2017-09-04 Ladenkov Vladislav

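Your matrix itself is only about 22MB (values) plus some auxiliary arrays, probably < 80MB of memory in total. So are you sure this is the source of your problem (other things on the server could be using the other 39GB of memory)? (And, by the way, slicing a sparse matrix produces a copy.) – sascha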
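The size estimate in the comment above can be checked with a little arithmetic: a CSR matrix is backed by three arrays (data, indices, indptr). The sketch below is a minimal illustration; the helper names csr_nbytes and estimate_csr_nbytes are hypothetical, and it assumes int32 index arrays, which SciPy uses when the shape and nnz fit in 32 bits (true for the question's 681881x58216 matrix).

```python
import numpy as np
from scipy import sparse

def csr_nbytes(m):
    """Actual bytes held by the three arrays backing a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

def estimate_csr_nbytes(n_rows, nnz, data_itemsize=8, index_itemsize=4):
    """Estimate CSR storage without building the matrix (hypothetical helper).

    data:    nnz values (8 bytes each for int64/float64)
    indices: nnz column indices (assumed int32, 4 bytes each)
    indptr:  n_rows + 1 row pointers (assumed int32)
    """
    return nnz * data_itemsize + nnz * index_itemsize + (n_rows + 1) * index_itemsize

# Figures from the question: 681881 rows, 2867209 stored int64 elements.
est = estimate_csr_nbytes(681881, 2867209)
print(f"{est / 1e6:.1f} MB")  # 37.1 MB

# Cross-check the formula on a small matrix, using its real dtypes.
M = sparse.random(100, 80, density=0.1, format="csr", random_state=0)
assert csr_nbytes(M) == estimate_csr_nbytes(
    M.shape[0], M.nnz, M.data.itemsize, M.indices.itemsize
)
```

So the matrix itself accounts for tens of megabytes, not gigabytes, which is why the comment points at something other than the stored matrix.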

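(1) The slice takes every other row, starting from the second one (the odd-indexed rows). (2) The server runs many docker containers along with other maintenance processes, which take 40GB in total –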
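To pin down what the slice selects: with 0-based indexing, csr[1::2, :] keeps rows 1, 3, 5, and so on, exactly the rows a dense slice would keep. A small sketch with a random stand-in matrix (A is hypothetical, not the question's data):

```python
import numpy as np
from scipy import sparse

# Small stand-in for the question's matrix (6 rows, random contents).
A = sparse.random(6, 4, density=0.5, format="csr", random_state=0)

odd_rows = A[1::2, :]                            # rows 1, 3, 5 (0-based)
same_rows = A[np.arange(1, A.shape[0], 2), :]    # same rows via an explicit index array

assert odd_rows.shape == (3, 4)
assert np.array_equal(odd_rows.toarray(), A.toarray()[1::2, :])
assert np.array_equal(same_rows.toarray(), odd_rows.toarray())
```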

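sparse uses matrix multiplication to select rows like this. I worked out the details of the extractor matrix in another SO question, but roughly, to get a (p,n) matrix from an (m,n) one, it has to use a (p,m) matrix (with p nonzero values).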

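The matrix multiplication itself is a two-pass process. The first pass determines the size of the resulting matrix; the second pass fills in its values.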
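The two-pass structure can be sketched in pure Python. This is only an illustration of the idea; SciPy does the real work in compiled sparsetools routines, and the function name csr_matmat_two_pass is hypothetical. Unlike SciPy, this sketch keeps any entries that happen to sum to exactly zero.

```python
import numpy as np
from scipy import sparse

def csr_matmat_two_pass(A, B):
    """Illustrative pure-Python CSR product A @ B done in two passes.

    Pass 1 only counts the nonzeros of each output row, which fixes the size
    of the result so its arrays can be allocated once. Pass 2 computes the
    column indices and values into those preallocated arrays.
    """
    A, B = sparse.csr_matrix(A), sparse.csr_matrix(B)
    n_row, n_col = A.shape[0], B.shape[1]

    # Pass 1 (symbolic): count distinct output columns per result row.
    row_nnz = np.zeros(n_row, dtype=np.int64)
    for i in range(n_row):
        cols = set()
        for jj in range(A.indptr[i], A.indptr[i + 1]):
            j = A.indices[jj]
            cols.update(B.indices[B.indptr[j]:B.indptr[j + 1]].tolist())
        row_nnz[i] = len(cols)
    indptr = np.concatenate(([0], np.cumsum(row_nnz)))
    nnz = int(indptr[-1])
    indices = np.empty(nnz, dtype=np.int64)
    data = np.empty(nnz, dtype=np.result_type(A.dtype, B.dtype))

    # Pass 2 (numeric): accumulate products row by row into the arrays.
    for i in range(n_row):
        acc = {}
        for jj in range(A.indptr[i], A.indptr[i + 1]):
            j, a = A.indices[jj], A.data[jj]
            for kk in range(B.indptr[j], B.indptr[j + 1]):
                k = B.indices[kk]
                acc[k] = acc.get(k, 0) + a * B.data[kk]
        for offset, k in enumerate(sorted(acc)):
            indices[indptr[i] + offset] = k
            data[indptr[i] + offset] = acc[k]
    return sparse.csr_matrix((data, indices, indptr), shape=(n_row, n_col))
```

For example, csr_matmat_two_pass(A, B).toarray() should match (A @ B).toarray() for any compatible CSR inputs. Both passes walk every stored entry of B reached through A, so for an extractor times a large matrix the work scales with the selected rows.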

Unlike slicing a dense numpy array, slicing a sparse matrix never returns a view.
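A minimal sketch of the view-versus-copy difference, using a small made-up matrix: basic slicing of an ndarray shares memory with the original, while the sparse slice gets its own data array, so modifying it leaves the source untouched.

```python
import numpy as np
from scipy import sparse

dense = np.arange(12.0).reshape(4, 3)
csr = sparse.csr_matrix(dense)

dense_part = dense[1::2, :]   # basic slicing of an ndarray: a view
csr_part = csr[1::2, :]       # slicing a sparse matrix: a new matrix (copy)

print(np.shares_memory(dense_part, dense))          # True
print(np.shares_memory(csr_part.data, csr.data))    # False

csr_part.data[:] = -1         # changing the sparse slice ...
print(csr[1, 0])              # ... leaves the original untouched: 3.0
```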

See "Sparse matrix slicing using list of int" for the details of the extractor matrix. I'd also suggest testing csr.sum(axis=1), since it also uses matrix multiplication.
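For the csr.sum(axis=1) test suggested above, a small stand-in shows the equivalence behind it: row sums equal the product with a column of ones. This sketch does not check which internal route a particular SciPy version takes for sum; it only shows that the two computations agree.

```python
import numpy as np
from scipy import sparse

# Small random stand-in for the real matrix.
M = sparse.random(100, 80, density=0.1, format="csr", random_state=0)

row_sums = np.asarray(M.sum(axis=1)).ravel()   # shape (100,)
via_matmul = M @ np.ones(M.shape[1])           # matrix-vector product with ones

assert np.allclose(row_sums, via_matmul)
```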

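import numpy as np
from scipy import sparse

def extractor(indices, N):
    # (p, N) selection matrix: row i has a single 1 in column indices[i]
    indptr = np.arange(len(indices) + 1)
    data = np.ones(len(indices))
    shape = (len(indices), N)
    return sparse.csr_matrix((data, indices, indptr), shape=shape)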

So selecting every other row looks like this:

In [99]: M = sparse.random(100,80,.1, 'csr') 
In [100]: M 
Out[100]: 
<100x80 sparse matrix of type '<class 'numpy.float64'>' 
    with 800 stored elements in Compressed Sparse Row format> 
In [101]: E = extractor(np.r_[1:100:2],100) 
In [102]: E 
Out[102]: 
<50x100 sparse matrix of type '<class 'numpy.float64'>' 
    with 50 stored elements in Compressed Sparse Row format> 
In [103]: M1 = E*M 
In [104]: M1 
Out[104]: 
<50x80 sparse matrix of type '<class 'numpy.float64'>' 
    with 407 stored elements in Compressed Sparse Row format>
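To confirm that the extractor product and direct slicing give the same matrix, here is a small self-contained check (the extractor is repeated so the block runs on its own). It uses @ rather than *, since @ means matrix multiplication for both the older spmatrix classes and the newer sparse arrays, where * is elementwise.

```python
import numpy as np
from scipy import sparse

def extractor(indices, N):
    # (p, N) selection matrix: row i has a single 1 in column indices[i]
    indptr = np.arange(len(indices) + 1)
    data = np.ones(len(indices))
    return sparse.csr_matrix((data, indices, indptr), shape=(len(indices), N))

M = sparse.random(100, 80, density=0.1, format="csr", random_state=0)
E = extractor(np.r_[1:100:2], M.shape[0])

M1 = E @ M        # (50, 100) @ (100, 80) -> (50, 80)
M2 = M[1::2, :]   # direct slicing of every other row

assert M1.shape == M2.shape == (50, 80)
assert np.allclose(M1.toarray(), M2.toarray())
print(E.nnz)      # 50: one stored 1 per selected row
```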


2017-09-04 16:10:14 hpaulj

Thanks, I'll study your answer later! –

So the solution you're proposing is to use the extractor function? –

No, I'm just suggesting a reason why you might be getting a memory error. But without your data, memory setup, etc., I can't prove it. – hpaulj
