为什么X.dot（X.T）在numpy中需要这么多内存？

X是一个n×p矩阵，其中p比n大得多。比方说，N = 1000，P = 500000当我运行：为什么X.dot（X.T）在numpy中需要这么多内存？

X = np.random.randn(1000,500000) 
S = X.dot(X.T)

执行此操作结束了拍摄的内存很大，尽管尺寸的结果是1000×1000的内存使用落回原位，一旦操作完成。有没有办法解决？

来源

2014-01-18 bluecat

有趣。除了'X'和'S'的空间外，这应该在很小的恒定空间内执行：以'X'换位视图，计算并为'S'分配空间，并计算'S'的每个元素直。从我对NumPy的了解来看，类似的东西会自动从这个操作组合中脱落出来...... – delnan

我不确定细节，但是只要有可能，numpy会尝试使用BLAS例程，我认为它会['sgemm用于矩阵乘法的']（http://www.math.utah.edu/software/lapack/lapack-blas/sgemm.html）。我敢打赌，这些高度优化的例程需要连续的数据（可能采用Fortran命令），因此必须对您的情况进行复制。 – Jaime

使用numpy> = 1.8，这将有助于许多此类情况。 – seberg

的问题不在于X和X.T是示相同的内存空间的本身，而是X.T是F-连续而不是C连续的。当然，在的情况下，您必须对至少一个输入数组使用，在这种情况下，您将数组与其转置的视图相乘。

在numpy的< 1.8，np.dot将创建任何 F-有序输入阵列，而不只是那些碰巧意见到的相同内存块的C-有序的副本。

例如：

X = np.random.randn(1000,50000) 
Y = np.random.randn(50000, 100) 

# X and Y are both C-order, no copy 
%memit np.dot(X, Y) 
# maximum of 1: 485.554688 MB per loop 

# make X Fortran order and Y C-order, now the larger array (X) gets 
# copied 
X = np.asfortranarray(X) 
%memit np.dot(X, Y) 
# maximum of 1: 867.070312 MB per loop 

# make X C-order and Y Fortran order, now the smaller array (Y) gets 
# copied 
X = np.ascontiguousarray(X) 
Y = np.asfortranarray(Y) 
%memit np.dot(X, Y) 
# maximum of 1: 523.792969 MB per loop 

# make both of them F-ordered, both get copied! 
X = np.asfortranarray(X) 
%memit np.dot(X, Y) 
# maximum of 1: 905.093750 MB per loop

如果复制是一个问题（例如，当X是非常大的），你能做些什么呢？

最好的选择可能是升级到更新版本的numpy--就像@perimosocordiae指出的那样，这个性能问题在this pull request中解决。

如果因任何原因，你无法升级numpy的，还有一招，可以让你直接通过scipy.linalg.blas调用相关功能BLAS（从this answer无耻被盗后不强制副本进行快速，基于BLAS点产品）：

from scipy.linalg import blas 
X = np.random.randn(1000,50000) 

%memit res1 = np.dot(X, X.T) 
# maximum of 1: 845.367188 MB per loop 

%memit res2 = blas.dgemm(alpha=1., a=X.T, b=X.T, trans_a=True) 
# maximum of 1: 471.656250 MB per loop 

print np.all(res1 == res2) 
# True

来源

2014-01-19 03:01:35

这是上面提到的，但numpy> = 1.8为你解决了这个问题：https://github.com/numpy/numpy/pull/2730 – perimosocordiae

@perimosocordiae是的，但有些人可能会觉得无论什么原因升级尴尬（破依赖关系等），所以我想我会使用1.7.1给出解决方案。我会编辑我的答案，使其更清晰。 –

为什么X.dot（X.T）在numpy中需要这么多内存？

回答

相关问题