方法#1:矩阵乘法与6D
置换
% Get sizes
[m1,m2,~] = size(X);
[n1,n2,N,n4,n5] = size(Y);
% Lose the third dim from X and Y with matrix-multiplication
parte1 = reshape(permute(Y,[1,2,4,5,3]),[],N)*reshape(X,[],N).';
% Rearrange the leftover dims to bring kron format
parte2 = reshape(parte1,[n1,n2,I,J,m1,m2]);
% Lose dims correspinding to last two dims coming in from Y corresponding
% to the iterative summation as suggested in the question
out = reshape(permute(sum(sum(parte2,3),4),[1,6,2,5,3,4]),m1*n1,m2*n2)
方法2:简单7D
置换
% Get sizes
[m1,m2,~] = size(X);
[n1,n2,N,n4,n5] = size(Y);
% Perform kron format elementwise multiplication betwen the first two dims
% of X and Y, keeping the third dim aligned and "pushing out" leftover dims
% from Y to the back
mults = bsxfun(@times,permute(X,[4,2,5,1,3]),permute(Y,[1,6,2,7,3,4,5]));
% Lose the two dims with summation reduction for final output
out = sum(reshape(mults,m1*n1,m2*n2,[]),3);
验证
下面是运行原始和提出的方法的设置 -
% Setup inputs
X = rand(10,10,10);
Y = rand(10,10,10,10,10);
% Original approach
[n1,n2,N,I,J] = size(Y);
K = zeros(100);
for a=1:N
for i=1:I
for j=1:J
M = kron(X(:,:,a).',Y(:,:,a,i,j));
K = K + M;
end
end
end
% Approach #1
[m1,m2,~] = size(X);
[n1,n2,N,n4,n5] = size(Y);
mults = bsxfun(@times,permute(X,[4,2,5,1,3]),permute(Y,[1,6,2,7,3,4,5]));
out1 = sum(reshape(mults,m1*n1,m2*n2,[]),3);
% Approach #2
[m1,m2,~] = size(X);
[n1,n2,N,n4,n5] = size(Y);
parte1 = reshape(permute(Y,[1,2,4,5,3]),[],N)*reshape(X,[],N).';
parte2 = reshape(parte1,[n1,n2,I,J,m1,m2]);
out2 = reshape(permute(sum(sum(parte2,3),4),[1,6,2,5,3,4]),m1*n1,m2*n2);
运行后,我们看到了最大。与提出的方法相对于原始方法的绝对偏差 -
>> error_app1 = max(abs(K(:)-out1(:)))
error_app1 =
1.1369e-12
>> error_app2 = max(abs(K(:)-out2(:)))
error_app2 =
1.1937e-12
值看起来不错!
标杆
使用相同的大数据集的时序这三种方法作为用于验证,我们得到这样的事情 -
----------------------------- With Loop
Elapsed time is 1.541443 seconds.
----------------------------- With BSXFUN
Elapsed time is 1.283935 seconds.
----------------------------- With MATRIX-MULTIPLICATION
Elapsed time is 0.164312 seconds.
好像矩阵乘法做得相当不错为这些尺寸的数据集!
要明确一点,在这两种新方法中,out1和out2都取代K? 如果是这样,在K = K + M更复杂的情况下,是否有可能反映出这一点? 例如在最初的例子中,如果每个M都被一个常量改变了,那么每次循环的迭代都会发生什么变化呢?第二个例子是如果M在将它添加到K之前应用了布尔掩码。这些额外的步骤是否会打破输出? –
@MikeVandenberg对于ex1,作为预处理步骤使用:'Y = bsxfun(@times,Y,permute(scale,[4,5,1,2,3]));'然后使用所提出的代码,其中scale是尺寸为'(N,I,J)'的缩放矩阵。对于ex2,你会为每次迭代或同一个迭代使用不同的掩码吗? – Divakar
我想我不应该过于简单化这些步骤,因为它们不是微不足道的。这里是构造K的具体函数: 'pr = real(trace(E * M)); K = K + H(i,j,a)* M/pr;' 其中H是事先已知的矩阵。所以是的,在每次迭代中,我们拉出不同的H片,并用一个不同的常数M来标准化M,这个常数就是乘积E * M的轨迹,其中E是一个布尔掩模。 –