2017-05-29 512 views
2

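Mean of a cell array in MATLAB

I am running a simulation study of the linear model y = X*beta + eps, where size(X) = [n d]. I want to examine the effect of the dimension d under two methods. I run 10 simulated datasets and obtain the corresponding estimate of beta for each one, and then I want to compute the mean of the beta estimates across the 10 simulated datasets.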

My toy MATLAB code is as follows:

nsim=10; %iteration number
dd=[4 6]; %two dimension cases,\beta=(\beta_1,\cdots,\beta_d)^T
ddlen=length(dd);
nmethod=2; %two methods
seednum=0;

BH = cell(nsim,ddlen,nmethod); %estimation of beta w.r.t two dimension cases and two methods

for di = 1:ddlen
    d = dd(di);
    for ni = 1:nsim
        seednum = seednum + di*ni;
        randn('seed', seednum);
        betahat=randn(d,1);
        for method = 1:nmethod
            if method==1
                BH{ni,di,method} = betahat;
            else
                BH{ni,di,method} = 10*betahat;
            end
        end
    end
end

This gives

BH(:,:,1) = 

    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 


BH(:,:,2) = 

    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 
    [4x1 double] [6x1 double] 

I want to take the mean over the 10 rows (nsim = 10) and get something like

mean(BH(:,:,1))= 

    [4x1 double] [6x1 double] 

mean(BH(:,:,2)) = 

    [4x1 double] [6x1 double] 

Any ideas? Thanks!

+0

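Thanks @EBH. But your answer is not what I want. The result should be two vectors, one [4x1 double] and the other [6x1 double]; in other words, the mean of the ten [4x1 double] vectors and the mean of the ten [6x1 double] vectors, respectively. –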

+3

If all the arrays are the same size, why use a cell array? – beaker

+0

Why do you set a new random seed in every loop iteration? – EBH

Answers

0

I don't know whether this is the most efficient way to do it, but you can use arrayfun:

% generate random array 
BH = repmat({rand(4,1),rand(6,1)},[10 1 2]); 
% generate indexes for the 2nd and 3rd dimensions
% (ndgrid keeps n1 tied to dimension 2 and n2 to dimension 3, even when their sizes differ)
[n1,n2] = ndgrid(1:size(BH,2),1:size(BH,3));
% get mean across 1st (cell) dimension
[res] = arrayfun(@(n1,n2)mean([BH{:,n1,n2}],2),n1(:),n2(:),'UniformOutput',false);
% reshape to desired output 
res = reshape(res,[1 size(BH,2) size(BH,3)]); 

If you want to generalize to an N-dimensional cell array:

% generate random array 
BH = repmat({rand(4,1),rand(6,1)},[10,1,2,2,5]); 
sz = size(BH); 
% generate indexes for every dimension after the 1st
n = cell(1,numel(sz) - 1); 
[n{:}] = ndgrid(1:sz(2),1:sz(3),1:sz(4),1:sz(5)); 
n = cell2mat(cellfun(@(x) {x(:)},n)); 
idx = 1:size(n,1); 
% get mean across 1st (cell) dimension 
[res] = arrayfun(@(idx)mean([BH{:,n(idx,1),n(idx,2),n(idx,3),n(idx,4)}],2),... 
    idx,'UniformOutput',false); 
% reshape to desired output 
res = reshape(res,[1 sz(2:end)]); 
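
The explicit `1:sz(2),...,1:sz(5)` list above ties the code to a five-dimensional array. One possible dimension-agnostic variant, offered here as a sketch rather than as part of the original answer, collapses all trailing dimensions into a single one, averages column by column, and restores the shape afterwards. The name `B` and the deterministic test data are assumptions made for illustration.

```matlab
% Illustrative data (assumption): the vectors in row i all equal i*ones(d,1),
% so the mean over the first dimension should be 5.5*ones(d,1).
nsim = 10;
BH = repmat({zeros(4,1), zeros(6,1)}, [nsim, 1, 2, 2, 5]);
for i = 1:nsim
    BH(i,:) = cellfun(@(v) i*ones(size(v)), BH(i,:), 'UniformOutput', false);
end

sz = size(BH);
B = reshape(BH, sz(1), []);          % nsim-by-(product of the remaining dims)
res = cell(1, size(B,2));
for j = 1:size(B,2)
    % every cell in column j holds a vector of the same length,
    % so they can be concatenated side by side and averaged across columns
    res{j} = mean([B{:,j}], 2);
end
res = reshape(res, [1 sz(2:end)]);   % back to 1-by-2-by-2-by-2-by-5
```

Because `reshape` preserves column-major order, `res{1,k,...}` still corresponds to `BH(:,k,...)`, and nothing in the loop depends on how many trailing dimensions there are.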
+0

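It works! Thanks @user2999345. Still, I find it a bit complicated to generalize your approach. Consider 'BH = repmat({rand(4,1),rand(6,1)},[10,1,2,2,5]);', i.e. 10 iterations, 2 dimension cases, 2 noise levels and 5 candidate methods for the linear model y = X*beta + eps, where eps ~ N(0, \sigma^2*I) and \sigma is the noise level –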

+0

Check my edit; I hope it answers your request – user2999345

+0

Wow, your newly edited answer is great! Thank you very much! @user2999345 –

0

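If I understand you correctly, you want to average all the elements that sit at the same position in the vectors. So, from all the vectors in BH(:,1,1) we obtain one vector of 4 means, one for each position in the vector. The same applies to BH(:,1,2). For BH(:,2,1) and BH(:,2,2) we do the same thing, but with 6 elements per vector.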

You can use the following code:

% split BH to 2 arrays: 
bh4 = reshape(cell2mat(BH(:,1,:)),[],nsim,2); % all the 4 elements vectors 
bh6 = reshape(cell2mat(BH(:,2,:)),[],nsim,2); % all the 6 elements vectors 
meanBH4 = squeeze(mean(bh4,2)); % mean over all 4 element vectors 
meanBH6 = squeeze(mean(bh6,2)); % mean over all 6 element vectors 
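
To see why the `reshape(cell2mat(...),[],nsim,2)` step lines the vectors up correctly, here is a small check on made-up, deterministic data (the values are an assumption for illustration, not the simulation above). Each stored vector equals `ni*m*(1:d)'`, so the mean over the `nsim` iterations should be `5.5*m*(1:d)'`. The name `meanCell4` is introduced here.

```matlab
nsim = 10; dd = [4 6]; nmethod = 2;
BH = cell(nsim, numel(dd), nmethod);
for di = 1:numel(dd)
    for ni = 1:nsim
        for m = 1:nmethod
            BH{ni,di,m} = ni*m*(1:dd(di))';   % known values instead of random draws
        end
    end
end

% cell2mat stacks the nsim vectors vertically (4*nsim-by-1-by-2);
% reshape then places one vector per column (4-by-nsim-by-2).
bh4 = reshape(cell2mat(BH(:,1,:)), [], nsim, nmethod);
meanBH4 = squeeze(mean(bh4, 2));              % 4-by-2, one column per method

% Optionally return to the cell layout used in the question.
meanCell4 = num2cell(meanBH4, 1);             % 1-by-2 cell of 4-by-1 vectors
```

The same pattern applies to the 6-element vectors by indexing `BH(:,2,:)` instead.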

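However, the proper way to do this, step by step, is to define two arrays, one for each method, padded with NaN up to the largest dimension:

BH1 = nan(nsim,ddlen,max(dd));
BH2 = nan(nsim,ddlen,max(dd));

Then assign values to them inside your loop:

if method==1
    BH1(ni,di,1:d) = betahat;
else
    BH2(ni,di,1:d) = 10*betahat;
end

And finally just take the mean over the iterations:

meanBH1 = mean(BH1,1,'omitnan')
meanBH2 = mean(BH2,1,'omitnan')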

Edit:

To write all of this in a more 'MATLAB-ish' way, I suggest the following:

nsim = 10; % iteration number 
dd = [4 6]; % two dimension cases,\beta=(\beta_1,\cdots,\beta_d)^T 
methods = 2; % two methods 

% preparing random seeds
s = bsxfun(@times,1:numel(dd),(1:nsim).'); 
seednum = cumsum(s(:)); 

% initialize results array 
BH = nan(max(dd),nsim,numel(dd),methods); 
counter = 1; 
for k = 1:numel(dd)
    for n = 1:nsim
        % set a new random seed from the list:
        rng(seednum(counter));
        % compute all betahats with this seed:
        betahat = randn(max(dd),2).*repmat([1 10],[max(dd) 1]);
        % assign the values to BH by dimension:
        for m = 1:methods
            BH(1:dd(k),n,k,m) = betahat(1:dd(k),m);
        end
        counter = counter+1;
    end
end
% compute the means over iterations: 
means = squeeze(mean(BH,2,'omitnan')) 

and you get means as your result.


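P.S. I don't know why you call randn('seed', seednum) on every iteration (besides, that's not a recommended syntax), but if you can remove it, then you can vectorize most of the loops and squeeze your code down to: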

% compute all betahats:
betahat = randn(nsim,max(dd),numel(dd),2);
% apply dimensions: pad the unused coordinates of each dimension case with NaN
for k = 1:numel(dd)
    betahat(:,dd(k)+1:end,k,:) = nan;
end
% apply method 2:
betahat(:,:,:,2) = betahat(:,:,:,2)*10;

% compute the means over iterations:
means = squeeze(mean(betahat,1,'omitnan'))
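
The NaN-padded `means` array has size max(dd)-by-numel(dd)-by-2, whereas the question asks for one 4x1 and one 6x1 vector per method. One way to recover those, shown as a sketch (`meansCell`, `nmethod` and the single `rng(0)` call are assumptions introduced here), is to trim the padding back off:

```matlab
nsim = 10; dd = [4 6]; nmethod = 2;
rng(0);                                        % one seed, set once (assumption)
betahat = randn(nsim, max(dd), numel(dd), nmethod);
for k = 1:numel(dd)
    betahat(:, dd(k)+1:end, k, :) = nan;       % pad unused coordinates with NaN
end
means = squeeze(mean(betahat, 1, 'omitnan'));  % max(dd)-by-numel(dd)-by-nmethod

meansCell = cell(1, numel(dd), nmethod);
for k = 1:numel(dd)
    for m = 1:nmethod
        meansCell{1,k,m} = means(1:dd(k), k, m);   % drop the NaN padding
    end
end
```

`meansCell(:,:,1)` then has the `[4x1 double] [6x1 double]` layout shown in the question.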

Hope things look clearer now...

+0

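Thanks @EBH. But your answer is not what I want. The result should be two vectors, one [4x1 double] and the other [6x1 double]; in other words, the mean of the ten [4x1 double] vectors and the mean of the ten [6x1 double] vectors, respectively. –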

+0

@JohnStone What you describe in the question is a 1*2*2 output, not 4*1 and 6*1 output vectors. Still, see my edit. – EBH

+0

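Thanks @EBH. So what if I consider a more complicated setup, such as 'BH = repmat({rand(4,1),rand(6,1)},[10,1,2,2,5]);', i.e. 10 iterations, 2 parameters, 2 noise levels and 5 candidate methods for the linear model y = X*beta + eps, where eps ~ N(0, \sigma^2*I) and \sigma is the noise level? What should I do? –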

0

Alternatively,

% split into separate cell arrays
BH_1 = BH(:,:,1); 
BH_2 = BH(:,:,2); 

% create matrix of compatible vectors, and take mean and put result back into cell array 
BH_1_mean = cat(2,{mean(cell2mat(BH_1(:,1)'),2)}, {mean(cell2mat(BH_1(:,2)'),2)}); 
BH_2_mean = cat(2,{mean(cell2mat(BH_2(:,1)'),2)}, {mean(cell2mat(BH_2(:,2)'),2)}); 
+0

Thanks @kedarps, it works! –