2013-03-19 · 47 views

MPI Fox's algorithm with non-blocking send and receive

I am new to MPI and I am trying to write an implementation of Fox's algorithm (A x B = C, where A and B are matrices of dimension n x n). My program works correctly, but I would like to see whether I can speed it up by overlapping communication with computation during the shift of the blocks of matrix B (the blocks of B are rotated between processes at every stage of the algorithm while the product matrix is being computed). Following the algorithm, every process in the 2D Cartesian grid holds one block of each of the matrices A, B, and C. What I have at the moment is this:

if (stage > 0) {

    // shifting b values in all processes
    MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);
    MPI_Isend(b, n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
    MPI_Irecv(b, n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
    MPI_Wait(&my_request1, &status);
    MPI_Wait(&my_request2, &status);
    multiplyMatrix(a_temp, b, c, n_local);
}
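The communicators grid_comm and row_comm and the nbrs array are not defined in the excerpt. A minimal sketch of how such a setup is typically built (the names q, rowID, and nbrs[UP]/nbrs[DOWN] are taken from the post; the dimension ordering and the UP/DOWN shift direction are assumptions):

```c
#include <mpi.h>

#define UP   0
#define DOWN 1

int main(int argc, char **argv)
{
    int nprocs, rank, q, rowID;
    int coords[2], nbrs[2];
    MPI_Comm grid_comm, row_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* assumes nprocs is a perfect square, q x q processes */
    for (q = 1; q * q < nprocs; q++)
        ;

    /* both dimensions periodic: the shift of B wraps around */
    int dims[2]    = {q, q};
    int periods[2] = {1, 1};
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid_comm);

    MPI_Comm_rank(grid_comm, &rank);
    MPI_Cart_coords(grid_comm, rank, 2, coords);
    rowID = coords[0];

    /* one communicator per grid row: keep only the column dimension varying */
    int remain[2] = {0, 1};
    MPI_Cart_sub(grid_comm, remain, &row_comm);

    /* vertical neighbours for the upward shift of B along the columns */
    MPI_Cart_shift(grid_comm, 0, 1, &nbrs[UP], &nbrs[DOWN]);

    (void)rowID;  /* used by the broadcasts in the post's code */
    MPI_Finalize();
    return 0;
}
```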

The sub-matrices a_temp, b, and b_temp are pointers of type double to blocks of n/numprocesses * n/numprocesses elements (this is the block size; for example b = (double *) calloc(n/numprocesses * n/numprocesses, sizeof(double))).
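The multiplyMatrix routine itself is not shown. Judging by the calls above, it multiplies two n_local x n_local blocks and accumulates into c (the accumulation across stages is what Fox's algorithm requires). A minimal sketch, assuming row-major storage:

```c
/* C += A * B for n x n blocks stored row-major in flat arrays.
   The signature matches the calls in the post; the body is an
   assumed, straightforward implementation. */
void multiplyMatrix(double *a, double *b, double *c, int n)
{
    int i, j, k;
    for (i = 0; i < n; i++)
        for (k = 0; k < n; k++) {
            double aik = a[i*n + k];   /* hoisted out of the inner loop */
            for (j = 0; j < n; j++)
                c[i*n + j] += aik * b[k*n + j];
        }
}
```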

I would like to move the multiplyMatrix call before the MPI_Wait calls (which would give the overlap of communication and computation), but I do not know how to do it. Do I need two separate buffers and to alternate between them in different stages?

(I know I could use MPI_Sendrecv_replace, but that does not help with the overlap, since it uses a blocking send and receive. The same applies to MPI_Sendrecv.)
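For reference, the non-overlapping baseline the parenthetical rules out looks like this (a sketch reusing the post's variables; MPI_Sendrecv_replace only returns once the in-place shift is complete, so the multiply cannot start early):

```c
/* Blocking shift: no overlap is possible, because b is only valid
   again after MPI_Sendrecv_replace has returned. */
MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);
MPI_Sendrecv_replace(b, n_local*n_local, MPI_DOUBLE,
                     nbrs[UP],   111,   /* destination and send tag  */
                     nbrs[DOWN], 111,   /* source and receive tag    */
                     grid_comm, MPI_STATUS_IGNORE);
multiplyMatrix(a_temp, b, c, n_local);
```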

Answer


I actually figured out how to do this, so this question should probably be deleted. But since I am new to MPI, I will post the solutions here, and if anyone has suggestions for improvement I would be glad if they shared them. Method 1:

// Fox's algorithm 
double *b_buffers[2];
b_buffers[0] = b;  /* stage 0 sends the original b */
b_buffers[1] = (double *) malloc(n_local*n_local*sizeof(double));
for (stage = 0; stage < q; stage++) {
    // copying a into a_temp and broadcasting a_temp of each process to all other processes in its row
    for (i = 0; i < n_local*n_local; i++)
        a_temp[i] = a[i];
    if (stage == 0) {
        MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);
        multiplyMatrix(a_temp, b, c, n_local);
        /* receive into the spare buffer: sending from and receiving into
           the same buffer with non-blocking calls is not allowed by MPI */
        MPI_Isend(b_buffers[0], n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
        MPI_Irecv(b_buffers[1], n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
        MPI_Wait(&my_request2, &status);
        MPI_Wait(&my_request1, &status);
    }

    if (stage > 0) {
        // shifting b values in all processes
        MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);
        MPI_Isend(b_buffers[stage % 2], n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
        MPI_Irecv(b_buffers[(stage + 1) % 2], n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
        multiplyMatrix(a_temp, b_buffers[stage % 2], c, n_local);  /* overlaps with the shift */
        MPI_Wait(&my_request2, &status);
        MPI_Wait(&my_request1, &status);
    }
}

Method 2:

// Fox's algorithm 

for (stage = 0; stage < q; stage++) {
    // copying a into a_temp and broadcasting a_temp of each process to all other processes in its row
    for (i = 0; i < n_local*n_local; i++)
        a_temp[i] = a[i];
    if (stage == 0) {
        MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);
        multiplyMatrix(a_temp, b, c, n_local);
        /* send from the copy so that b is free to be overwritten by the receive */
        memcpy(b_temp, b, n_local*n_local*sizeof(double));
        MPI_Isend(b_temp, n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
        MPI_Irecv(b, n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
        MPI_Wait(&my_request2, &status);
        MPI_Wait(&my_request1, &status);
    }

    if (stage > 0) {
        // shifting b values in all processes
        memcpy(b_temp, b, n_local*n_local*sizeof(double));
        MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);
        MPI_Isend(b_temp, n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
        MPI_Irecv(b, n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
        multiplyMatrix(a_temp, b_temp, c, n_local);  /* overlaps with the shift */
        MPI_Wait(&my_request2, &status);
        MPI_Wait(&my_request1, &status);
    }
}

Both of these seem to work, but as I said I am new to MPI, so if you have any comments or suggestions, please share them.

If you do not use 'status', then you can simply pass 'MPI_STATUS_IGNORE' instead – 2017-08-06 10:22:52

Instead of two 'MPI_Wait()' calls, you can use an array of requests and a single 'MPI_Waitall()', with 'MPI_STATUSES_IGNORE' if you do not care about the statuses. – 2017-08-06 10:24:09
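Putting the two comments together, the wait pair in either method could be collapsed like this (a sketch using Method 2's variable names from the post):

```c
/* One MPI_Waitall instead of two MPI_Wait calls; the statuses are
   discarded because nothing reads them. */
MPI_Request reqs[2];
MPI_Isend(b_temp, n_local*n_local, MPI_DOUBLE, nbrs[UP],   111, grid_comm, &reqs[0]);
MPI_Irecv(b,      n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &reqs[1]);
multiplyMatrix(a_temp, b_temp, c, n_local);   /* overlaps with the shift */
MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
```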