MPI C++ matrix addition, function arguments and function returns

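I've been learning C++ from the internet for the past 2 years, and the need has finally arisen for me to dig into MPI. I've been scouring Stack Overflow and the rest of the internet (including http://people.sc.fsu.edu/~jburkardt/cpp_src/mpi/mpi.html and https://computing.llnl.gov/tutorials/mpi/#LLNL). I think I've got some of the logic down, but I'm having a hard time wrapping my head around the following: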

#include <mpi.h>
#include <vector>
using namespace std; 

vector<double> function(vector<double> &foo, const vector<double> &bar, int dim, int rows); 

int main(int argc, char** argv) 
{ 
    vector<double> result;//represents a regular 1D vector 
    int id_proc, tot_proc, root_proc = 0; 
    int dim = 4;//number of "columns" in A and B below (example value; must be set before A and B are sized)
    int rows = 8;//number of "rows" of A and B below (example value)
    vector<double> A(dim*rows), B(dim*rows);//represent matrices as 1D vectors 

    MPI::Init(argc,argv); 
    id_proc = MPI::COMM_WORLD.Get_rank(); 
    tot_proc = MPI::COMM_WORLD.Get_size(); 

    /* 
    initialize A and B here on root_proc with RNG and Bcast to everyone else 
    */ 

    //allow all processors to call function() so they can each work on a portion of A 
    result = function(A,B,dim,rows); 

    //all processors do stuff with A 
    //root_proc does stuff with result (doesn't matter if other processors have updated result) 

    MPI::Finalize(); 
    return 0; 
} 

vector<double> function(vector<double> &foo, const vector<double> &bar, int dim, int rows) 
{ 
    /* 
    purpose of function() is two-fold: 
    1. update foo because all processors need the updated "matrix" 
    2. get the average of the "rows" of foo and return that to main (only root processor needs this) 
    */ 

    vector<double> output(dim,0); 

    //add matrices the way I would normally do it in serial 
    for (int i = 0; i < rows; i++) 
    { 
     for (int j = 0; j < dim; j++) 
     { 
      foo[i*dim + j] += bar[i*dim + j];//perform "matrix" addition (+= ON PURPOSE) 
     } 
    } 

    //obtain average of rows in foo in serial 
    for (int i = 0; i < rows; i++) 
    { 
     for (int j = 0; j < dim; j++) 
     { 
      output[j] += foo[i*dim + j];//sum rows of A 
     } 
    } 

    for (int j = 0; j < dim; j++) 
    { 
      output[j] /= rows;//divide to obtain average 
    } 

    return output;   
}

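The code above is only meant to illustrate the concept. My main concern is parallelizing the matrix addition, but what boggles my mind is this:

1) If each processor only works on a portion of that loop (naturally, I'd have to modify the loop bounds per processor), what command do I use to merge all the portions of A back into a single, updated A that every processor holds in its memory? My guess is that I need some kind of Alltoall where each processor sends its portion of A to all the other processors, but how do I guarantee that (for example) row 3, worked on by processor 3, overwrites row 3 on the other processors and not row 1 by accident?

2) If I use an Alltoall inside function(), do all processors have to be allowed to step into function(), or can I isolate function() using...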

if (id_proc == root_proc) 
{ 
    result = function(A,B,dim,rows); 
}

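...and then handle all the parallelization inside function()? As silly as it sounds, I'm trying to do most of the work on one processor (with broadcasts) and parallelize only the big, time-consuming for loops. I'm just trying to keep the code conceptually simple so I can get my results and move on.

3) For the averaging part, I'm sure I could just use a reduce command if I wanted to parallelize it, correct?

Also, as an aside: is there a way to call Bcast() so that it blocks? I'd like to use it to synchronize all my processors (the Boost libraries are not an option). If not, I'll just go with Barrier(). Thank you for your answers to this question, and thanks to the Stack Overflow community for teaching me how to program over the past two years! :)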

Source

2013-04-08 Eric Inclan

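1) The function you are looking for is MPI_Allgather. MPI_Allgather lets each processor send its rows and delivers the assembled result to all processors.

2) Yes, you can use only some of the processors inside your function. Since MPI functions operate on communicators, you have to create a separate communicator for that purpose. I don't know how this is done in the C++ bindings, but the C bindings use the MPI_Comm_create function.

3) Yes, see MPI_Allreduce.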

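Aside: Bcast blocks a process until the send/receive operation assigned to that process has completed. If you want to wait for all processors to finish their work (I have no idea why you would want to do this), you should use Barrier().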

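Extra note: I would not recommend using the C++ bindings, because they are deprecated and you will have trouble finding concrete examples of how to use them. If you want C++ bindings, Boost MPI is the library to use, but it does not cover all MPI functions.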

Source

2013-04-09 08:48:30 tunc

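Thanks for your help! Regarding 1), the vectors I'm gathering will have different lengths, because I can't guarantee that the matrix will divide evenly among all the processors. I've read that I have to use MPI_Allgatherv. I'll see what I can do. Thanks again! – 2013-04-10 18:19:08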
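For the uneven case, MPI_Allgatherv needs two integer arrays: how many elements each rank contributes (recvcounts) and where each contribution starts in the receive buffer (displs). Below is a minimal sketch of one common way to build them, splitting by whole rows and giving the first few ranks one extra row. RowLayout and make_row_layout are hypothetical names, and the matrix is assumed to be stored row-major as in the question.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type: per-rank element counts and offsets for a
// rows x dim matrix stored row-major and split by whole rows.
struct RowLayout {
    std::vector<int> counts;  // number of elements owned by each rank
    std::vector<int> displs;  // offset of each rank's first element
};

RowLayout make_row_layout(int rows, int dim, int nprocs)
{
    assert(rows >= 0 && dim >= 0 && nprocs > 0);

    RowLayout layout;
    layout.counts.resize(nprocs);
    layout.displs.resize(nprocs);

    const int base = rows / nprocs;   // rows that every rank gets
    const int extra = rows % nprocs;  // the first `extra` ranks get one more

    int offset = 0;
    for (int r = 0; r < nprocs; ++r) {
        const int local_rows = base + (r < extra ? 1 : 0);
        layout.counts[r] = local_rows * dim;
        layout.displs[r] = offset;
        offset += layout.counts[r];
    }
    return layout;
}
```

Each rank would loop over the rows starting at displs[rank] / dim, and the gather step could then look like MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, foo.data(), layout.counts.data(), layout.displs.data(), MPI_DOUBLE, comm). Because every rank computes the same layout, rank r's rows always land at displs[r] on every process.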
