2012-11-16

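OpenMPI: multiple MPI_Send and MPI_Recv calls do not work

When I call MPI_Send or MPI_Recv more than once in a program, the executable hangs on both the worker node and the root. That is, communication blocks when the program tries to execute the second MPI_Send or MPI_Recv. Meanwhile, the binary runs at 100% CPU on the machine.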

This code runs successfully with 64-bit OpenMPI 1.6.3 on 64-bit Windows 7. The same code does not work on Linux, namely CentOS 6.3 x86_64 with 64-bit OpenMPI 1.6.3. What am I doing wrong?

The code is posted below:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI::Init();
    int rank = MPI::COMM_WORLD.Get_rank();
    int size = MPI::COMM_WORLD.Get_size();
    char name[256] = { };
    int len = 0;
    MPI::Get_processor_name(name, len);

    printf("Hi I'm %s:%d\n", name, rank);

    if (rank == 0)
    {
        while (size >= 1)
        {
            int val, stat = 1;
            MPI::Status status;
            MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, 1, 0, status);
            int source = status.Get_source();
            printf("%s:%d received %d from %d\n", name, rank, val, source);

            MPI::COMM_WORLD.Send(&stat, 1, MPI::INT, 1, 2);
            printf("%s:%d sent status %d\n", name, rank, stat);

            size--;
        }
    }
    else
    {
        int val = rank + 10;
        int stat = 0;
        printf("%s:%d sending %d...\n", name, rank, val);
        MPI::COMM_WORLD.Send(&val, 1, MPI::INT, 0, 0);
        printf("%s:%d sent %d\n", name, rank, val);

        MPI::Status status;
        MPI::COMM_WORLD.Recv(&stat, 1, MPI::INT, 0, 2, status);
        int source = status.Get_source();
        printf("%s:%d received status %d from %d\n", name, rank, stat, source);
    }

    size = MPI::COMM_WORLD.Get_size();
    if (rank == 0)
    {
        while (size >= 1)
        {
            int val, stat = 1;
            MPI::Status status;

            MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, 1, 1, status);
            int source = status.Get_source();
            printf("%s:0 received %d from %d\n", name, val, source);

            size--;
        }

        printf("all workers checked in!\n");
    }
    else
    {
        int val = rank + 10 + 5;
        printf("%s:%d sending %d...\n", name, rank, val);
        MPI::COMM_WORLD.Send(&val, 1, MPI::INT, 0, 1);
        printf("%s:%d sent %d\n", name, rank, val);
    }
    MPI::Finalize();

    return 0;
}

Hi Hristo, I have changed the source as you suggested. The updated code is posted below:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    int iNumProcess = 0, iRank = 0, iNameLen = 0, n;
    char szNodeName[MPI_MAX_PROCESSOR_NAME] = {0};
    MPI_Status stMPIStatus;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &iNumProcess);
    MPI_Comm_rank(MPI_COMM_WORLD, &iRank);
    MPI_Get_processor_name(szNodeName, &iNameLen);

    printf("Hi I'm %s:%d\n", szNodeName, iRank);

    if (iRank == 0)
    {
        int iNode = 1;
        while (iNumProcess > 1)
        {
            int iVal = 0, iStat = 1;
            MPI_Recv(&iVal, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &stMPIStatus);
            printf("%s:%d received %d\n", szNodeName, iRank, iVal);

            MPI_Send(&iStat, 1, MPI_INT, iNode, 1, MPI_COMM_WORLD);
            printf("%s:%d sent Status %d\n", szNodeName, iRank, iStat);

            MPI_Recv(&iVal, 1, MPI_INT, MPI_ANY_SOURCE, 2, MPI_COMM_WORLD, &stMPIStatus);
            printf("%s:%d received %d\n", szNodeName, iRank, iVal);

            iNumProcess--;
            iNode++;
        }

        printf("all workers checked in!\n");
    }
    else
    {
        int iVal = iRank + 10;
        int iStat = 0;
        printf("%s:%d sending %d...\n", szNodeName, iRank, iVal);
        MPI_Send(&iVal, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        printf("%s:%d sent %d\n", szNodeName, iRank, iVal);

        MPI_Recv(&iStat, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &stMPIStatus);
        printf("%s:%d received status %d\n", szNodeName, iRank, iVal);

        iVal = 20;
        printf("%s:%d sending %d...\n", szNodeName, iRank, iVal);
        MPI_Send(&iVal, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
        printf("%s:%d sent %d\n", szNodeName, iRank, iVal);
    }

    MPI_Finalize();

    return 0;
}

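I got the following output. After the initial send/receive exchange, the root waits indefinitely and the node runs at 100% CPU usage. The output is as follows: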

Hi I'm N1433:1 
N1433:1 sending 11... 
Hi I'm N1425:0 
N1425:0 received 11 
N1425:0 sent Status 1 
N1433:1 sent 11 
N1433:1 received status 11 
N1433:1 sending 20... 

Here N1433 and N1425 are machine names. Please help.

Answers

2

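The master's code is wrong. It always receives from and sends to the same rank, rank 1. As a result, the program only works correctly when run as mpiexec -np 2 .... What you probably want is to use MPI_ANY_SOURCE as the source rank in the receive, and then use the actual source rank as the destination in the send operation. You also should not use while (size >= 1): rank 0 does not talk to itself, so the expected number of communications is one less than size.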

if (rank == 0) 
{ 
    while (size > 1) 
    //     ^^^^^^^^
    { 
     int val, stat = 1; 
     MPI::Status status; 
     MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, MPI_ANY_SOURCE, 0, status); 
     // Use wildcard source here ------------^^^^^^^^^^^^^^ 
     int source = status.Get_source(); 
     printf("%s:%d received %d from %d\n", name, rank, val, source); 

     MPI::COMM_WORLD.Send(&stat, 1, MPI::INT, source, 2); 
     // Send back to the same process --------^^^^^^ 
     printf("%s:%d sent status %d\n", name, rank, stat); 

     size--; 
    } 
} else 

It makes no sense for the worker to do something like this:

MPI::Status status; 
MPI::COMM_WORLD.Recv(&stat, 1, MPI::INT, 0, 2, status); 
// Source rank is fixed here ------------^ 
int source = status.Get_source(); 
printf("%s:%d received status %d from %d\n", name, rank, stat, source); 

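You have already specified rank 0 as the source of the receive operation, so it can only receive messages from rank 0. status.Get_source() will never return anything other than 0 unless some communication error occurs, in which case MPI::COMM_WORLD.Recv() would throw an exception (if the MPI::ERRORS_THROW_EXCEPTIONS error handler is installed; with the default handler the job aborts instead).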

The same applies to the second loop in your code.

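By the way, you are using what used to be the official standard C++ bindings. They were deprecated in MPI-2.2, and the latest version of the standard (MPI-3.0) removed them completely; they are no longer supported by the MPI Forum. You should use the C bindings instead, or a third-party C++ interface such as Boost.MPI.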
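
For reference, a minimal sketch of the same exchange written with the C bindings might look like the following. It assumes the three-message protocol from the updated listing in the question (a value, a status reply, then a follow-up value). The tag names TAG_VALUE, TAG_STATUS and TAG_FOLLOWUP and the variable names are made up for this illustration. It only addresses the rank-matching logic discussed above and is not claimed to resolve any installation-specific hang.

```c
#include <mpi.h>
#include <stdio.h>

/* Illustrative tag names (assumption); the values mirror the tags 0, 1, 2
   used in the updated listing in the question. */
enum { TAG_VALUE = 0, TAG_STATUS = 1, TAG_FOLLOWUP = 2 };

int main(int argc, char **argv)
{
    int size = 0, rank = 0, name_len = 0;
    char name[MPI_MAX_PROCESSOR_NAME] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &name_len);

    if (rank == 0) {
        /* Rank 0 does not message itself, so expect size - 1 exchanges. */
        int remaining;
        for (remaining = size - 1; remaining > 0; remaining--) {
            int val = 0, stat = 1, source = 0;
            MPI_Status status;

            /* Accept the first value from whichever worker is ready. */
            MPI_Recv(&val, 1, MPI_INT, MPI_ANY_SOURCE, TAG_VALUE,
                     MPI_COMM_WORLD, &status);
            source = status.MPI_SOURCE;
            printf("%s:%d received %d from %d\n", name, rank, val, source);

            /* Reply to the rank that actually sent the message. */
            MPI_Send(&stat, 1, MPI_INT, source, TAG_STATUS, MPI_COMM_WORLD);

            /* Take the follow-up from that same worker, not from any rank. */
            MPI_Recv(&val, 1, MPI_INT, source, TAG_FOLLOWUP,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s:%d received follow-up %d from %d\n",
                   name, rank, val, source);
        }
        printf("all workers checked in!\n");
    } else {
        int val = rank + 10, stat = 0;

        MPI_Send(&val, 1, MPI_INT, 0, TAG_VALUE, MPI_COMM_WORLD);

        /* The source is fixed to rank 0, so the status object would carry
           no new information and can be ignored. */
        MPI_Recv(&stat, 1, MPI_INT, 0, TAG_STATUS, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("%s:%d received status %d\n", name, rank, stat);

        val = 20;
        MPI_Send(&val, 1, MPI_INT, 0, TAG_FOLLOWUP, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Built with an MPI compiler wrapper such as mpicc and launched with, for example, mpiexec -np 4, the master should complete one exchange per worker in whatever order the workers arrive. Pinning the follow-up receive to source keeps each three-message conversation with a single worker, so a follow-up from one rank cannot be mistaken for another's.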

+0

Hi Hristo, I have changed the source and posted the updated code. – Sijo

1

After installing MPICH2 in place of OpenMPI, it worked successfully. I think there is some problem with using OpenMPI 1.6.3 on my cluster machines.