2017-05-05

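MPI deadlock with collective functions

I'm writing a program in C++ using the MPI library, and it deadlocks: only one node keeps working. I'm not using any send or receive operations, only two collective functions (MPI_Allreduce and MPI_Bcast). If some node were waiting for another node to send or receive, I would understand, but I don't see what causes this deadlock.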

void ParaStochSimulator::first_reacsimulator() { 
    SimulateSingleRun(); 
} 

double ParaStochSimulator::deterMinTau() { 
    //calculate minimum tau for this process 
    l_nLocalMinTau = calc_tau(); //min tau for each node 
    MPI_Allreduce(&l_nLocalMinTau, &l_nGlobalMinTau, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);  
    //min tau for all nodes 
    //check if I have the min value 
    if (l_nLocalMinTau <= l_nGlobalMinTau && m_nCurrentTime < m_nOutputEndPoint) { 
     FireTransition(m_nMinTransPos); 
     CalculateAllHazardValues(); 
    } 
    return l_nGlobalMinTau; 
} 

void ParaStochSimulator::SimulateSingleRun() { 
    //prepare a run 
    PrepareRun(); 
    while ((m_nCurrentTime < m_nOutputEndPoint) && IsSimulationRunning()) { 
     deterMinTau(); 
     if (mnprocess_id == 0) { //master 
      SimulateSingleStep(); 
      std::cout << "current time:*****" << m_nCurrentTime << std::endl; 
      broad_casting(m_nMinTransPos); 
      MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD); 
      //std::cout << "size of mani place :" << l_nMinplacesPos.size() << std::endl; 
     } 
    } 
    MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD); 
    PostProcessRun(); 
} 

Answer


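While your "master" process is executing MPI_Bcast, all the other processes are still running the loop: they go back into deterMinTau and execute MPI_Allreduce.

This is a deadlock because your master node is waiting for all the other nodes to join the broadcast, while all the other nodes are waiting for the master to join the MPI_Allreduce. A collective operation on a communicator must be called by every process in that communicator, and all processes must call the collectives in the same order.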

I believe what you are looking for is:

void ParaStochSimulator::SimulateSingleRun() { 
    //prepare a run 
    PrepareRun(); 
    while ((m_nCurrentTime < m_nOutputEndPoint) && IsSimulationRunning()) { 
     //All the nodes reduce tau at the same time 
     deterMinTau(); 
     if (mnprocess_id == 0) { //master 
      SimulateSingleStep(); 
      std::cout << "current time:*****" << m_nCurrentTime << std::endl; 
      broad_casting(m_nMinTransPos); 
      //Removed the master-only broadcast here 
     } 
     //All the nodes broadcast at every loop iteration 
     MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD); 
    } 
    PostProcessRun(); 
} 

Thanks for your help, but unfortunately I have removed the broadcast from the master and there is still a deadlock -_-