2012-12-12 57 views
1

我做了一个生命游戏的顺序版本,但现在我需要使用OpenMP创建一个并行版本的代码,但我遇到了一些问题。 如果有人能帮助我,那将会非常好。 THKS。 这里是我的顺序代码:OpenMP的生命游戏

// Swapping the two grids 
#define SWAP_BOARDS(b1, b2) do { \ 
char* temp = b1; \ 
b1 = b2; \ 
b2 = temp; \ 
} while(0) 

// Simplifying access to grid elements 
    #define BOARD(G, X, Y) ((G)[NC*(X)+(Y)]) 

char* sequential_game_of_life (char* outgrid, char* ingrid, 
     const int nrows, const int ncols, const int gens_max) { 

    const int NC = ncols; 
    int curgen, i, j; 

for (curgen = 0; curgen < gens_max; curgen++) 
    { 

    for (i = 0; i < nrows; i++) 
{ 
    for (j = 0; j < ncols; j++) 
    { 
     const int inorth = mod (i-1, nrows); 
     const int isouth = mod (i+1, nrows); 
     const int jwest = mod (j-1, ncols); 
     const int jeast = mod (j+1, ncols); 

     const char neighbor_count = 
    BOARD (ingrid, inorth, jwest) + 
    BOARD (ingrid, inorth, j) + 
    BOARD (ingrid, inorth, jeast) + 
    BOARD (ingrid, i, jwest) + 
    BOARD (ingrid, i, jeast) + 
    BOARD (ingrid, isouth, jwest) + 
    BOARD (ingrid, isouth, j) + 
    BOARD (ingrid, isouth, jeast); 

     BOARD(outgrid, i, j) = alivep (neighbor_count, BOARD (ingrid, i, j)); 
    } 
} 
    SWAP_BOARDS(outgrid, ingrid); 
} 
    return outgrid; 
} 

我知道,我必须为平行的那些3,但我不能看到如何做到这一点。

+0

那么问题是什么?是不是像“我怎样才能让3'for'循环与OpenMP并行运行?”或类似的东西? – netcoder

回答

4

我觉得外环不能并行,因为输入到每一代前代,所以它有一个顺序式(至少你不能用细微的变化做到这一点!)

在的情况下,嵌套循环遍历矩阵或类似的东西,我喜欢从0ncol*nrow(在你的情况)运行一个循环,并从循环索引中找到ij

这样的:

// because you are running a parallel codes multiple times in a loop, 
// it would be better to make the thread swarm first and schedule the 
// tasks in each loop iteration, to avoid multiple creation and destruction 
// of working threads 
#pragma omp parallel 
for (curgen = 0; curgen < gens_max; curgen++) 
{ 
    #pragma omp for 
    for (t = 0; t < nrows*ncols; t++) 
    { 
     int i = t/ncols; 
     int j = t % ncols; 
     const int inorth = mod (i-1, nrows); 
     const int isouth = mod (i+1, nrows); 
     const int jwest = mod (j-1, ncols); 
     const int jeast = mod (j+1, ncols); 

     const char neighbor_count = 
      BOARD (ingrid, inorth, jwest) + 
      BOARD (ingrid, inorth, j) + 
      BOARD (ingrid, inorth, jeast) + 
      BOARD (ingrid, i, jwest) + 
      BOARD (ingrid, i, jeast) + 
      BOARD (ingrid, isouth, jwest) + 
      BOARD (ingrid, isouth, j) + 
      BOARD (ingrid, isouth, jeast); 

     BOARD(outgrid, i, j) = alivep (neighbor_count, BOARD (ingrid, i, j)); 
    } 
    SWAP_BOARDS(outgrid, ingrid); 
} 

我跑在我的笔记本电脑上的1000×1000矩阵双核2.53 GHz的CPU超过1000代这个代码,并获得69%的速度了。

+0

非常好,但我在这里有一些疑问,实际上是两个: 第一:我不得不定义一个类似#pragma omp parallel num_threads(NTHREADS)private(t)的并行区域吗? 第二:当我用你的建议运行算法时,使用1个线程比使用4更快。任何猜测为什么会发生这种情况? – tsukanomon

+1

'#pragma omp parallel for'是'#pragma omp parallel'的快捷方式,然后是'#pragma omp for'。您也可以在其中指定其他选项,如:'#pragma omp parallel for num_threads(N)'。为加快速度,请检查更新的答案 – saeedn

+1

您必须给予'curgen'私人待遇。当有'collapse(n)'OpenMP指令时,手动循环崩溃是多余的。 –