Parallelizing a loop with OpenMP
I want to parallelize a loop of this kind. Note that each "calc_block" uses data obtained in the previous iteration.
for (i = 0; i < MAX_ITER; i++) {
    norma1 = calc_block1();
    norma2 = calc_block2();
    norma3 = calc_block3();
    norma4 = calc_block4();
    norma = norma1 + norma2 + norma3 + norma4;
    ...some calc...
    if (norma < eps) break;
}
I tried the following, but the speedup is quite small, about 1.2x:
for (i = 0; i < MAX_ITER; i++) {
    /* the opening brace must be on its own line, not on the pragma line */
    #pragma omp parallel sections
    {
        #pragma omp section
        norma1 = calc_block1();
        #pragma omp section
        norma2 = calc_block2();
        #pragma omp section
        norma3 = calc_block3();
        #pragma omp section
        norma4 = calc_block4();
    }
    norma = norma1 + norma2 + norma3 + norma4;
    ...some calc...
    if (norma < eps) break;
}
I think this happens because of the overhead of using sections inside the loop, but I don't know how to fix it... Thanks in advance!
What is the value of MAX_ITER? What are the absolute time costs of the whole code and of each block? – kangshiyin
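If the cost really does come from opening a parallel region on every iteration, one thing to try is creating the thread team once, outside the loop, and using only the worksharing constructs (sections, single) inside it. Below is a minimal sketch of that idea, not a tested fix for the original code. The names iterate, calc_block, done and iters are made up for illustration, calc_block is a trivial stand-in for calc_block1..4, and the "...some calc..." step is left out. Many OpenMP runtimes already reuse their threads between regions, so the gain may be modest. If the four blocks take uneven time, the speedup will still be limited by the slowest one.

```c
#include <stdio.h>

#define MAX_ITER 1000

/* Hypothetical stand-in for calc_block1..4: halves one entry of the
   state and returns its new value as that block's contribution. */
static double calc_block(double *x)
{
    *x *= 0.5;
    return *x;
}

/* Runs the iteration with ONE parallel region around the whole loop.
   Returns the number of iterations performed and stores the last
   value of norma in *final_norm. */
int iterate(double state[4], double eps, double *final_norm)
{
    /* Declared outside the parallel region, so shared by all threads. */
    double norma1 = 0.0, norma2 = 0.0, norma3 = 0.0, norma4 = 0.0;
    double norma = 0.0;
    int iters = 0;
    int done = 0;   /* shared convergence flag */

    #pragma omp parallel
    {
        /* i is declared inside the region, so each thread has its own. */
        for (int i = 0; i < MAX_ITER; i++) {
            #pragma omp sections
            {
                #pragma omp section
                norma1 = calc_block(&state[0]);
                #pragma omp section
                norma2 = calc_block(&state[1]);
                #pragma omp section
                norma3 = calc_block(&state[2]);
                #pragma omp section
                norma4 = calc_block(&state[3]);
            }   /* implicit barrier: all four results are ready */

            #pragma omp single
            {
                norma = norma1 + norma2 + norma3 + norma4;
                iters = i + 1;
                if (norma < eps)
                    done = 1;
            }   /* implicit barrier: every thread sees the same done */

            if (done)
                break;   /* all threads leave the loop together */
        }
    }

    *final_norm = norma;
    return iters;
}
```

Called as iterate(state, 1e-3, &norm) with state = {1.0, 2.0, 3.0, 4.0}, the stand-in converges after 14 iterations. With GCC, compile with -fopenmp; without that flag the pragmas are ignored and the same code runs serially. The break is only safe because every thread reads done after the barrier at the end of single, so all threads take the same branch.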