OpenMP: for loop inside a section

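I want to run the code below. I want to spawn two independent threads, each of which runs a parallel for loop. Unfortunately, I get an error. Apparently a parallel for cannot be started inside a section. How can I solve this?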

#include <omp.h> 
#include "stdio.h" 

int main() 
{ 

omp_set_num_threads(10); 

#pragma omp parallel  
#pragma omp sections 
    { 
#pragma omp section 
#pragma omp for 
    for(int i=0; i<5; i++) { 
     printf("x %d\n", i); 
    } 

#pragma omp section 
#pragma omp for 
    for(int i=0; i<5; i++) { 
     printf(". %d\n", i); 
    } 
    } // end parallel and end sections 
}

And the error:

main.cpp: In function ‘int main()’: 
main.cpp:14:9: warning: work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region [enabled by default] 
main.cpp:20:9: warning: work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region [enabled by default]

2011-10-27 Jakub M.

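By default, OpenMP does not start a new team of threads for a parallel region nested inside another parallel region. Typical implementations create a pool of num_threads threads up front and keep it alive; outside parallel regions the extra threads are unused and sleep. This is done because repeatedly spawning new threads is quite slow compared with waking sleeping threads.

Therefore, you should parallelize only the loops: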

#include <omp.h> 
#include "stdio.h" 

int main() 
{ 

omp_set_num_threads(10); 

#pragma omp parallel for 
    for(int i=0; i<5; i++) { 
     printf("x %d\n", i); 
    } 

#pragma omp parallel for 
    for(int i=0; i<5; i++) { 
     printf(". %d\n", i); 
    } 
}

2011-10-27 13:36:41 tune2fs

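But then you should probably call omp_set_num_threads() with 5 or fewer...

You are right, in this case it should be set to 5, but it is not a problem if it is 10, because the other 5 threads do nothing. – tune2fs

I agree, it will not break anything.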

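Here you have to use nested parallelism. The problem with an omp for inside sections is that all threads in scope must participate in the omp for, and they clearly do not, because they are split up by section. So you have to introduce functions and do the nested parallelism inside those functions.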

#include <stdio.h> 
#include <omp.h> 

void doTask1(const int gtid) { 
    omp_set_num_threads(5); 
#pragma omp parallel 
    { 
     int tid = omp_get_thread_num(); 
     #pragma omp for 
     for(int i=0; i<5; i++) { 
      printf("x %d %d %d\n", i, tid, gtid); 
     } 
    } 
} 

void doTask2(const int gtid) { 
    omp_set_num_threads(5); 
#pragma omp parallel 
    { 
     int tid = omp_get_thread_num(); 
     #pragma omp for 
     for(int i=0; i<5; i++) { 
      printf(". %d %d %d\n", i, tid, gtid); 
     } 
    } 
} 


int main() 
{ 
    omp_set_num_threads(2); 
    omp_set_nested(1); 

#pragma omp parallel  
    { 
     int gtid = omp_get_thread_num(); 
#pragma omp sections 
     { 
#pragma omp section 
      doTask1(gtid); 

#pragma omp section 
      doTask2(gtid); 
     } // end sections 
    } // end parallel 
}

2011-10-27 13:53:25

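I feel I should say that hard-coding the number of threads (`omp_set_num_threads()`) or enabling nested parallelism in code (`omp_set_nested()`) is somewhere between not best practice and rather obnoxious; you usually want the user to be able to set these with environment variables. They are only set explicitly in tutorials.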

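In practice, the optimal number of threads equals the number of available CPU cores. So each parallel for should be spread across all available cores, which is not possible inside omp sections. What you are trying to achieve is therefore not optimal. tune2fs's suggestion to run the two loops without sections makes sense and gives the best performance. You can run the parallel loops inside another function, but this "cheating" will not improve performance.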

2011-10-27 13:56:08

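It is entirely possible that he has 10 expensive, independent tasks, made up of 2 loops of 5 iterations each, and at least 10 cores. In that case decomposing the work this way is quite reasonable, because no more than 5 cores can be used in either loop.

@Jonathan Dursi - In that case your suggestion is fine. I was mainly thinking of long loops, as in image processing, which is where OpenMP is most at home.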
