2013-03-06 44 views

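Unable to set OpenMP thread affinity in a forked process

I am trying to run two processes on separate CPUs using OpenMP. Each CPU has 6 cores with hyperthreading (so 12 hardware threads). The processes need to do some synchronization, which seems easier if they know each other's PIDs. So I start a sigC process from sigS using fork() and execve(), passing a different value for the GOMP_CPU_AFFINITY environment variable. After the fork()/execve() call, sigS still has the correct affinity, but sigC prints

libgomp: no cpus left for affinity setting 

and all of its threads end up on the same core.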

Code for sigS:

#define _GNU_SOURCE 
#include <stdio.h> 
#include <stdlib.h> 
#include <unistd.h> 
#include <errno.h> 
#include <omp.h> 
#include <sched.h> 

int main(void) 
{ 
    omp_set_num_threads(12); //12 hardware threads per CPU 
    //this loop runs as expected 
    #pragma omp parallel for 
    for(int i = 0; i < 12; i++) { 
        #pragma omp critical 
        { 
            printf("TEST PRE-FORK: I am thread %2d running on core %d\n", 
                   omp_get_thread_num(), sched_getcpu()); 
        } 
    } 

    pid_t childpid = fork(); 

    if(childpid < 0) { 
        perror("Fork failed"); 
    } else { 
        if(childpid == 0) { //<------ attempt to set affinity for child 
            //change the affinity for the other process so it runs 
            //on the other cpu 
            char ompEnv[] = "GOMP_CPU_AFFINITY=6-11 18-23"; 
            char * const args[] = { "./sigC", (char*)0 }; 
            char * const envArgs[] = { ompEnv, (char*)0 }; 
            execve(args[0], args, envArgs); 
            perror("Returned from execve"); 
            exit(1); 
        } else { 
            omp_set_num_threads(12); 
            printf("PARENT: my pid  = %d\n", getpid()); 
            printf("PARENT: child pid = %d\n", childpid); 
            sleep(5); //sleep for a bit so child process prints first 

            //This loop gives the same thread core/pairings as above 
            //this is expected 
            #pragma omp parallel for 
            for(int i = 0; i < 12; i++) { 
                #pragma omp critical 
                { 
                    printf("PARENT: I'm thread %2d, on core %d.\n", 
                           omp_get_thread_num(), sched_getcpu()); 
                } 
            } 
        } 
    } 
    return 0; 
} 

The code for sigC contains nothing but an OpenMP parallel for loop, but here it is for completeness:

#define _GNU_SOURCE 
#include <stdio.h> 
#include <unistd.h> 
#include <errno.h> 
#include <omp.h> 
#include <sched.h> 

int main(void) 
{ 
    omp_set_num_threads(12); 
    printf("CHILD: my pid  = %d\n", getpid()); 
    printf("CHILD: parent pid = %d\n", getppid()); 
    //I expect this loop to have the core pairings as I specified in execve 
    //i.e thread 0 -> core 6, 1 -> 7, ... 6 -> 18, 7 -> 19 ... 11 -> 23 
    #pragma omp parallel for 
    for(int i = 0; i < 12; i++) { 
        #pragma omp critical 
        { 
            printf("CHILD: I'm thread %2d, on core %d.\n", 
                   omp_get_thread_num(), sched_getcpu()); 
        } 
    } 
    return 0; 
} 

Output:

$ env GOMP_CPU_AFFINITY="0-5 12-17" ./sigS 

This part is as expected:

TEST PRE-FORK: I'm thread 0, on core 0. 
TEST PRE-FORK: I'm thread 11, on core 17. 
TEST PRE-FORK: I'm thread 5, on core 5. 
TEST PRE-FORK: I'm thread 6, on core 12. 
TEST PRE-FORK: I'm thread 3, on core 3. 
TEST PRE-FORK: I'm thread 1, on core 1. 
TEST PRE-FORK: I'm thread 8, on core 14. 
TEST PRE-FORK: I'm thread 10, on core 16. 
TEST PRE-FORK: I'm thread 7, on core 13. 
TEST PRE-FORK: I'm thread 2, on core 2. 
TEST PRE-FORK: I'm thread 4, on core 4. 
TEST PRE-FORK: I'm thread 9, on core 15. 
PARENT: my pid  = 11009 
PARENT: child pid = 11021 

This is the problem: all threads in the child run on core 0:

libgomp: no CPUs left for affinity setting 
CHILD: my pid  = 11021 
CHILD: parent pid = 11009 
CHILD: I'm thread 1, on core 0. 
CHILD: I'm thread 0, on core 0. 
CHILD: I'm thread 4, on core 0. 
CHILD: I'm thread 5, on core 0. 
CHILD: I'm thread 6, on core 0. 
CHILD: I'm thread 7, on core 0. 
CHILD: I'm thread 8, on core 0. 
CHILD: I'm thread 9, on core 0. 
CHILD: I'm thread 10, on core 0. 
CHILD: I'm thread 11, on core 0. 
CHILD: I'm thread 3, on core 0. 

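(I have omitted the parent's thread output because it is identical to the pre-fork output.)

Any ideas on how to fix this, or on whether this is even the right approach?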


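Please always check that your code samples compile before posting them here; compile errors may put off those who would otherwise try to solve your problem. Both source files had a comma instead of a dot in 'errno.h', and the source of 'sigS' had a typo ('/' instead of '//'). – 2013-03-06 21:27:48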

Answers


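A fork()-ed child process inherits its parent's affinity mask. libgomp intersects that affinity mask with the set taken from GOMP_CPU_AFFINITY and, because the two sets have no CPU in common, ends up with an empty set. This behavior is not documented, but a look at the libgomp source code confirms that this is what happens.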

The solution is to reset the child process's affinity mask before it makes the execve() call:

if (childpid == 0) { //<------ attempt to set affinity for child 
    cpu_set_t *mask; 
    size_t size; 
    int nrcpus = 256; // 256 CPUs should be more than enough 

    // Reset the CPU affinity mask 
    mask = CPU_ALLOC(nrcpus); 
    size = CPU_ALLOC_SIZE(nrcpus); 
    for (int i = 0; i < nrcpus; i++) 
        CPU_SET_S(i, size, mask); 
    if (sched_setaffinity(0, size, mask) == -1) { 
        // Report the failure; the child then proceeds with the inherited mask 
        perror("sched_setaffinity"); 
    } 
    CPU_FREE(mask); 

    //change the affinity for the other process so it runs 
    //on the other cpu 
    char ompEnv[] ="GOMP_CPU_AFFINITY=6-11 18-23"; 
    char * const args[] = {"./sigC", (char*)0}; 
    char * const envArgs[] = {ompEnv, (char*)0}; 
    execve(args[0], args, envArgs); 
    perror("Returned from execve"); 
    exit(1); 
} else {