2014-03-18 25 views

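OpenMP: how to retrieve the ID of the core a thread is running on

I have set KMP_AFFINITY to scatter, but the execution time increased a lot!

That is why I think OpenMP is spawning threads on only one core.

So I need something that returns the core the current thread is using.

This is the pragma I use before the for loop: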

int procs = omp_get_num_procs(); 
#pragma omp parallel for num_threads(procs)\ 
shared (c, u, v, w, k, j, i, nx, ny) \ 
reduction(+: a, b, c, d, e, f, g, h, i) 

And these are the exports I set:

export OMP_NUM_THREADS=5 
export KMP_AFFINITY=verbose,scatter 

In case it helps, I am also pasting the verbose output:

OMP: Info #149: KMP_AFFINITY: Affinity capable, using global cpuid instr info 
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7} 
OMP: Info #156: KMP_AFFINITY: 8 available OS procs 
OMP: Info #157: KMP_AFFINITY: Uniform topology 
OMP: Info #159: KMP_AFFINITY: 2 packages x 4 cores/pkg x 1 threads/core (8 total cores) 
OMP: Info #160: KMP_AFFINITY: OS proc to physical thread map ([] => level not in map): 
OMP: Info #168: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 [thread 0] 
OMP: Info #168: KMP_AFFINITY: OS proc 4 maps to package 0 core 1 [thread 0] 
OMP: Info #168: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 [thread 0] 
OMP: Info #168: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 [thread 0] 
OMP: Info #168: KMP_AFFINITY: OS proc 1 maps to package 1 core 0 [thread 0] 
OMP: Info #168: KMP_AFFINITY: OS proc 5 maps to package 1 core 1 [thread 0] 
OMP: Info #168: KMP_AFFINITY: OS proc 3 maps to package 1 core 2 [thread 0] 
OMP: Info #168: KMP_AFFINITY: OS proc 7 maps to package 1 core 3 [thread 0] 
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0} 
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1} 
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4} 
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {5} 
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {2} 
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {3} 
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {6} 
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {7} 

Thanks in advance!


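Variables are shared by default. You have no `private` clauses, so many variables you think are private may actually be shared. Data races and false sharing can badly degrade your program's performance and make you believe that all threads are running on a single core.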
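To make the comment above concrete, here is a minimal sketch of explicit data-sharing clauses. It is not the asker's loop: `sum_grid`, `u`, `nx`, `ny` and `total` are hypothetical names chosen for illustration, and the row-major layout is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: sums an nx-by-ny grid stored row-major in u. */
double sum_grid(const double *u, int nx, int ny)
{
    double total = 0.0;   /* combined across threads by the reduction clause */
    int i, j;             /* declared outside the loops, as in much older C code */

    /* i is the index of the parallelized loop, so it is private automatically.
       j is not: without private(j) every thread would share a single j and
       race on it. default(none) makes the compiler reject any variable whose
       sharing has not been stated explicitly. */
    #pragma omp parallel for default(none) shared(u, nx, ny) private(j) \
            reduction(+: total)
    for (i = 0; i < nx; i++) {
        for (j = 0; j < ny; j++) {
            total += u[(size_t)i * ny + j];
        }
    }
    return total;
}
```

Declaring `j` inside the inner `for` statement would make it private without the clause. The sketch only uses pragmas, so it also compiles and runs serially when OpenMP is not enabled.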


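The verbose listing you show does not seem to match the run you describe: it shows eight OpenMP threads, each bound to a separate logical CPU, whereas you claim to be using five threads. (So it definitely *is* using all the hardware.) You also did not say what the base case is, only that scatter is slower than ... something. On your machine, four threads all in one socket may well do better than four threads spread over two sockets, because of data sharing.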
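One possible reading of the mismatch, offered as an assumption rather than something the thread confirms: a `num_threads` clause takes precedence over `OMP_NUM_THREADS`, and `omp_get_num_procs()` would return 8 on the machine in the listing, which matches the eight bound threads. A minimal sketch for checking the team size directly follows. `team_size_with_clause` is a hypothetical helper name, and the serial fallbacks exist only so the sketch builds without OpenMP enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP enabled. */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_num_procs(void) { return 1; }
#endif

/* Hypothetical helper: returns the size of the team that actually runs
   when the num_threads clause is given, as in the question's pragma. */
int team_size_with_clause(void)
{
    int procs = omp_get_num_procs();
    int team = 0;

    #pragma omp parallel num_threads(procs)
    {
        /* Exactly one thread records the team size. */
        #pragma omp single
        team = omp_get_num_threads();
    }
    return team;
}
```

Printing this value next to `getenv("OMP_NUM_THREADS")` would show whether the environment variable or the clause decided the thread count.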


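P.S. If you do not trust what the runtime's output says it is doing, and assuming you are on Linux, just run xosview and watch the load on each logical CPU while your code runs.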

Answers

1

If you are on Linux, you can use the function sched_getcpu(). Here is a link that explains how it works and shows its declaration:

http://man7.org/linux/man-pages/man3/sched_getcpu.3.html

Hope this helps.


Hello, I have already tried this function, but it reports that sched_getcpu is undefined. I think that is because I am using Intel's compiler. – CrashLaker


@CrashLaker, that is probably because you forgot to add `#include <sched.h>` at the top of your code.


I did not forget the include. I guess the library does not exist. But if that were the case, shouldn't the C compiler warn me about it? – CrashLaker
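A likely cause of this symptom, stated as an assumption since the thread never confirms it: on glibc, sched_getcpu() is a GNU extension, and `<sched.h>` only declares it when the feature test macro `_GNU_SOURCE` is defined before the first include. A missing header would stop compilation at the `#include` line itself, so an "undefined function" report with the include in place points at a hidden declaration instead. The sketch below shows the ordering; `current_cpu` is a hypothetical wrapper name.

```c
#define _GNU_SOURCE   /* must appear before any #include in the file */
#include <sched.h>

/* Hypothetical wrapper: returns the logical CPU the calling thread is
   currently running on, or -1 if the call fails. */
int current_cpu(void)
{
    return sched_getcpu();
}
```

Passing `-D_GNU_SOURCE` on the compiler command line has the same effect as the `#define`. If the symbol is still missing at link time, the C library on that system may simply be too old to provide it.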

2

As @user3018144 pointed out, sched_getcpu() is what you can use to get the CPU number.

Consider the following code:

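#define _GNU_SOURCE   /* exposes sched_getcpu() in <sched.h> on glibc; must precede all includes */
#include <stdio.h>
#include <sched.h>
#include <omp.h>

int main(void) {
#pragma omp parallel
    {
        /* Declared inside the region, so each thread has its own copies. */
        int thread_num = omp_get_thread_num();
        int cpu_num = sched_getcpu();   /* logical CPU this thread is on right now */
        printf("Thread %3d is running on CPU %3d\n", thread_num, cpu_num);
    }

    return 0;
}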

This is my output without affinity:

$> OMP_NUM_THREADS=4 ./a.out | sort 
Thread   0 is running on CPU   2
Thread   1 is running on CPU   0
Thread   2 is running on CPU   3
Thread   3 is running on CPU   1

This is the output with affinity:

$> GOMP_CPU_AFFINITY='0,1,2,3' OMP_NUM_THREADS=4 ./a.out | sort 
Thread   0 is running on CPU   0
Thread   1 is running on CPU   1
Thread   2 is running on CPU   2
Thread   3 is running on CPU   3