2017-07-14 35 views

PyOpenCL, OpenCL: cannot build program on GPU

I have kernel source code that runs on the Nvidia 970 GPU in my PC, but it will not compile on my early-2015 MacBook Pro with Iris 6100 1536 MB graphics.

import pyopencl as cl

platform = cl.get_platforms()[0]
device = platform.get_devices()[1]  # Device 1 is the GPU on this machine
ctx = cl.Context([device])          # Tell CL to use the GPU
queue = cl.CommandQueue(ctx)        # Create a command queue for the target device
# program = cl.Program(ctx, kernelsource).build()
print(platform.get_devices())

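get_devices() shows my two devices: <pyopencl.Device 'Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz' on 'Apple' at 0xffffffff> and <pyopencl.Device 'Intel(R) Iris(TM) Graphics 6100' on 'Apple' at 0x1024500>.

The kernel runs correctly on the CPU. But when I build the program for the GPU, it returns: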

--------------------------------------------------------------------------- 
RuntimeError        Traceback (most recent call last) 
<ipython-input-44-e2b6e1b931de> in <module>() 
     3 ctx  = cl.Context([device])  # Tell CL to use GPU 
     4 queue = cl.CommandQueue(ctx)  # Create a command queue for the target device. 
----> 5 program = cl.Program(ctx, kernelsource).build() 
     6 
     7 

/usr/local/lib/python2.7/site-packages/pyopencl-2015.2.4-py2.7-macosx-10.11-x86_64.egg/pyopencl/__init__.pyc in build(self, options, devices, cache_dir) 
    393       self._context, self._source, options, devices, 
    394       cache_dir=cache_dir), 
--> 395      options=options, source=self._source) 
    396 
    397    del self._context 

/usr/local/lib/python2.7/site-packages/pyopencl-2015.2.4-py2.7-macosx-10.11-x86_64.egg/pyopencl/__init__.pyc in _build_and_catch_errors(self, build_func, options, source) 
    428   # Python 3.2 outputs the whole list of currently active exceptions 
    429   # This serves to remove one (redundant) level from that nesting. 
--> 430   raise err 
    431 
    432  # }}} 

RuntimeError: clbuildprogram failed: BUILD_PROGRAM_FAILURE - 

Build on <pyopencl.Device 'Intel(R) Iris(TM) Graphics 6100' on 'Apple' at 0x1024500>: 

Cannot select: 0x7f94b30a5110: i64,ch = dynamic_stackalloc 0x7f94b152a290, 0x7f94b30a4f10, 0x7f94b3092c10 [ORD=7] [ID=54] 
    0x7f94b30a4f10: i64 = and 0x7f94b30a4c10, 0x7f94b3092b10 [ORD=7] [ID=52] 
    0x7f94b30a4c10: i64 = add 0x7f94b30a6610, 0x7f94b3092a10 [ORD=7] [ID=49] 
     0x7f94b30a6610: i64 = shl 0x7f94b3092d10, 0x7f94b3092e10 [ID=46] 
     0x7f94b3092d10: i64 = bitcast 0x7f94b30a4810 [ID=41] 
      0x7f94b30a4810: v2i32 = IGILISD::MOVSWZ 0x7f94b3092710, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=32] 
      0x7f94b3092710: i32,ch = CopyFromReg 0x7f94b152a290, 0x7f94b3092610 [ORD=5] [ID=22] 
       0x7f94b3092610: i32 = Register %vreg60 [ORD=5] [ID=1] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
     0x7f94b3092e10: i64 = bitcast 0x7f94b30a3f10 [ID=38] 
      0x7f94b30a3f10: v2i32 = IGILISD::MOVSWZ 0x7f94b30a4510, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=29] 
      0x7f94b30a4510: i32 = Constant<2> [ID=19] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
     0x7f94b3092a10: i64 = bitcast 0x7f94b30a4b10 [ID=40] 
     0x7f94b30a4b10: v2i32 = IGILISD::MOVSWZ 0x7f94b30a4e10, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=31] 
      0x7f94b30a4e10: i32 = Constant<7> [ID=21] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
    0x7f94b3092b10: i64 = bitcast 0x7f94b3092910 [ID=39] 
     0x7f94b3092910: v2i32 = IGILISD::MOVSWZ 0x7f94b30a5010, 0x7f94b30a4210, 0x7f94b30a2810, 0x7f94b30a2810 [ID=30] 
     0x7f94b30a5010: i32 = Constant<-8> [ID=20] 
     0x7f94b30a4210: i32 = Constant<-1> [ORD=3] [ID=10] 
     0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
     0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
    0x7f94b3092c10: i64 = bitcast 0x7f94b3092810 [ID=35] 
    0x7f94b3092810: v2i32 = IGILISD::MOVSWZ 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=27] 
     0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
     0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
     0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
     0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
In function: trajectories 
(options: -I /usr/local/lib/python2.7/site-packages/pyopencl-2015.2.4-py2.7-macosx-10.11-x86_64.egg/pyopencl/cl) 
(source saved as /var/folders/p2/jd7m10gs5k1_q6hx5kvktkcc0000gn/T/tmpWQmCKr.cl) 

Any suggestions as to why this will not run? I am running an early-2015 MacBook Pro with Sierra 10.12.5. Printing cl.version.VERSION returns (2015, 2, 4).

Here is the kernel code:

kernelsource = """ 
__kernel void trajectories(
    // TODO: adjust argtypes above if this is changed 
    const int N, 
    const int dim, 
    __constant float* data, 
    const int nrParticles, 
    __global float* pos, 
    __global float* vel, 
    const int nrSteps, 
    __global float* trj, 
    __global float* sigarr, 
    const float sigma, 
    const float mass, 
    const float alpha, // alpha is resistance in reverse. 
    const float dt 
){ 
    int i,k,step; 
    float h, sigsum, hexp; 
    int pidx = get_global_id(0); // global ID used as particle index 
    int ofs = pidx * nrSteps * dim; 
    int accofs = ofs + (nrSteps-1) * dim; // use last trj point to tmp store acc vector 
    float v[dim]; 
    float sigma2 = sigma*sigma; 
    float m = mass/sigma2; 
    float dt_over_m = dt /m; 
    for(step=0; step<nrSteps; step++){ 
     for(k=0; k<dim; k++) 
     { 
      trj[accofs+k]=0; 
     } 
     for(i=0; i<N; i++) 
     { 

      h=0; // to store ||data[i]-x||**2 
      for(k=0; k<dim; k++) 
      { 
       v[k] = pos[pidx*dim+k] - data[i*dim + k]; 
       h += v[k]*v[k];  //h == force1p_sum 
      }; 
      hexp = exp(-h/sigma2)/sigma2; 

      for(k=0; k<dim; k++) 
      { 
       trj[accofs+k] += -(hexp) * v[k]; 
      };   
     }; 
     sigsum = 0; 
     for(k=0; k<dim; k++) 
     { 
      vel[pidx*dim+k]  = alpha * vel[pidx*dim+k] + dt_over_m * trj[accofs+k];  // vel = alpha*vel + acc*dt 
      pos[pidx*dim+k] += dt * vel[pidx*dim+k];      // pos = pos + vel*dt 
      sigsum    += vel[pidx*dim+k] * vel[pidx*dim+k]; // v^2 for kinetic energy 
      trj[ofs+step*dim+k] = pos[pidx*dim+k];    // write to result vector 

     }; 
     sigarr[pidx*nrSteps+step] = sigsum;     // sig = | vel | 
    } 
    for(step=0; step<nrSteps-2; step++) 
    { 
     sigarr[pidx*nrSteps+step] = sigarr[pidx*nrSteps+step+2] - sigarr[pidx*nrSteps+step+1]; 
    }; 
    sigarr[pidx*nrSteps+nrSteps-1] = sigarr[pidx*nrSteps+nrSteps-2] = 0; 

} 
""" 

Thanks,

嘉俊


Can you share the kernel code? It returns BUILD_PROGRAM_FAILURE, so something must be wrong with the kernel code. –


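'clBuildProgram' should also give you diagnostic output that tells you where the problem is. If you cannot make sense of it, post it together with the source code, as @parallelhighway suggested, and we can try to help. – pmdj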


Hi, I have added the kernel code. Thanks –

Answers

1

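In cases like this you should query the build error. Another thing you can do for kernel code errors of this kind is to use an offline compiler. Most OpenCL implementers provide one.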
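As a minimal sketch of querying the build error from PyOpenCL: a failed build raises an exception whose message carries the compiler output, as the traceback in the question shows. The helper names `summarize_build_error` and `build_or_report` are hypothetical, and cutting the message down to its first lines is only an illustrative choice, since some drivers dump very long logs.

```python
def summarize_build_error(message, max_lines=10):
    """Return the first non-empty lines of a build-failure message."""
    lines = [ln.rstrip() for ln in message.splitlines() if ln.strip()]
    return lines[:max_lines]


def build_or_report(ctx, source, options=()):
    """Build `source` in `ctx`; on failure print a short summary and re-raise."""
    import pyopencl as cl  # imported lazily; only needed when actually building

    try:
        return cl.Program(ctx, source).build(options=list(options))
    except cl.Error as err:
        # The exception text contains the per-device build log.
        for line in summarize_build_error(str(err)):
            print(line)
        raise
```

With the `ctx` and `kernelsource` from the question, `build_or_report(ctx, kernelsource)` would stand in for the plain `cl.Program(ctx, kernelsource).build()` call.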

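You can find Intel's offline OpenCL compiler here: https://software.intel.com/en-us/articles/programming-with-the-intel-sdk-for-opencl-applications-development-tools

AMD has a tool called CodeXL, in which you can also compile offline to see whether your kernel code builds.

Here is ARM's offline OpenCL compiler: https://developer.arm.com/products/software-development-tools/graphics-development-tools/mali-offline-compiler/downloads

Intel supports up to OpenCL 2.1, while ARM supports up to 1.1. You can pick any of them to compile your kernel code and find the errors easily.

The problem in your kernel is the following line: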

float v[dim]; 

The OpenCL C specification does not allow variable-length arrays, and the offline compiler reports the following error:

ERROR: <source>:22:12: error: variable length arrays are not supported in OpenCL 

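Fix your code to get past this error. From now on, you can check whether your kernel compiles with an offline compiler.

Edit: The specification has a footnote explaining that variable-length arrays are not supported. You can see it here: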
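One possible fix, which this answer does not spell out, is to make the array length a compile-time constant and rebuild the program whenever the data dimension changes. The sketch below assumes a macro named `DIM` passed through the standard `-D` compiler option; `dim_build_options`, `use_fixed_length` and `build_for_dim` are hypothetical helper names.

```python
def dim_build_options(dim):
    """Compiler options that define DIM as a compile-time constant."""
    dim = int(dim)
    if dim < 1:
        raise ValueError("dim must be a positive integer")
    return ["-D", "DIM=%d" % dim]


def use_fixed_length(source):
    """Replace the variable-length declaration with one sized by the DIM macro."""
    return source.replace("float v[dim];", "float v[DIM];")


def build_for_dim(ctx, source, dim):
    """Build the kernel for one specific data dimension."""
    import pyopencl as cl  # imported lazily; only needed when actually building

    program = cl.Program(ctx, use_fixed_length(source))
    return program.build(options=dim_build_options(dim))
```

The kernel keeps its `dim` argument, so the host code only has to call `build_for_dim(ctx, kernelsource, dim)` again when `dim` changes.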

https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf#page=31


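Hi, you are right. When I replace it with a fixed length, it works. What I do not quite understand is that I had been using the variable length on the CPU and on the Nvidia 970 GPU before. All of those work, just not the Intel Iris GPU. Any idea why that happens? dim is the dimension of my data, so unless I change it by hand every time it needs to be a variable. Is there any workaround? Many thanks –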


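You can create the v array on the CPU and pass it as an argument. Defining a variable-length array inside the kernel is not allowed in this case. –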
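One way to read this suggestion, sketched under assumptions: the host allocates a scratch buffer with room for `dim` floats per particle, and each work-item uses its own slice of it in place of the private array. The argument name `scratch` and the helpers below are hypothetical, and global memory is likely to be slower than a private array.

```python
# Hypothetical edits to the kernel source from the question:
SCRATCH_ARG = "__global float* scratch,  // nrParticles * dim floats"
SCRATCH_USE = "__global float* v = scratch + pidx * dim;  // replaces: float v[dim];"


def scratch_nbytes(nr_particles, dim, itemsize=4):
    """Bytes needed for one float vector of length `dim` per particle."""
    return int(nr_particles) * int(dim) * int(itemsize)


def make_scratch(ctx, nr_particles, dim):
    """Allocate the device-side scratch buffer to pass as the extra kernel argument."""
    import pyopencl as cl  # imported lazily; only needed when actually allocating

    return cl.Buffer(ctx, cl.mem_flags.READ_WRITE,
                     size=scratch_nbytes(nr_particles, dim))
```

Because every work-item only touches `scratch[pidx*dim]` through `scratch[pidx*dim + dim - 1]`, the slices do not overlap between particles.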