2017-10-20

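PyOpenCL 2D array kernel: get_global_id(1) error

I'm really new to OpenCL. I took the example code from this site: http://www.drdobbs.com/open-source/easy-opencl-with-python/240162614?pgno=2 and customized it a bit. My goal is to send a 4x4 matrix filled with ones to the kernel and get it back from the kernel. I know this is trivial code, but I need it to understand how OpenCL works. The input matrix is this one: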

[[ 1. 1. 1. 1.] 
[ 1. 1. 1. 1.] 
[ 1. 1. 1. 1.] 
[ 1. 1. 1. 1.]] 

However, the output I get from the kernel is this one, when it should be identical to the input:

[[ 1. 1. 1. 1.] 
[ 0. 0. 0. 0.] 
[ 0. 0. 0. 0.] 
[ 0. 0. 0. 0.]] 

Here is my full code:

import pyopencl as cl 
from pyopencl import array 
import numpy as np 

## Step #1. Obtain an OpenCL platform. 
platform = cl.get_platforms()[0] 

## It would be necessary to add some code to check the support for 
## the necessary platform extensions with platform.extensions 

## Step #2. Obtain a device id for at least one device (accelerator). 
device = platform.get_devices()[1] 

## It would be necessary to add some code to check the support for 
## the necessary device extensions with device.extensions 

## Step #3. Create a context for the selected device. 
context = cl.Context([device]) 

## Step #4. Create the accelerator program from source code. 
## Step #5. Build the program. 
## Step #6. Create one or more kernels from the program functions. 
program = cl.Program(context, """ 
    __kernel void matrix_dot_vector(const unsigned int size, __global const float *matrix, __global float *result) 
    { 
     int x = get_global_id(0); 
     int y = get_global_id(1); 
     result[x + size * y] = matrix[x + size * y]; 
    } 
    """).build() 

matrix = np.ones((4,4), np.float32) 

## Step #7. Create a command queue for the target device. 
queue = cl.CommandQueue(context) 

## Step #8. Allocate device memory and move input data from the host to the device memory. 
mem_flags = cl.mem_flags 
#matrix_buf = cl.Buffer(context, mem_flags.READ_ONLY | mem_flags.COPY_HOST_PTR, hostbuf=matrix) 
matrix_buf = cl.Buffer(context, mem_flags.READ_ONLY | mem_flags.COPY_HOST_PTR, hostbuf=matrix) 
destination_buf = cl.Buffer(context, mem_flags.WRITE_ONLY, matrix.nbytes) 

## Step #9. Associate the arguments to the kernel with kernel object. 
## Step #10. Deploy the kernel for device execution. 
program.matrix_dot_vector(queue, matrix.shape, None, np.int32(matrix.size), matrix_buf, destination_buf) 

## Step #11. Move the kernel's output data to host memory. 
matrix_dot_vector = np.ones((4,4), np.float32) 
cl.enqueue_copy(queue, matrix_dot_vector, destination_buf) 

## Step #12. Release context, program, kernels and memory. 
## PyOpenCL performs this step for you, and therefore, 
## you don't need to worry about cleanup code 

print(matrix_dot_vector) 

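As far as I can see, the value of int y = get_global_id(1); is always 0. That is what causes the error, and I don't understand why it is always 0, because I pass the correct shape to the kernel in program.matrix_dot_vector(queue, matrix.shape, None, np.int32(matrix.size), matrix_buf, destination_buf): the second argument is matrix.shape, which equals (4,4).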
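Assuming standard OpenCL NDRange semantics, a global size of (4, 4) launches one work-item for every (get_global_id(0), get_global_id(1)) pair, so y should take every value from 0 to 3. The plain-Python sketch below only enumerates those ID pairs on the host (it does not call PyOpenCL), as a sanity check on the claim that y is always 0.

```python
import itertools

# Assumed launch configuration: global size (4, 4), i.e. matrix.shape in the question.
# In an NDRange, get_global_id(d) ranges over 0 .. global_size[d] - 1,
# and one work-item runs for every combination of those indices.
global_size = (4, 4)

# Emulate the (get_global_id(0), get_global_id(1)) pair seen by each work-item.
work_items = list(itertools.product(range(global_size[0]), range(global_size[1])))

xs = sorted({x for x, _ in work_items})
ys = sorted({y for _, y in work_items})

print(len(work_items))  # 16 work-items in total
print(ys)               # [0, 1, 2, 3]: y is not stuck at 0
```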

Does anyone have an idea what is going wrong?

Thanks!

Answer


The value passed as the first kernel argument is incorrect: size should not be the total matrix size. Change np.int32(matrix.size) to np.int32(matrix.shape[0]).
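To see why this one argument explains the exact output in the question, here is a host-side NumPy emulation of the kernel body result[x + size * y] = matrix[x + size * y]. The helper emulate_copy_kernel is hypothetical, not part of PyOpenCL. Out-of-bounds accesses are undefined behaviour in real OpenCL; this sketch simply skips them, and it assumes unwritten entries read back as 0, which is what the question's output suggests happened on that device.

```python
import numpy as np

def emulate_copy_kernel(matrix, size):
    """Hypothetical host-side emulation of the question's kernel.

    Runs result[x + size * y] = matrix[x + size * y] for every (x, y)
    in the global range matrix.shape. Out-of-bounds accesses are skipped
    here; on a real device they are undefined behaviour.
    """
    flat_in = matrix.ravel()
    # Assumption: entries the kernel never writes read back as 0.
    flat_out = np.zeros_like(flat_in)
    for x in range(matrix.shape[0]):      # get_global_id(0)
        for y in range(matrix.shape[1]):  # get_global_id(1)
            idx = x + size * y
            if idx < flat_in.size:        # drop out-of-bounds accesses
                flat_out[idx] = flat_in[idx]
    return flat_out.reshape(matrix.shape)

matrix = np.ones((4, 4), np.float32)
buggy = emulate_copy_kernel(matrix, matrix.size)      # size = 16
fixed = emulate_copy_kernel(matrix, matrix.shape[0])  # size = 4
print(buggy)  # only the first row is ones
print(fixed)  # identical to the input
```

With size = 16 only the y = 0 work-items produce in-range indices (0 to 3), which is why just the first row comes back.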


Perfect! I knew it had to be something like that, but could you explain the difference between what I was doing and what you told me to do? – kelirkenan


In the kernel you are computing the position of an element in the flattened array, so in x + size * y, size must not be matrix.shape[0] * matrix.shape[1]; it must be matrix.shape[0], the size of the matrix's first dimension. – doqtor
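The indexing argument in this comment can be checked in plain Python. The sketch below uses a deliberately non-square shape (3, 5), an assumption chosen for illustration, and lists the flat indices x + size * y visited over that global range for both choices of size.

```python
# Deliberately non-square example shape (an assumption for illustration).
shape = (3, 5)
n = shape[0] * shape[1]

def flat_indices(size):
    """Flat indices x + size * y visited over the global range `shape`."""
    return [x + size * y for x in range(shape[0]) for y in range(shape[1])]

good = flat_indices(shape[0])  # size = first dimension
bad = flat_indices(n)          # size = total element count

# With size = shape[0], every flat index 0 .. n-1 is visited exactly once.
print(sorted(good) == list(range(n)))  # True
# With size = shape[0] * shape[1], indices run far past the end of the buffer.
print(max(bad), n)  # 62 15
```

Note that because x runs over shape[0], x + shape[0] * y is a column-major index into NumPy's row-major buffer. For a pure copy that does not matter, since the same index is used on both sides; a kernel that needs work-item (x, y) to touch element matrix[x, y] would use y + shape[1] * x instead.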