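Managing 2D CUDA arrays
I am trying to pass a 2D array to a kernel so that each thread can access the element at index = threadIdx.x + (blockIdx.x * blockDim.x), but I cannot work out how to do this or how to copy the data back.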
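#include <cassert>
size_t pitch;
// One row per block. Width is given in bytes, height in rows.
cudaMallocPitch((void **)&d_array, &pitch, block_size * sizeof(int), num_blocks);
cudaMemset2D(d_array, pitch, 0, block_size * sizeof(int), num_blocks);
// The kernel uses blockIdx.x as the row index, so grid_size must equal num_blocks.
kernel<<<grid_size, block_size>>>(d_array, pitch);
// Assuming h_array is a contiguous int[num_blocks][block_size], its pitch is the
// row width in bytes, not the device pitch. The copy width is also in bytes.
cudaMemcpy2D(h_array, block_size * sizeof(int), d_array, pitch, block_size * sizeof(int), num_blocks, cudaMemcpyDeviceToHost);
for (int block = 0; block < num_blocks; ++block)
    for (int thread = 0; thread < block_size; ++thread)
        assert(h_array[block][thread] == 1);  // every element should be 1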
__global__ void kernel(int *array, size_t pitch) {
    // pitch is in bytes, so offset through char* before treating the row as int*.
    int *row = (int*)((char*)array + blockIdx.x * pitch);
    row[threadIdx.x] = 1;
}
What am I doing wrong here?
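Why are you casting the array to (char *)? That will result in incorrect pointer arithmetic. – LarryPel
That is how it is described in these two questions: http://stackoverflow.com/questions/1047369/allocate-2d-array-on-device-memory-in-cuda http://stackoverflow.com/questions/5029920/how-to-use-2d-arrays-in-cuda – user1743798
@LarryPel: No, it won't. The pitch is in bytes, so the pointer arithmetic has to be done on a pointer to a byte-sized type to come out right. – talonmies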
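To make the point about byte pitch concrete, here is a minimal sketch of the whole round trip. It is not the asker's exact program: the names fill_ones and run_pitched_fill are hypothetical, and the host buffer is assumed to be tightly packed, so its pitch is simply the row width in bytes. The kernel offsets through char* because the pitch returned by cudaMallocPitch is a byte count that may be larger than block_size * sizeof(int).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each block owns one row; each thread writes one element of that row.
__global__ void fill_ones(int *array, size_t pitch) {
    // pitch is in bytes, so step through memory as char* first,
    // then reinterpret the start of the row as int*.
    int *row = (int *)((char *)array + blockIdx.x * pitch);
    row[threadIdx.x] = 1;
}

// Hypothetical helper: returns true if every element comes back as 1.
bool run_pitched_fill(int num_blocks, int block_size) {
    int *d_array = nullptr;
    size_t pitch = 0;
    const size_t row_bytes = block_size * sizeof(int);  // row width in bytes

    // Width in bytes, height in rows.
    if (cudaMallocPitch((void **)&d_array, &pitch, row_bytes, num_blocks) != cudaSuccess)
        return false;
    cudaMemset2D(d_array, pitch, 0, row_bytes, num_blocks);

    // One block per row, one thread per column.
    fill_ones<<<num_blocks, block_size>>>(d_array, pitch);
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(d_array);
        return false;
    }

    // Assumption: the host buffer is tightly packed, so its pitch is row_bytes.
    std::vector<int> h_array((size_t)num_blocks * block_size, 0);
    cudaError_t err = cudaMemcpy2D(h_array.data(), row_bytes,
                                   d_array, pitch,
                                   row_bytes, num_blocks,
                                   cudaMemcpyDeviceToHost);
    cudaFree(d_array);
    if (err != cudaSuccess)
        return false;

    for (int v : h_array)
        if (v != 1)
            return false;
    return true;
}

int main() {
    bool ok = run_pitched_fill(4, 8);
    std::printf("%s\n", ok ? "all ones" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

If the cast to char* were dropped, array + blockIdx.x * pitch would advance by pitch ints rather than pitch bytes, which lands sizeof(int) times too far into the allocation for every row after the first.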