2012-08-09 73 views

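CUDA: retrieving a 3D array

I'm having trouble working out how to retrieve a 3D array from the GPU. I want to allocate memory for a 3D array in the host code, call a kernel that fills the array, and then copy the 3D array back into the return variable of mexFunction (the main host code).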

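I've made several attempts; this is my latest code. The results are all '0' when they should be '7'. Can anyone tell me where I'm going wrong? It may have something to do with the 3D parameters; I don't think I fully understand that part.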

simulate3DArrays.cpp

/* Device code */ 
__global__ void simulate3DArrays(cudaPitchedPtr devPitchedPtr, 
          int width, 
          int height, 
          int depth) 
{ 
int threadId; 
threadId = (blockIdx.x * blockDim.x) + threadIdx.x; 

size_t pitch = devPitchedPtr.pitch; 

for (int widthIndex = 0; widthIndex < width; widthIndex++) { 
    for (int heightIndex = 0; heightIndex < height; heightIndex++) { 

     *((double*)(((char*)devPitchedPtr.ptr + threadId * pitch * height) + heightIndex * pitch) + widthIndex) = 7.0; 

    } 
}  
} 

mexFunction.cu

/* Host code */ 
#include <stdio.h> 
#include "mex.h" 

/* Kernel function */ 
#include "simulate3DArrays.cpp" 

/* Define some constants. */ 
#define width 5 
#define height 9 
#define depth 6 

void displayMemoryAvailability(mxArray **MatlabMemory); 

void mexFunction(int  nlhs, 
      mxArray *plhs[], 
      int  nrhs, 
      mxArray *prhs[]) 
{ 

double *output; 
mwSize ndim3 = 3; 
mwSize dims3[] = {height, width, depth}; 

plhs[0] = mxCreateNumericArray(ndim3, dims3, mxDOUBLE_CLASS, mxREAL); 
output = mxGetPr(plhs[0]); 

cudaExtent extent = make_cudaExtent(width * sizeof(double), height, depth); 
cudaPitchedPtr devicePointer; 
cudaMalloc3D(&devicePointer, extent); 


simulate3DArrays<<<1,depth>>>(devicePointer, width, height, depth); 

cudaMemcpy3DParms deviceOuput = { 0 }; 
deviceOuput.srcPtr.ptr = devicePointer.ptr; 
deviceOuput.srcPtr.pitch = devicePointer.pitch; 
deviceOuput.srcPtr.xsize = width; 
deviceOuput.srcPtr.ysize = height; 

deviceOuput.dstPtr.ptr = output; 
deviceOuput.dstPtr.pitch = devicePointer.pitch; 
deviceOuput.dstPtr.xsize = width; 
deviceOuput.dstPtr.ysize = height; 

deviceOuput.kind = cudaMemcpyDeviceToHost; 
/* copy 3d array back to 'ouput' */ 
cudaMemcpy3D(&deviceOuput); 


return; 
} /* End Mexfunction */ 

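Every API call you are using returns an error code. You should check all of them to see whether an error occurred. That will help you pinpoint the exact problem much more precisely. – talonmies 2012-08-09 05:38:05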
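The advice in the comment above can be sketched roughly as follows. The names `checkCuda`, `CHECK_CUDA`, `noopKernel`, and `checkedExample` are hypothetical helpers invented for this illustration, not part of the original post or of the CUDA API; the runtime calls themselves (`cudaMalloc3D`, `cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`, `cudaFree`) are standard.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (an assumption of this sketch): report a failed CUDA
// runtime call with its source location and hand the status back.
inline cudaError_t checkCuda(cudaError_t status, const char* what,
                             const char* file, int line)
{
    if (status != cudaSuccess) {
        std::fprintf(stderr, "%s:%d: %s failed: %s\n",
                     file, line, what, cudaGetErrorString(status));
    }
    return status;
}
#define CHECK_CUDA(call) checkCuda((call), #call, __FILE__, __LINE__)

// Trivial kernel, present only so that a launch can be checked.
__global__ void noopKernel() {}

// Allocates, launches, and frees, checking every step.
// Returns the first error encountered, or cudaSuccess.
cudaError_t checkedExample()
{
    cudaPitchedPtr devPtr;
    cudaExtent extent = make_cudaExtent(5 * sizeof(double), 9, 6);

    cudaError_t err = CHECK_CUDA(cudaMalloc3D(&devPtr, extent));
    if (err != cudaSuccess) return err;

    noopKernel<<<1, 6>>>();
    err = CHECK_CUDA(cudaGetLastError());           // launch errors
    if (err == cudaSuccess)
        err = CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran

    cudaError_t freeErr = CHECK_CUDA(cudaFree(devPtr.ptr));
    return (err != cudaSuccess) ? err : freeErr;
}
```

Inside a MEX file, the failure branch would more likely call `mexErrMsgTxt` than print to `stderr`, so that the message reaches the MATLAB prompt.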

Answer


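The basic problem seems to be that you are instructing cudaMemcpy3D to copy zero bytes, because you never set a non-zero extent, which is what tells the API the size of the transfer.

Your transfer could probably be as simple as this: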

cudaMemcpy3DParms deviceOuput = { 0 }; 
deviceOuput.srcPtr = devicePointer; 
deviceOuput.dstPtr.ptr = output; 
deviceOuput.extent = extent; 

cudaMemcpy3D(&deviceOuput); 

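I can't comment on whether your use of the MEX interface is correct, but the kernel looks superficially correct and I don't see anything obviously wrong. I can't say more than that without compiling your code and running it under Matlab, which I am not able to do.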
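To make the role of the extent concrete, here is a minimal round-trip sketch that fills a pitched 3D allocation and copies it into a tightly packed host buffer. It is an illustration under stated assumptions, not a tested replacement for the MEX code: `fillSlices` and `fillAndRetrieve` are hypothetical names, the bounds guard in the kernel is an addition, and the host buffer is assumed to hold `width * height * depth` contiguous doubles. Besides the extent, the sketch sets the destination pitch and the copy direction explicitly, which the shortened snippet above leaves at zero.

```cuda
#include <cuda_runtime.h>

// Same addressing as the kernel in the question: one thread per depth slice.
__global__ void fillSlices(cudaPitchedPtr devPitchedPtr,
                           int width, int height, int depth)
{
    int slice = blockIdx.x * blockDim.x + threadIdx.x;
    if (slice >= depth) return;  // guard added in this sketch

    char*  base       = (char*)devPitchedPtr.ptr;
    size_t pitch      = devPitchedPtr.pitch;  // bytes per padded row
    size_t slicePitch = pitch * height;       // bytes per 2D slice

    for (int y = 0; y < height; y++) {
        double* row = (double*)(base + slice * slicePitch + y * pitch);
        for (int x = 0; x < width; x++) {
            row[x] = 7.0;
        }
    }
}

// Hypothetical helper: fills a width x height x depth volume on the device
// and copies it into `output` (width * height * depth doubles, packed).
// `depth` is used as the block size, so it must not exceed the device's
// maximum number of threads per block.
cudaError_t fillAndRetrieve(double* output, int width, int height, int depth)
{
    // The extent width is given in bytes because the memory is pitched.
    cudaExtent extent = make_cudaExtent(width * sizeof(double), height, depth);
    cudaPitchedPtr devicePointer;
    cudaError_t err = cudaMalloc3D(&devicePointer, extent);
    if (err != cudaSuccess) return err;

    fillSlices<<<1, depth>>>(devicePointer, width, height, depth);
    err = cudaGetLastError();

    if (err == cudaSuccess) {
        cudaMemcpy3DParms params = { 0 };
        params.srcPtr = devicePointer;
        // The host buffer has no padding, so its pitch is the raw row size.
        params.dstPtr = make_cudaPitchedPtr(output, width * sizeof(double),
                                            width, height);
        params.extent = extent;  // without this, nothing is copied
        params.kind   = cudaMemcpyDeviceToHost;
        err = cudaMemcpy3D(&params);  // issued after the kernel on the default stream
    }

    cudaFree(devicePointer.ptr);
    return err;
}
```

From `mexFunction`, this could be called as `fillAndRetrieve(output, width, height, depth)` once `output` has been obtained from `mxGetPr(plhs[0])`, with the returned status checked before returning to MATLAB.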