Parallel CUDA programming

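So I have gone through this several times but cannot seem to solve the problem. What happens is that the variable I am trying to copy from GPU memory back to CPU memory always comes back blank.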

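From my understanding, I should have one or more variables and create copies of them, which I send to the GPU along with some data to compute on. Once the computation is finished, I copy the contents of those variables from the GPU back to the CPU.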

But every time I do this, my variable 'd_result' is always empty. Any ideas on how to fix this would be greatly appreciated.

My CUDA function:

__global__ void gpu_histogram_equalization(unsigned char * img_out, unsigned char * img_in,
                                           int * hist_in, int img_size, int nbr_bin){

    int *lut = (int *)malloc(sizeof(int)*nbr_bin);
    int i, cdf, min, d;
    /* Construct the LUT by calculating the CDF */
    cdf = 0;
    min = 0;
    i = threadIdx.x;
    while(min == 0){
        min = hist_in[i++];
    }
    d = img_size - min;
    if(i < nbr_bin){
        cdf += hist_in[i];
        //lut[i] = (cdf - min)*(nbr_bin - 1)/d;
        lut[i] = (int)(((float)cdf - min)*255/d + 0.5);
        if(lut[i] < 0){
            lut[i] = 0;
        }
    }

    /* Get the result image */
    if(i < img_size){
        if(lut[img_in[i]] > 255){
            img_out[i] = 255;
        }
        else{
            img_out[i] = (unsigned char)lut[img_in[i]];
        }

    }
}
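For comparison, it may help to see the sequential algorithm the kernel appears to be parallelizing. The CDF is a running sum over all bins, so a thread that adds only `hist_in[i]` to its own local `cdf` (and writes into its own `malloc`'d `lut`) cannot produce the full table. Since the table has only `nbr_bin` entries, one option is to build it on the host and leave only the per-pixel lookup to the GPU. The plain C sketch below uses the same formula as the kernel; the helper names `build_lut` and `apply_lut` are hypothetical, `nbr_bin` is assumed to be 256, and mapping a single-valued image (where `d` would be 0) to the identity is a choice made only for this sketch.

```c
#include <stddef.h>

/* Hypothetical host-side helper: builds the equalization LUT with the same
   formula as the kernel, but as a sequential running sum (CDF) over all bins.
   Assumes nbr_bin == 256. */
static void build_lut(const int *hist_in, int img_size, int nbr_bin, int *lut)
{
    int i = 0, min = 0, cdf = 0, d;

    /* min = pixel count of the first non-empty bin (loop is bounded) */
    while (min == 0 && i < nbr_bin) {
        min = hist_in[i++];
    }
    d = img_size - min;

    if (d <= 0) {                      /* single-valued image: identity (sketch's choice) */
        for (i = 0; i < nbr_bin; i++) lut[i] = i < 255 ? i : 255;
        return;
    }

    for (i = 0; i < nbr_bin; i++) {
        cdf += hist_in[i];             /* running sum: depends on every earlier bin */
        lut[i] = (int)(((float)cdf - min) * 255 / d + 0.5f);
        if (lut[i] < 0)   lut[i] = 0;
        if (lut[i] > 255) lut[i] = 255;
    }
}

/* Hypothetical helper: per-pixel lookup. lut must hold at least 256 entries
   because every 8-bit pixel value is used as an index. */
static void apply_lut(unsigned char *img_out, const unsigned char *img_in,
                      const int *lut, size_t img_size)
{
    for (size_t i = 0; i < img_size; i++) {
        img_out[i] = (unsigned char)lut[img_in[i]];
    }
}
```

`apply_lut` is the part that maps naturally onto one GPU thread per pixel; the finished LUT would then be copied to device memory alongside the image.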

And my function that calls it:

PGM_IMG gpu_contrast_enhancement_g(PGM_IMG img_in) 
{ 
    PGM_IMG result; 
    int hist[256]; 
    unsigned char * d_result; 

    result.w = img_in.w; 
    result.h = img_in.h; 
    result.img = (unsigned char *)malloc(result.w * result.h * sizeof(unsigned char)); 

    cudaMalloc(&d_result, result.w * result.h * sizeof(unsigned char)); 

    cudaMemcpy(d_result, result.img, result.w * result.h * sizeof(unsigned char), cudaMemcpyHostToDevice); 
    histogram(hist, img_in.img, img_in.h * img_in.w, 256); 
    gpu_histogram_equalization<<<1,result.w * result.h * sizeof(unsigned char)>>>(d_result,img_in.img,hist,result.w*result.h, 256); 

    cudaMemcpy(result.img, d_result, result.w * result.h * sizeof(unsigned char), cudaMemcpyDeviceToHost); 
    cudaFree(d_result); 

    return result; 
}
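The question does not show `histogram`, so the code as posted cannot be compiled and run by anyone else. For a self-contained reproduction, a hypothetical host-side version could look like the following; its parameter order is only inferred from the call `histogram(hist, img_in.img, img_in.h * img_in.w, 256)`, and it assumes `nbr_bin` is at least 256. It fills an ordinary host array, so `hist` would still need its own `cudaMalloc`/`cudaMemcpy` before any kernel could read it.

```c
#include <string.h>

/* Hypothetical reconstruction of the `histogram` helper; parameter order is
   inferred from the call site. Runs on the host and fills a host array.
   Assumes nbr_bin >= 256 so every 8-bit pixel value has a bin. */
void histogram(int *hist_out, unsigned char *img_in, int img_size, int nbr_bin)
{
    memset(hist_out, 0, sizeof(int) * nbr_bin);   /* clear all bins */
    for (int i = 0; i < img_size; i++) {
        hist_out[img_in[i]]++;                     /* one count per pixel value */
    }
}
```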

2015-11-27 QQCompi

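If you want debugging help, you will have to provide the shortest complete example that someone else can compile and run, because the code you have posted is not enough. Also, every CUDA API call returns a status, and you should check it for runtime errors. What happens if you run your code with cuda-memcheck? Does it report any problems? – talonmies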

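Also, could you please edit your question title to describe your problem? Titles are very important for search, and yours says absolutely nothing about what your problem actually is. – talonmies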

Let's look at this line:

gpu_histogram_equalization<<<1,result.w*result.h*sizeof(unsigned char)>>> 
     (d_result,img_in.img,hist,result.w*result.h, 256);

There are a couple of problems here. You are passing:

img_in.img - this is host memory
hist - this is host memory

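What is happening is that your kernel crashes with an invalid memory access, because it dereferences those host pointers on the device.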

Please read here about error checking.

2015-11-28 03:14:18 deathly809

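In fact, if you look at the kernel launch parameters in the quoted code, you can see that for any non-trivial image size the block dimension will be so large that the kernel will not even launch. – talonmies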
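The limit behind that comment is 1024 threads per block on NVIDIA GPUs of compute capability 2.0 and later, so launching `result.w * result.h` threads in a single block fails for any image with more than 1024 pixels. The usual pattern is a fixed block size, enough blocks to cover every pixel, and an `idx < n` guard in the kernel. The plain C sketch below checks that arithmetic on the host by emulating the index `blockIdx.x * blockDim.x + threadIdx.x`; the helper names `blocks_for` and `emulate_launch` and the block size of 256 are assumptions for illustration.

```c
/* Hypothetical helper: ceiling division giving the number of blocks of
   `tpb` threads needed to cover n elements. */
static unsigned int blocks_for(unsigned int n, unsigned int tpb)
{
    return (n + tpb - 1) / tpb;
}

/* Host-side emulation of a 1-D launch, for illustration only. Each
   (block, thread) pair computes the global index a kernel would get from
   blockIdx.x * blockDim.x + threadIdx.x; the idx < n guard discards the
   surplus threads in the last block. `hits` must have n zeroed entries.
   Returns the number of indices processed. */
static unsigned int emulate_launch(unsigned int n, unsigned int tpb,
                                   unsigned char *hits)
{
    unsigned int grid = blocks_for(n, tpb);
    unsigned int processed = 0;

    for (unsigned int b = 0; b < grid; b++) {        /* plays blockIdx.x  */
        for (unsigned int t = 0; t < tpb; t++) {     /* plays threadIdx.x */
            unsigned int idx = b * tpb + t;
            if (idx < n) {                           /* bounds guard */
                hits[idx]++;
                processed++;
            }
        }
    }
    return processed;
}
```

In CUDA host code this corresponds to a launch of the form `kernel<<<blocks_for(n, 256), 256>>>(...)`, with every pointer argument referring to device memory.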
