printf with -arch=sm_20 does not display anything from the kernel file

In my CUDA program, printf calls in the kernel file do not display anything, even though I compile with -arch=sm_20.

__device__ __global__ void Kernel(float *, float * ,int); 
void DeviceFunc(float *temp_h , int numvar , float *temp1_h) 
{ ..... 
    //Kernel call 
    printf("calling kernel\n"); 
    Kernel<<<dimGrid , dimBlock>>>(a_d , b_d , numvar); 
    printf("kernel called\n"); 
    .... 
} 

int main(int argc , char **argv) 
{ .... 
    printf("beforeDeviceFunc\n\n"); 
    DeviceFunc(a_h , numvar , b_h); //Showing the data 
    printf("after DeviceFunc\n\n"); 
    .... 
}

In addition, I added some printf statements to Kernel.cu, where I wrote:

#include<cuda.h> 
#include <stdio.h> 
__device__ __global__ void Kernel(float *a_d , float *b_d ,int size) 
{ 
    int idx = threadIdx.x ; 
    int idy = threadIdx.y ; 
    //Allocating memory in the shared memory of the device 
    __shared__ float temp[16][16]; 

    //Copying the data to the shared memory 
    temp[idy][idx] = a_d[(idy * (size+1)) + idx] ; 
    printf("idx=%d, idy=%d, size=%d", idx, idy, size); 
    .... 
}

Then I compile with -arch=sm_20 like this:

nvcc -c -arch sm_20 main.cu 
nvcc -c -arch sm_20 Kernel.cu 
nvcc -arch sm_20 main.o Kernel.o -o main

Now, when I run the program, I see:

beforeDeviceFunc 

calling kernel 
kernel called 
after DeviceFunc

So the printf inside the kernel is never printed. How can I fix this?


2012-11-10 mahmood

In my case `cudaDeviceSynchronize()` did not return any error. But I noticed that the block size was too large (32 x 32); creating smaller thread blocks solved the problem. – atoMerz

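printf() output is only displayed if the kernel finishes successfully, so check the return codes of all CUDA function calls and make sure no errors are reported.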
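One way to check every return code is to wrap each runtime call in a small macro. The following is only a sketch under stated assumptions: `CUDA_CHECK` is a hypothetical helper name (it is not part of CUDA), and `DummyKernel` is a placeholder standing in for the real kernel.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): print a message and exit
// if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s failed: %s\n",             \
                    __FILE__, __LINE__, #call,                    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Placeholder kernel, standing in for the real one.
__global__ void DummyKernel(float *data)
{
    data[threadIdx.x] = 0.0f;
}

int main()
{
    float *data_d = NULL;
    CUDA_CHECK(cudaMalloc((void **)&data_d, 16 * sizeof(float)));

    DummyKernel<<<1, 16>>>(data_d);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(data_d));
    return 0;
}
```

A kernel launch returns no status of its own, which is why the sketch checks `cudaGetLastError()` right after the launch and then the result of `cudaDeviceSynchronize()`.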

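Furthermore, printf() output is only displayed at certain points in the program. Appendix B.17.2 of the Programming Guide lists these as

- Kernel launch via <<<>>> or cuLaunchKernel() (at the start of the launch and, if the CUDA_LAUNCH_BLOCKING environment variable is set to 1, at the end of the launch as well),
- Synchronization via cudaDeviceSynchronize(), cuCtxSynchronize(), cudaStreamSynchronize(), cuStreamSynchronize(), cudaEventSynchronize(), or cuEventSynchronize(),
- Memory copies via any blocking version of cudaMemcpy*() or cuMemcpy*(),
- Module loading/unloading via cuModuleLoad() or cuModuleUnload(),
- Context destruction via cudaDeviceReset() or cuCtxDestroy(),
- Prior to executing a stream callback added by cudaStreamAddCallback() or cuStreamAddCallback().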
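To illustrate the flush points above, here is a minimal sketch in which an explicit synchronization makes the device-side output appear. `HelloKernel` and the launch configuration are made up for the example, and it assumes a device of compute capability 2.0 or higher, which device-side printf requires.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: each thread prints its own index.
__global__ void HelloKernel()
{
    printf("hello from thread %d\n", threadIdx.x);
}

int main()
{
    HelloKernel<<<1, 4>>>();

    // Device-side printf output stays in a buffer until a flush point
    // is reached; an explicit synchronization is one such point.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If the synchronization call is removed and nothing else in the program acts as a flush point, the output may never be shown, which matches the symptom in the question.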

To easily check whether this is your problem, put the following code after your kernel call:

{ 
    cudaError_t cudaerr = cudaDeviceSynchronize(); 
    if (cudaerr != cudaSuccess) 
     printf("kernel launch failed with error \"%s\".\n", 
       cudaGetErrorString(cudaerr)); 
}

You should then see either the output of your kernel or an error message.


2012-11-10 10:33:12 tera

I get 'kernel launch failed with error "CUDA driver version is insufficient for CUDA runtime version"'. – mahmood

Then check your installation and make sure you have a recent (or recent enough) driver installed. – tera

I don't have root access to update it. Do you know how I can check for errors in CUDA 4? – mahmood
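The driver/runtime mismatch reported in the comments above can be inspected from an ordinary user program, so root access should not be needed just to look. The sketch below uses `cudaDriverGetVersion()` and `cudaRuntimeGetVersion()` from the runtime API; the output wording is made up for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int driverVersion = 0;
    int runtimeVersion = 0;

    cudaError_t err = cudaDriverGetVersion(&driverVersion);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDriverGetVersion failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    err = cudaRuntimeGetVersion(&runtimeVersion);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaRuntimeGetVersion failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Versions are encoded as 1000 * major + 10 * minor (e.g. 4020 for 4.2).
    printf("driver supports up to CUDA %d.%d, runtime is CUDA %d.%d\n",
           driverVersion / 1000, (driverVersion % 1000) / 10,
           runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    if (driverVersion < runtimeVersion)
        printf("the driver is older than the runtime this program was built with\n");

    return 0;
}
```

If the driver turns out to be older than the runtime, the options are a newer driver or a toolkit version that the installed driver supports.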
