2010-03-19 22 views
3

cudaMemcpy only works in emulation mode

I have just started learning how to use CUDA, and I am trying to run some simple example code:


float *ah, *bh, *ad, *bd; 
ah = (float *)malloc(sizeof(float)*4); 
bh = (float *)malloc(sizeof(float)*4); 
cudaMalloc((void **) &ad, sizeof(float)*4); 
cudaMalloc((void **) &bd, sizeof(float)*4); 
... initialize ah ... 

/* copy array on device */ 
cudaMemcpy(ad,ah,sizeof(float)*N,cudaMemcpyHostToDevice); 
cudaMemcpy(bd,ad,sizeof(float)*N,cudaMemcpyDeviceToDevice); 
cudaMemcpy(bh,bd,sizeof(float)*N,cudaMemcpyDeviceToHost); 

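When I compile in emulation mode (nvcc -deviceemu), it runs fine and actually copies the array. But when I run it in regular mode, it runs without errors and never copies the data. It is as if the cudaMemcpy lines were ignored.

What am I doing wrong?

Thank you very much, Jason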

+0

Oops. This appears to be a problem with cudaMalloc(). It is not allocating memory on the device. Why is that? – Jason 2010-03-19 19:01:17

+0

Did you initialize the device? Use cudaGetLastError to print the status. – Anycorn 2010-03-19 20:34:04

+1

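@aaa: Using the runtime API (functions prefixed with cuda rather than cu) means you do not need to initialize the device explicitly; the runtime attaches to the first compatible device on the first cuda call. – Tom 2010-03-20 17:24:05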
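To make the point about implicit initialization concrete, here is a minimal sketch. It assumes a file compiled with nvcc; the helper name visible_device_count is made up for illustration and is not part of CUDA. Calling cudaFree(0) is a common idiom for forcing the runtime to create its context right away, so a broken driver or missing device is reported at that point rather than at the first real allocation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): number of devices the runtime
// can see, or 0 if the query itself fails.
static int visible_device_count() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 0;
    }
    return count;
}

int main() {
    if (visible_device_count() == 0) {
        std::fprintf(stderr, "No usable CUDA device found.\n");
        return 1;
    }

    // No explicit init call exists in the runtime API. cudaFree(0) frees
    // nothing, but it forces context creation on the default device.
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "Context creation failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("Runtime initialized on the default device.\n");
    return 0;
}
```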

Answers

3

You should check for errors, ideally after every malloc and memcpy, although checking just once at the end would be enough here: cudaGetErrorString(cudaGetLastError()).
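As a rough sketch of what that checking could look like for the code in the question: the helper check below is hypothetical (it is not a CUDA function), and N is assumed to be 4 so that it matches the allocations. Each call is tested as it is made, and cudaGetLastError is queried once more at the end.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a message for a failed runtime call and
// returns false; returns true on cudaSuccess.
static bool check(cudaError_t err, const char *what) {
    if (err == cudaSuccess) return true;
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return false;
}

int main() {
    const int N = 4;  // assumption: same element count as the allocations
    const size_t bytes = N * sizeof(float);
    float ah[N] = {1.0f, 2.0f, 3.0f, 4.0f};
    float bh[N] = {0.0f, 0.0f, 0.0f, 0.0f};
    float *ad = nullptr, *bd = nullptr;

    // && short-circuits, so later calls are skipped after the first failure.
    bool ok =
        check(cudaMalloc((void **)&ad, bytes), "cudaMalloc(ad)") &&
        check(cudaMalloc((void **)&bd, bytes), "cudaMalloc(bd)") &&
        check(cudaMemcpy(ad, ah, bytes, cudaMemcpyHostToDevice),
              "copy host to device") &&
        check(cudaMemcpy(bd, ad, bytes, cudaMemcpyDeviceToDevice),
              "copy device to device") &&
        check(cudaMemcpy(bh, bd, bytes, cudaMemcpyDeviceToHost),
              "copy device to host");

    // The single end-of-program check suggested above.
    ok = check(cudaGetLastError(), "cudaGetLastError") && ok;

    cudaFree(ad);  // cudaFree accepts a null pointer
    cudaFree(bd);

    if (ok) {
        for (int i = 0; i < N; i++) std::printf("bh[%d] = %f\n", i, bh[i]);
    }
    return ok ? 0 : 1;
}
```

If cudaMalloc is what fails, as the comments under the question suggest, the first message printed should name it along with the runtime's error string.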

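Just to check the obvious:

  • You do have a CUDA-capable GPU, right? Run the deviceQuery SDK sample to check that the device works and that all drivers are installed and working.
  • N (in the memcpy) is equal to 4 (in the malloc), right?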
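One way to rule out the second point is to derive both the allocation size and the copy size from the same place, and then compare the result with the input. This is only a sketch under that assumption; byte_size is a made-up helper, and the use of std::vector is my choice rather than anything from the question.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: byte count of a host buffer, used for both the
// allocations and the copies so the two cannot disagree.
static size_t byte_size(const std::vector<float> &v) {
    return v.size() * sizeof(float);
}

int main() {
    std::vector<float> ah = {1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<float> bh(ah.size(), 0.0f);
    const size_t bytes = byte_size(ah);

    float *ad = nullptr, *bd = nullptr;
    cudaError_t err = cudaMalloc((void **)&ad, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&bd, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(ad, ah.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(bd, ad, bytes, cudaMemcpyDeviceToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(bh.data(), bd, bytes, cudaMemcpyDeviceToHost);

    cudaFree(ad);  // safe even if the pointer is still null
    cudaFree(bd);

    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Element-wise comparison of what came back with what was sent.
    std::printf(bh == ah ? "round trip OK\n" : "round trip MISMATCH\n");
    return bh == ah ? 0 : 1;
}
```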
1

Check whether you have a CUDA-capable device. You could try running the code below and see what information you get:

#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);

        printf(" --- General Information for device %d ---\n", i);
        printf("Name: %s\n", prop.name);
        printf("Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("Clock rate (kHz): %d\n", prop.clockRate);
        printf("Device copy overlap: %s\n",
               prop.deviceOverlap ? "Enabled" : "Disabled");
        printf("Kernel execution timeout: %s\n",
               prop.kernelExecTimeoutEnabled ? "Enabled" : "Disabled");

        /* The memory fields are size_t, so print them with %zu. */
        printf(" --- Memory Information for device %d ---\n", i);
        printf("Total global mem: %zu\n", prop.totalGlobalMem);
        printf("Total constant mem: %zu\n", prop.totalConstMem);
        printf("Max mem pitch: %zu\n", prop.memPitch);
        printf("Texture alignment: %zu\n", prop.textureAlignment);

        printf(" --- MP Information for device %d ---\n", i);
        printf("Multiprocessor count: %d\n", prop.multiProcessorCount);
        printf("Shared mem per block: %zu\n", prop.sharedMemPerBlock);
        printf("Registers per block: %d\n", prop.regsPerBlock);
        printf("Threads in warp: %d\n", prop.warpSize);
        printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
        printf("Max thread dimensions: (%d, %d, %d)\n",
               prop.maxThreadsDim[0], prop.maxThreadsDim[1],
               prop.maxThreadsDim[2]);
        printf("Max grid dimensions: (%d, %d, %d)\n",
               prop.maxGridSize[0], prop.maxGridSize[1],
               prop.maxGridSize[2]);
        printf("\n");
    }
    return 0;
}