CUDA kernel does not modify the input array

My CUDA kernel does not seem to change the values in the array I pass to it. Here is the relevant host code:

dim3 grid(numNets, N); 
dim3 threads(1, 1, 1); 

// allocate the arrays and jagged arrays on the device 
alloc_dev_memory(state0, state1, d_state0, d_state1, 
        adjlist, d_adjlist, transfer, d_transfer, 
        indeg, d_indeg, d_N,  d_K,  d_S,   
        d_Spow, d_numNets); 

// operate on the device memory 
kernel<<< grid, threads >>>(d_state0, d_state1, d_adjlist, d_transfer, d_indeg, 
          d_N,  d_K,  d_S,  d_Spow,  d_numNets); 

// copy the new states from the device to the host 
cutilSafeCall(cudaMemcpy(state0, d_state0, ens_size*sizeof(int), 
          cudaMemcpyDeviceToHost)); 

// copy the new states from the array to the ensemble 
for(int i=0; i < numNets; ++i) 
    nets[i]->set_state(state0 + N*i);

Here is the kernel being called:

// this dummy kernel just sets all the values to 0 for checking later. 
__global__ void kernel(int * state0,
                       int * state1,
                       int ** adjlist,
                       luint ** transfer,
                       int * indeg,
                       int * d_N,
                       float * d_K,
                       int * d_S,
                       luint * d_Spow,
                       int * d_numNets)
{
    int N = *d_N;
    luint * Spow = d_Spow;
    int tid = blockIdx.x*N + blockIdx.y;

    state0[tid] = 0;
    state1[tid] = 0;

    for(int k=0; k < indeg[tid]; ++k) {
        adjlist[tid][k] = 0;
    }
    for(int k=0; k < Spow[indeg[tid]]; ++k) {
        transfer[tid][k] = 0;
    }
}

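After using cudaMemcpy to copy the state0 array back to the host, I iterate over state0 and print every value to standard output. The values are identical to the initial ones, even though the kernel is written to set them all to zero.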

The expected output is the initial values of state0, 101111101011, followed by the final values of state0 (all zeros).

Output from a sample run of this code:

101111101011 
101111101011 

Press ENTER to exit...

The second line should be all zeros. Why does the CUDA kernel not affect the state0 array?

Source

2011-09-05 tlehman

Try making your code smaller and focus on the one thing you are checking. Only add the other parts back once that works.

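The obvious explanation would be that the kernel never runs, but from what you have posted it is impossible to say: most of the likely points of failure are not shown, all the constants in the code have unknown values, and your error checking is incomplete. – talonmies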

Why would the kernel not run? I added a line after the kernel call that uses 'cudaGetLastError()', but the return value is 'cudaSuccess'. – tlehman
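A note on why 'cudaSuccess' at that point is not conclusive: kernel launches are asynchronous, so 'cudaGetLastError()' called right after the launch reports only launch-time problems such as an invalid configuration. Errors raised while the kernel executes surface at a later synchronizing call, and an out-of-bounds write may not raise any error at all. The following is a minimal sketch of checking both stages. The names cuda_ok, zero_kernel and launch_checked are hypothetical helpers made up for illustration, and the kernel is a stand-in, not the kernel from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a message and returns false on any CUDA error.
static bool cuda_ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Stand-in kernel (assumption, not the original): zeroes n ints.
__global__ void zero_kernel(int *data, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        data[tid] = 0;
}

// Launches the kernel and checks the launch and the execution separately.
bool launch_checked(int *d_data, int n)
{
    zero_kernel<<<(n + 255) / 256, 256>>>(d_data, n);

    // Launch-time errors are available immediately.
    if (!cuda_ok(cudaGetLastError(), "kernel launch"))
        return false;

    // Execution-time errors are reported only after the kernel has finished.
    return cuda_ok(cudaDeviceSynchronize(), "kernel execution");
}

int main()
{
    const int n = 12;
    int host[n];
    for (int i = 0; i < n; ++i)
        host[i] = 1;

    int *d_data = nullptr;
    if (!cuda_ok(cudaMalloc(&d_data, n * sizeof(int)), "cudaMalloc"))
        return 1;
    if (!cuda_ok(cudaMemcpy(d_data, host, n * sizeof(int),
                            cudaMemcpyHostToDevice), "copy to device"))
        return 1;
    if (!launch_checked(d_data, n))
        return 1;
    if (!cuda_ok(cudaMemcpy(host, d_data, n * sizeof(int),
                            cudaMemcpyDeviceToHost), "copy to host"))
        return 1;

    for (int i = 0; i < n; ++i)
        std::printf("%d", host[i]);
    std::printf("\n");

    cudaFree(d_data);
    return 0;
}
```

Even with both checks in place, a write that lands outside the intended array but inside some other valid allocation can go unreported, so checking the index arithmetic itself remains necessary.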

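I found that the values of N and numNets were garbage. The offset computed from N was therefore wrong, so the values were being written outside the array. @pQB, your suggestion was exactly what I needed.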
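One way to guard against this class of bug, sketched here under stated assumptions, is to pass N and numNets to the kernel by value and to bounds-check the index before writing. The names zero_states and zero_on_device are hypothetical, and the 3 x 4 layout is only a guess that matches the 12-digit sample output; the real values of numNets and N are not given in the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: N and numNets arrive by value, so no device-side copy of them
// can be left uninitialized. One block per (network, node) pair, as in
// the question's launch configuration.
__global__ void zero_states(int *state0, int N, int numNets)
{
    int net  = blockIdx.x;
    int node = blockIdx.y;
    if (net >= numNets || node >= N)   // guard against a bad grid or bad sizes
        return;
    state0[net * N + node] = 0;
}

// Hypothetical host wrapper: zeroes a numNets x N host array via the device.
bool zero_on_device(int *host, int numNets, int N)
{
    const size_t bytes = static_cast<size_t>(numNets) * N * sizeof(int);
    int *d_state0 = nullptr;
    if (cudaMalloc(&d_state0, bytes) != cudaSuccess)
        return false;

    bool ok = cudaMemcpy(d_state0, host, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 grid(numNets, N);
        dim3 threads(1, 1, 1);
        zero_states<<<grid, threads>>>(d_state0, N, numNets);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess;
    }
    if (ok)
        ok = cudaMemcpy(host, d_state0, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_state0);
    return ok;
}

int main()
{
    // Assumed sizes: 3 networks of 4 nodes (12 values, as in the sample run).
    const int numNets = 3, N = 4;
    int state0[numNets * N] = {1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1};

    if (!zero_on_device(state0, numNets, N)) {
        std::fprintf(stderr, "device run failed\n");
        return 1;
    }
    for (int i = 0; i < numNets * N; ++i)
        std::printf("%d", state0[i]);
    std::printf("\n");
    return 0;
}
```

Passing small scalars by value also removes a device allocation and a host-to-device copy for each one, which leaves fewer places for an uninitialized value to slip in.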

Source

2011-09-21 17:51:33 tlehman
