CUDA: allocation of an array of structs inside a struct

I have these structs:

typedef struct neuron 
{ 
float* weights; 
int n_weights; 
}Neuron; 


typedef struct neurallayer 
{ 
Neuron *neurons; 
int n_neurons; 
int act_function; 
}NLayer;

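An "NLayer" struct can contain an arbitrary number of "Neuron" structs.

I tried to allocate an "NLayer" struct holding 5 "Neuron" structs from the host in this way: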

NLayer* nL; 
int i; 
int tmp=9; 
cudaMalloc((void**)&nL,sizeof(NLayer)); 
cudaMalloc((void**)&nL->neurons,6*sizeof(Neuron)); 
for(i=0;i<5;i++) 
    cudaMemcpy(&nL->neurons[i].n_weights,&tmp,sizeof(int),cudaMemcpyHostToDevice);

...then I tried to modify the "nL->neurons[0].n_weights" variable with this kernel:

__global__ void test(NLayer* n) 
      { 
       n->neurons[0].n_weights=121; 
      }

but at compile time nvcc returns this "warning", which refers to the only line of the kernel:

Warning: Cannot tell what pointer points to, assuming global memory space

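and once the kernel has finished its work the struct becomes unreachable.

It is very likely that I am doing something wrong during the allocation... can someone help me? Thank you very much, and sorry for my English! :)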

UPDATE：

Thanks to aland I modified my code and wrote this function, which should allocate an instance of the struct "NLayer":

NLayer* setNLayer(int numNeurons,int weightsPerNeuron,int act_fun) 
{ 
    int i; 
    NLayer h_layer; 
    NLayer* d_layer; 
    float* d_weights; 

    //SET THE LAYER VARIABLE OF THE HOST NLAYER 
    h_layer.act_function=act_fun; 
    h_layer.n_neurons=numNeurons; 
    //ALLOCATING THE DEVICE NLAYER 
    if(cudaMalloc((void**)&d_layer,sizeof(NLayer))!=cudaSuccess) 
     puts("ERROR: Unable to allocate the Layer"); 
    //ALLOCATING THE NEURONS ON THE DEVICE 
    if(cudaMalloc((void**)&h_layer.neurons,numNeurons*sizeof(Neuron))!=cudaSuccess) 
     puts("ERROR: Unable to allocate the Neurons of the Layer"); 
    //COPING THE HOST NLAYER ON THE DEVICE 
    if(cudaMemcpy(d_layer,&h_layer,sizeof(NLayer),cudaMemcpyHostToDevice)!=cudaSuccess) 
       puts("ERROR: Unable to copy the data layer onto the device"); 

    for(i=0;i<numNeurons;i++) 
    { 
     //ALLOCATING THE WEIGHTS' ARRAY ON THE DEVICE 
     cudaMalloc((void**)&d_weights,weightsPerNeuron*sizeof(float)); 
     //COPING ITS POINTER AS PART OF THE i-TH NEURONS STRUCT 
     if(cudaMemcpy(&d_layer->neurons[i].weights,&d_weights,sizeof(float*),cudaMemcpyHostToDevice)!=cudaSuccess) 
       puts("Error: unable to copy weights' pointer to the device"); 
    } 


    //RETURN THE DEVICE POINTER 
    return d_layer; 
}

and I call that function from main like this (the kernel "test" is declared earlier):

int main() 
{ 
    NLayer* nL; 
    int h_tmp1; 
    float h_tmp2; 

    nL=setNLayer(10,12,13); 
    test<<<1,1>>>(nL); 
    if(cudaMemcpy(&h_tmp1,&nL->neurons[0].n_weights,sizeof(float),cudaMemcpyDeviceToHost)!=cudaSuccess); 
     puts("ERROR!!"); 
    printf("RESULT:%d",h_tmp1); 

}

When I compile this code the compiler shows me the same warning, and when I run the program it prints:

Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
ERROR!! 
RESULT:1

If I comment out the kernel call, the last error does not appear.

Where am I going wrong? I don't know what to do. Thanks for your help!

2012-08-08 Andrea Sylar Solla

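It all depends on which GPU card you are using. Fermi cards use uniform addressing of the shared and global memory spaces, while pre-Fermi cards do not.

In the pre-Fermi case, you don't know whether an address is shared or global. The compiler can usually work this out, but there are cases where it cannot. When a pointer to shared memory is needed, you usually take the address of a shared variable, and the compiler can recognise that. The "assuming global" message appears when the memory space is not explicitly defined.

If you are using a GPU with compute capability 2.x or higher, it should work with the -arch=sm_20 compiler flag.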

2012-08-09 01:13:28

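While you are right about the warning, I doubt it is what causes the program's abnormal behaviour. After all, the compiler's assumption that the structure is located in the global memory space is correct... – aland 2012-08-09 01:52:52

I'm using an NVIDIA GeForce 320M 256 MB with compute capability 1.2, so I don't think it is a "Fermi" card – 2012-08-09 09:29:32

The problem is here: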

cudaMalloc((void**)&nL,sizeof(NLayer)); 
cudaMalloc((void**)&nL->neurons,6*sizeof(Neuron));

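In the first line, nL is made to point to a structure in global memory on the device. Therefore, in the second line the first argument to cudaMalloc is an address residing on the GPU, and using it from the host is undefined behaviour (on my test system it causes a segfault; in your case something more subtle happened).

The correct way to do what you want is to first create the structure in host memory, fill it with data, and then copy it to the device, like this: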

NLayer* nL; 
NLayer h_nL; 
int i; 
int tmp=9; 
// Allocate data on device 
cudaMalloc((void**)&nL, sizeof(NLayer)); 
cudaMalloc((void**)&h_nL.neurons, 6*sizeof(Neuron)); 
// Copy nlayer with pointers to device 
cudaMemcpy(nL, &h_nL, sizeof(NLayer), cudaMemcpyHostToDevice);

Also, don't forget to always check the CUDA routines for errors.
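To make the advice about error checking concrete, here is a minimal sketch of one common pattern. The CUDA_CHECK macro, the setValue kernel and the runChecked function are hypothetical names invented for this illustration; none of them come from the question. The sketch assumes a toolkit recent enough to provide cudaDeviceSynchronize.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the original post): print a readable
// message and abort whenever a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",               \
                    __FILE__, __LINE__, cudaGetErrorString(err_));     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Illustrative kernel: writes v into the int that p points to.
__global__ void setValue(int *p, int v)
{
    *p = v;
}

// Allocates one int on the device, lets the kernel write v into it,
// copies it back and returns it. Every runtime call is checked.
int runChecked(int v)
{
    int *d_value = NULL;
    int h_value = 0;

    CUDA_CHECK(cudaMalloc((void**)&d_value, sizeof(int)));
    setValue<<<1, 1>>>(d_value, v);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
    CUDA_CHECK(cudaMemcpy(&h_value, d_value, sizeof(int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_value));
    return h_value;
}
```

A kernel launch returns no status of its own, which is why the sketch queries cudaGetLastError right after the launch and checks the result of the synchronisation as well.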

UPDATE

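In the second version of your code:

cudaMemcpy(&d_layer->neurons[i].weights,&d_weights,...)

Again, you are dereferencing a device pointer (d_layer) on the host. Instead, you should use

cudaMemcpy(&h_layer.neurons[i].weights,&d_weights,sizeof(float*),cudaMemcpyHostToDevice);

Here you take h_layer (a host structure) and read its element (h_layer.neurons), which is a pointer to device memory. Then you do some pointer arithmetic on it (&h_layer.neurons[i].weights). No access to device memory is needed to compute this address.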
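Putting the fix together, a corrected setNLayer might look like the sketch below. It follows the structs and the function from the question, with two additions that are assumptions of this sketch rather than part of the original code: the extra h_layer_out parameter, which hands the host-side copy back to the caller, and the copy that initialises n_weights. Cleanup on the failure paths is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef struct neuron
{
    float *weights;
    int n_weights;
} Neuron;

typedef struct neurallayer
{
    Neuron *neurons;
    int n_neurons;
    int act_function;
} NLayer;

// Builds a layer on the device and returns the device pointer, or NULL on
// failure (allocations made before the failure are not freed in this sketch).
// h_layer_out is a hypothetical extra parameter: it receives the host copy of
// the struct, whose neurons field holds a device address.
NLayer *setNLayer(int numNeurons, int weightsPerNeuron, int act_fun,
                  NLayer *h_layer_out)
{
    NLayer h_layer;
    NLayer *d_layer = NULL;

    h_layer.neurons = NULL;
    h_layer.n_neurons = numNeurons;
    h_layer.act_function = act_fun;

    // The layer struct itself lives on the device.
    if (cudaMalloc((void**)&d_layer, sizeof(NLayer)) != cudaSuccess)
        return NULL;
    // The neuron array also lives on the device, but its address is stored
    // in the HOST struct first, so the host never dereferences d_layer.
    if (cudaMalloc((void**)&h_layer.neurons,
                   numNeurons * sizeof(Neuron)) != cudaSuccess)
        return NULL;
    if (cudaMemcpy(d_layer, &h_layer, sizeof(NLayer),
                   cudaMemcpyHostToDevice) != cudaSuccess)
        return NULL;

    for (int i = 0; i < numNeurons; i++)
    {
        float *d_weights = NULL;
        if (cudaMalloc((void**)&d_weights,
                       weightsPerNeuron * sizeof(float)) != cudaSuccess)
            return NULL;
        // &h_layer.neurons[i].weights is plain pointer arithmetic on a value
        // held in host memory; no device memory is read to compute it.
        if (cudaMemcpy(&h_layer.neurons[i].weights, &d_weights,
                       sizeof(float*), cudaMemcpyHostToDevice) != cudaSuccess)
            return NULL;
        if (cudaMemcpy(&h_layer.neurons[i].n_weights, &weightsPerNeuron,
                       sizeof(int), cudaMemcpyHostToDevice) != cudaSuccess)
            return NULL;
    }

    *h_layer_out = h_layer;
    return d_layer;
}
```

Handing the host copy back lets the caller reach the neuron array later, for example to read values back or to free the allocations, without first copying the struct back from the device.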

2012-08-09 01:23:59 aland

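I have modified my code, but it doesn't work; could you take a look? The new code is in my first post... Thanks! – 2012-08-09 12:08:03

Oh! Thank you, it works! I have just one question: if I want to access, from the host, the value held in the integer variable 'd_layer->neurons[0].n_weights', must I first copy 'd_layer' to the host, then copy 'd_layer->neurons[0]' to the host, and only then use the 'd_layer->neurons[0].n_weights' variable? I ask because I tried to copy 'd_layer->neurons[0].n_weights' directly with cudaMemcpy(...), but it always returns an "invalid argument" error. – 2012-08-09 15:08:52

@AndreaSylarSolla You can simply use 'int t; cudaMemcpy(&t, &h_layer.neurons[0].n_weights, ....)' or 'Neuron t; cudaMemcpy(&t, &h_layer.neurons[0], ....)'. There is no need to copy 'd_layer' back, because the only thing you need from it is the value of the 'neurons' pointer, and 'h_layer' holds the same value. – aland 2012-08-09 15:24:50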
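As a rough illustration of that last comment, the sketch below lets the kernel from the question write n_weights and then reads the value back through the host copy of the struct. The function readBackNWeights is a hypothetical name made up for this example, and the one-neuron layer is only there to keep the sketch self-contained.

```cuda
#include <cuda_runtime.h>

typedef struct neuron
{
    float *weights;
    int n_weights;
} Neuron;

typedef struct neurallayer
{
    Neuron *neurons;
    int n_neurons;
    int act_function;
} NLayer;

// Kernel from the question: runs on the device, so dereferencing n is fine.
__global__ void test(NLayer *n)
{
    n->neurons[0].n_weights = 121;
}

// Hypothetical helper: builds a one-neuron layer, runs the kernel and
// returns the value of n_weights read back on the host, or -1 on any error.
int readBackNWeights(void)
{
    NLayer h_layer;
    NLayer *d_layer = NULL;
    int result = -1;

    h_layer.neurons = NULL;
    h_layer.n_neurons = 1;
    h_layer.act_function = 0;

    if (cudaMalloc((void**)&d_layer, sizeof(NLayer)) != cudaSuccess)
        return -1;
    if (cudaMalloc((void**)&h_layer.neurons, sizeof(Neuron)) != cudaSuccess)
        return -1;
    if (cudaMemcpy(d_layer, &h_layer, sizeof(NLayer),
                   cudaMemcpyHostToDevice) != cudaSuccess)
        return -1;

    test<<<1, 1>>>(d_layer);
    if (cudaDeviceSynchronize() != cudaSuccess)
        return -1;

    // h_layer.neurons is a device address stored in host memory, so the
    // source address below is computed on the host without touching d_layer.
    if (cudaMemcpy(&result, &h_layer.neurons[0].n_weights, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        return -1;

    cudaFree(h_layer.neurons);
    cudaFree(d_layer);
    return result;
}
```

If only d_layer is at hand, one extra step is needed: copy the NLayer struct back into a host variable with cudaMemcpy, then use the neurons pointer from that copy in the same way.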
