2017-09-30 65 views

CUDA and C++ linking/compiling, program crashes on cudaMalloc

I have two problems I would like to ask about.

I)

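I have a .cpp file that contains main(). To call the kernel (which lives in a .cu file), I use an extern function in the .cu file, launch(), which launches the kernel. The two files, .cu and .cpp, each compile successfully on their own. To link them together, since I am a beginner in CUDA, I tried two things: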

1) nvcc -Wno-deprecated-gpu-targets -o final file1.cpp file2.cu, which compiles without any errors and produces the final program, and

2)

nvcc -Wno-deprecated-gpu-targets -c file2.cu 
    g++ -c file1.cpp 
    g++ -o program file1.o file2.o -lcudart -lcurand -lcutil -lcudpp -lcuda 

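In the second case, the -l arguments are not recognized (only -lcuda is), I guess because I did not specify their paths, since I do not know where these files are stored. If I skip these -l arguments, the errors are: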

$ g++ -o final backpropalgorithm_CUDA_kernel_copy.o backpropalgorithm_CUDA_main_copy.o -lcuda 
backpropalgorithm_CUDA_kernel_copy.o: In function `launch': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x185): undefined reference to `cudaConfigureCall' 
backpropalgorithm_CUDA_kernel_copy.o: In function `__cudaUnregisterBinaryUtil()': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x259): undefined reference to `__cudaUnregisterFatBinary' 
backpropalgorithm_CUDA_kernel_copy.o: In function `__nv_init_managed_rt_with_module(void**)': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x274): undefined reference to `__cudaInitModule' 
backpropalgorithm_CUDA_kernel_copy.o: In function `__device_stub__Z21neural_network_kernelPfPiS0_PdS1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_(float*, int*, int*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*)': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x2ac): undefined reference to `cudaSetupArgument' 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x2cf): undefined reference to `cudaSetupArgument' 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x2f2): undefined reference to `cudaSetupArgument' 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x315): undefined reference to `cudaSetupArgument' 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x338): undefined reference to `cudaSetupArgument' 
backpropalgorithm_CUDA_kernel_copy.o:tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x35b): more undefined references to `cudaSetupArgument' follow 
backpropalgorithm_CUDA_kernel_copy.o: In function `__nv_cudaEntityRegisterCallback(void**)': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x663): undefined reference to `__cudaRegisterFunction' 
backpropalgorithm_CUDA_kernel_copy.o: In function `__sti____cudaRegisterAll_69_tmpxft_0000717b_00000000_7_backpropalgorithm_CUDA_kernel_copy_cpp1_ii_43082cd7()': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x67c): undefined reference to `__cudaRegisterFatBinary' 
backpropalgorithm_CUDA_kernel_copy.o: In function `cudaError cudaLaunch<char>(char*)': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x6c0): undefined reference to `cudaLaunch' 
backpropalgorithm_CUDA_main_copy.o: In function `main': 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x92): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0xf8): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x118): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x12c): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x14c): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x160): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x180): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x194): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x1b4): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x1c8): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x1e8): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x1ff): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x21f): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x236): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x256): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x26a): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x28a): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x2a1): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x2c1): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x2d5): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x2f5): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x309): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x329): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x33d): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x35d): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x371): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x391): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x3a5): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x3c5): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x3dc): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x3fc): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x413): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x433): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x44a): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x46a): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x481): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x4a1): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x5bf): undefined reference to `cudaDeviceSynchronize' 
collect2: error: ld returned 1 exit status 

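The thing is, in the first case, where compiling and linking "succeed", running the program shows only a blinking cursor in the console (on the line below the command) and nothing else. Normally it should use CUDA to compute and display the error of the neural network being trained.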

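II) I tried logging with printf() in the .cu file, but nothing was displayed. I searched a bit and found that maybe I should use the cuPrintf() function. I tried it, but I ran into header problems: names were reported as undefined even though I included the headers manually. I then found that I should include a cuPrintf.cu file, whose source code I found online.

Unfortunately, when I then compile the files separately, the error for the .cu file is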

ptxas fatal : Unresolved extern function '_Z8cuPrintfIjEiPKcT_' 

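The .cpp file compiles without errors, though.

Why do all these errors occur? Where is the mistake? Why does the program not run properly, and why does printf() not seem to work in the kernel? Why does the program show only a blinking cursor and nothing else? If anyone could enlighten me on these problems, I would be very grateful. Thank you very much in advance!

The code of my two files is: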

file1.cpp

file.cu

#define w(i,j) w[(i)*(InputN*hn) + (j)] 
#define v(i,j) v[(i)*(hn*OutN) + (j)] 
#define x_out(i,j) x_out[(i)*(InputN) + (j)] 
#define y(i,j) y[(i)*(OutN) + (j)] 
#define hn_out(i,j) hn_out[(i)*(hn) + (j)] 
#define y_out(i,j) y_out[(i)*(OutN) + (j)] 
#define y_delta(i,j) y_delta[(i)*(OutN) + (j)] 
#define hn_delta(i,j) hn_delta[(i)*(hn) + (j)] 
#define deltav(i,j) deltav[(i)*(hn*OutN) + (j)] 
#define deltaw(i,j) deltaw[(i)*(InputN*hn) + (j)] 

#define datanum 4  // number of training samples 
#define InputN 16  // number of neurons in the input layer 
#define hn 64   // number of neurons in the hidden layer 
#define OutN 1   // number of neurons in the output layer 
#define threads_per_block 256 
#define MAX_RAND 100 
#define MIN_RAND 10 

#include <stdio.h> 
#include <math.h> //for truncf() 


// sigmoid serves as activation function 
__device__ double sigmoid(double x){ 
    return(1.0/(1.0 + exp(-x))); 
} 


__device__ int rand_kernel(int index, float *randData){ 
    float myrandf = randData[index]; 
    myrandf *= (MAX_RAND - MIN_RAND + 0.999999); 
    myrandf += MIN_RAND; 
    int myrand = (int)truncf(myrandf); 
    return myrand; 
} 


__global__ void neural_network_kernel (float *randData, int *times, int *loop, double *error, double *max, double *min, double *x_out, double *hn_out, double *y_out, double *y, double *w, double *v, double *deltaw, double *deltav, double *hn_delta, double *y_delta, double *alpha, double *beta, double *sumtemp, double *errtemp) 
{ 
    //int i = blockIdx.x; 
    //int idx = threadIdx.x; 
    //int idy = threadIdx.y 

    int index = blockIdx.x * blockDim.x + threadIdx.x; 

    // training set 
    struct{ 
     double input_kernel[InputN]; 
     double teach_kernel[OutN]; 
    }data_kernel[threads_per_block + datanum]; 

    if (index==0) 
    { 
     for(int m=0; m<datanum; m++){ 
      for(int i=0; i<InputN; i++) 
       data_kernel[threads_per_block + m].input_kernel[i] = (double)rand_kernel(index, randData)/32767.0; 
      for(int i=0;i<OutN;i++) 
       data_kernel[threads_per_block + m].teach_kernel[i] = (double)rand_kernel(index, randData)/32767.0; 
     } 
    } 


    // Initialization 
    for(int i=0; i<InputN; i++){ 
     for(int j=0; j<hn; j++){ 
      w(i,j) = ((double)rand_kernel(index, randData)/32767.0)*2-1; 
      deltaw(i,j) = 0; 
     } 
    } 
    for(int i=0; i<hn; i++){ 
     for(int j=0; j<OutN; j++){ 
      v(i,j) = ((double)rand_kernel(index, randData)/32767.0)*2-1; 
      deltav(i,j) = 0; 
     } 
    } 


    while(loop[index] < *times){ 
     loop[index]++; 
     error[index] = 0.0; 

     for(int m=0; m<datanum ; m++){ 
      // Feedforward 
      max[index] = 0.0; 
      min[index] = 0.0; 
      for(int i=0; i<InputN; i++){ 
       x_out(index,i) = data_kernel[threads_per_block + m].input_kernel[i]; 
       if(max[index] < x_out(index,i)) 
        max[index] = x_out(index,i); 
       if(min[index] > x_out(index,i)) 
        min[index] = x_out(index,i); 
      } 
      for(int i=0; i<InputN; i++){ 
       x_out(index,i) = (x_out(index,i) - min[index])/(max[index] - min[index]); 
      } 

      for(int i=0; i<OutN ; i++){ 
       y(index,i) = data_kernel[threads_per_block + m].teach_kernel[i]; 
      } 

      for(int i=0; i<hn; i++){ 
       sumtemp[index] = 0.0; 
       for(int j=0; j<InputN; j++) 
        sumtemp[index] += w(j,i) * x_out(index,j); 
       hn_out(index,i) = sigmoid(sumtemp[index]);  // sigmoid serves as the activation function 
      } 

      for(int i=0; i<OutN; i++){ 
       sumtemp[index] = 0.0; 
       for(int j=0; j<hn; j++) 
        sumtemp[index] += v(j,i) * hn_out(index,j); 
       y_out(index,i) = sigmoid(sumtemp[index]); 
      } 

      // Backpropagation 
      for(int i=0; i<OutN; i++){ 
       errtemp[index] = y(index,i) - y_out(index,i); 
       y_delta(index,i) = -errtemp[index] * sigmoid(y_out(index,i)) * (1.0 - sigmoid(y_out(index,i))); 
       error[index] += errtemp[index] * errtemp[index]; 
      } 

      for(int i=0; i<hn; i++){ 
       errtemp[index] = 0.0; 
       for(int j=0; j<OutN; j++) 
        errtemp[index] += y_delta(index,j) * v(i,j); 
       hn_delta(index,i) = errtemp[index] * (1.0 + hn_out(index,i)) * (1.0 - hn_out(index,i)); 
      } 

      // Stochastic gradient descent 
      for(int i=0; i<OutN; i++){ 
       for(int j=0; j<hn; j++){ 
        deltav(j,i) = (*alpha) * deltav(j,i) + (*beta) * y_delta(index,i) * hn_out(index,j); 
        v(j,i) -= deltav(j,i); 
       } 
      } 

      for(int i=0; i<hn; i++){ 
       for(int j=0; j<InputN; j++){ 
        deltaw(j,i) = (*alpha) * deltaw(j,i) + (*beta) * hn_delta(index,i) * x_out(index,j); 
        w(j,i) -= deltaw(j,i); 
       } 
      } 
     } 

     // Global error 
     error[index] = error[index]/2; 
     /*if(loop%1000==0){ 
      result = "Global Error = "; 
      sprintf(buffer, "%f", error); 
      result += buffer; 
      result += "\r\n"; 
     } 
     if(error < errlimit) 
      break;*/ 

     printf("The %d th training, error: %0.100f\n", loop[index], error[index]); 
    } 
} 


extern "C" 
void launch(float *randData, int *times, int *loop, double *error, double *max, double *min, double *x_out, double *hn_out, double *y_out, double *y, double *w, double *v, double *deltaw, double *deltav, double *hn_delta, double *y_delta, double *alpha, double *beta, double *sumtemp, double *errtemp) 
{ 
    int blocks = *times/threads_per_block; 
    neural_network_kernel<<<blocks, threads_per_block>>>(randData, times, loop, error, max, min, x_out, hn_out, y_out, y, w, v, deltaw, deltav, hn_delta, y_delta, alpha, beta, sumtemp, errtemp); 
} 

UPDATE:

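I found some errors related to memory allocation and pointers, and I have updated the code above. The main questions now are:

1) Is the linking/compilation correct, and is this how I should do it? I mean the first way.

2) I found that the blinking cursor appears right at the first cudaMalloc(). Up to that point the program runs correctly.

But at the first cudaMalloc() it hangs forever. Why?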

Answer

1

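Before asking for help here, it is good practice to use proper CUDA error checking and to run your code with cuda-memcheck. If you do not, you may overlook useful error information and waste your own time as well as the time of others who are trying to help you.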
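
As a rough illustration of what "proper CUDA error checking" can look like, here is a minimal sketch. The names cuda_ok, CUDA_CHECK and fill are made up for this example and do not come from the question's code. The idea is simply to test the return value of every runtime API call, and to check a kernel launch twice: once right after the launch and once after synchronizing.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the question): returns true on success,
// otherwise prints the CUDA error string with the call site and returns false.
static bool cuda_ok(cudaError_t err, const char *what, const char *file, int line)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "CUDA error at %s:%d in %s: %s\n",
            file, line, what, cudaGetErrorString(err));
    return false;
}

// Abort the program as soon as any wrapped runtime API call fails.
#define CUDA_CHECK(call)                                     \
    do {                                                     \
        if (!cuda_ok((call), #call, __FILE__, __LINE__))     \
            exit(EXIT_FAILURE);                              \
    } while (0)

// Toy kernel used only to have something to launch.
__global__ void fill(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main()
{
    const int n = 1024;
    int *d_out = nullptr;
    CUDA_CHECK(cudaMalloc((void **)&d_out, n * sizeof(int)));

    fill<<<(n + 255) / 256, 256>>>(d_out, n);
    CUDA_CHECK(cudaGetLastError());       // reports launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    int last = -1;
    CUDA_CHECK(cudaMemcpy(&last, d_out + n - 1, sizeof(int), cudaMemcpyDeviceToHost));
    printf("last element: %d\n", last);

    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

Once the program builds, running it as cuda-memcheck ./final (using the executable name from the question) reports out-of-bounds and misaligned device accesses. On recent toolkits the same role is filled by compute-sanitizer.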

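"In the second case, the -l arguments are not recognized (only -lcuda is), I guess because I did not specify their paths, since I do not know where these files are stored."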

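You do not want to skip these. nvcc links against these libraries automatically and knows where to find them. When using g++, you have to tell it where to look and which specific libraries you need. Based on the code you have shown, you do not need all of those libraries, so for a standard Linux CUDA install the following should be sufficient:

g++ -o program file1.o file2.o -L/usr/local/cuda/lib64 -lcudart 

If you do not have a standard install, you can run which nvcc to find out where nvcc is located, and then use that to find the likely location of the libraries (change bin in the path to lib64).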

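If you really do need some of the other libraries, ones like cutil and cudpp will not be available unless you take special steps to install them, and in that case you will need to determine the paths to them.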

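Regarding cuPrintf, if you are compiling for and running on a cc2.0 or newer GPU (which is the minimum compute capability that CUDA 8 supports anyway), you should not need it. Ordinary printf should work in device code, and if it does not (because you have a device code error: use proper error checking and cuda-memcheck), then cuPrintf will not work any better. So instead of wrestling to get that working, just revert to using printf in your code (and include stdio.h).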
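
For reference, a minimal sketch of plain printf in device code might look like the following. The names hello_kernel and run_hello are invented for this example. Device-side printf output is buffered and only appears on the host at certain points such as cudaDeviceSynchronize(), so a program that never gets that far, or whose kernel never launched, prints nothing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread prints its own global index using ordinary printf.
__global__ void hello_kernel()
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    printf("hello from thread %d\n", index);
}

// Hypothetical wrapper: launches the kernel and returns the first error seen.
cudaError_t run_hello(int blocks, int threads)
{
    hello_kernel<<<blocks, threads>>>();
    cudaError_t err = cudaGetLastError();  // e.g. an invalid launch configuration
    if (err != cudaSuccess) return err;
    // Synchronizing waits for the kernel and flushes the device printf buffer.
    return cudaDeviceSynchronize();
}

int main()
{
    cudaError_t err = run_hello(1, 4);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Note that a launch with a block count of 0 fails with an invalid-configuration error and prints nothing. That is worth ruling out in launch(), where the block count comes from an integer division by threads_per_block.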

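As for your program and why it is not working, I think you probably have several bugs. You may want to learn how to use a debugger. Right off the bat, in the host code, your attempt to initialize randData from host code is illegal.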
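
Since file1.cpp is not shown here, the following is only a guess at the kind of fix meant: a pointer returned by cudaMalloc refers to device memory and must not be dereferenced on the host. One common pattern is to fill an ordinary host buffer and copy it over with cudaMemcpy. The helper upload_floats, the buffer size and the random generator below are assumptions made for this sketch, not the question's code.

```cuda
#include <cstdio>
#include <random>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: copies host floats into a newly allocated device buffer.
// On success *d_out holds a pointer that may only be dereferenced in device code.
cudaError_t upload_floats(const std::vector<float> &host, float **d_out)
{
    size_t bytes = host.size() * sizeof(float);
    cudaError_t err = cudaMalloc((void **)d_out, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(*d_out, host.data(), bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        cudaFree(*d_out);   // do not leak the buffer if the copy failed
        *d_out = nullptr;
    }
    return err;
}

int main()
{
    const int n = 256;                       // assumed size, one value per thread
    std::vector<float> h_rand(n);
    std::mt19937 gen(12345);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    for (float &x : h_rand) x = dist(gen);   // host memory: fine to write here

    float *d_rand = nullptr;
    cudaError_t err = upload_floats(h_rand, &d_rand);
    if (err != cudaSuccess) {
        fprintf(stderr, "upload failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // d_rand[0] = 0.5f;  // illegal on the host: d_rand points to device memory

    // d_rand could now be passed as the randData argument of a kernel launch.
    cudaFree(d_rand);
    return 0;
}
```

The same rule applies in the other direction: values computed by the kernel have to be copied back with cudaMemcpy before the host reads them.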

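Now that I see you have changed the question several times, turning it into a moving target, I will stop.

If you want help, please stop moving the target.

Use proper CUDA error checking.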