Passing data from the CPU to the GPU without explicitly passing it as a parameter

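Is it possible to pass data from the CPU to the GPU without explicitly passing it as a kernel parameter? I want to avoid passing it as a parameter mainly for syntactic-sugar reasons: I have about 20 constant parameters to pass, and I call two kernels in a row with (almost) identical parameters.

I would like something along the lines of: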

__constant__ int* blah; 

__global__ void myKernel(...){ 
    ... i want to use blah inside ... 
} 

int main(){ 
    ... 
    cudaMalloc(...allocate blah...) 
    cudaMemcpy(copy my array from CPU to blah) 

}


2011-10-13 George Karpenkov

Why not wrap your parameters in a struct? Passing parameters through global variables should be avoided.
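The struct approach suggested in this comment might look like the sketch below. `SimParams`, its fields, the two kernels, and `runStructDemo` are hypothetical names chosen for illustration; they do not come from the question. The struct is passed by value, so both launches share one short argument.

```cuda
#include <cuda_runtime.h>

// Hypothetical parameter bundle; the field names are assumptions for illustration.
struct SimParams {
    int   n;       // number of elements
    float scale;   // multiplicative factor
    float offset;  // additive term
};

// Both kernels receive the same struct by value, so the call sites stay short.
__global__ void scaleKernel(float *data, SimParams p) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p.n) data[i] *= p.scale;
}

__global__ void shiftKernel(float *data, SimParams p) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p.n) data[i] += p.offset;
}

// Runs both kernels on a small array and returns the first element.
float runStructDemo() {
    const int n = 8;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    SimParams p = {n, 2.0f, 3.0f};
    scaleKernel<<<1, n>>>(dev, p);   // data[i] = 1 * 2
    shiftKernel<<<1, n>>>(dev, p);   // data[i] = 2 + 3

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host[0];
}
```

Error checking is omitted to keep the sketch short; in real code each CUDA call's return value should be checked.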

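Something along the lines of cudaMemcpyToSymbol seems to be the function you are looking for. It works much like cudaMemcpy, but takes an additional "offset" argument, which looks like it would make copying into 2D arrays easier.

(I am reluctant to provide code because I cannot test it, but see this thread and this post for reference.)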
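A minimal sketch of how cudaMemcpyToSymbol could serve the question's use case, assuming the constants are gathered into one struct held in `__constant__` memory. `Params`, `d_params`, the two kernels, and `runConstantDemo` are hypothetical names introduced for illustration. Only the data buffer is passed to the kernels; the constants are read from the file-scope symbol.

```cuda
#include <cuda_runtime.h>

// Hypothetical parameter bundle; names are illustrative assumptions.
struct Params {
    int n;  // number of elements
    int a;  // multiplier
    int b;  // addend
};

// File-scope symbol, visible to every kernel in this translation unit.
__constant__ Params d_params;

__global__ void mulKernel(int *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < d_params.n) data[i] *= d_params.a;
}

__global__ void addKernel(int *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < d_params.n) data[i] += d_params.b;
}

// Uploads the constants once, runs both kernels, returns element 2.
int runConstantDemo() {
    const int n = 16;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    Params h = {n, 3, 7};
    // offset and kind are left at their defaults (0, cudaMemcpyHostToDevice)
    cudaMemcpyToSymbol(d_params, &h, sizeof(Params));

    int *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    mulKernel<<<1, n>>>(dev);
    addKernel<<<1, n>>>(dev);

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host[2];
}
```

The offset argument is a byte offset from the start of the symbol, so a single field or a slice of a `__constant__` array can be updated without re-uploading the whole object.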


2011-10-13 02:46:52

Yes, that's exactly it, thanks a lot.

Use __device__ to declare global variables. It is used in much the same way as __constant__.
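As a sketch of this idea applied to the question's pointer-style `blah`, the buffer can be allocated with cudaMalloc and its address copied into a `__device__` pointer variable. `g_blah`, `g_n`, `doubleKernel`, and `runDevicePointerDemo` are hypothetical names used only for illustration.

```cuda
#include <cuda_runtime.h>

__device__ int *g_blah;  // will hold the address of a cudaMalloc'd buffer
__device__ int  g_n;     // number of elements in that buffer

// The kernel takes no arguments; it reads everything from the globals above.
__global__ void doubleKernel() {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < g_n) g_blah[i] *= 2;
}

// Fills a buffer with 0..n-1, doubles it on the GPU, returns element 5.
int runDevicePointerDemo() {
    const int n = 32;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *buf = nullptr;
    cudaMalloc(&buf, n * sizeof(int));
    cudaMemcpy(buf, host, n * sizeof(int), cudaMemcpyHostToDevice);

    // Copy the pointer value itself (not the data it points to) into the symbol.
    cudaMemcpyToSymbol(g_blah, &buf, sizeof(buf));
    cudaMemcpyToSymbol(g_n, &n, sizeof(n));

    doubleKernel<<<1, n>>>();

    cudaMemcpy(host, buf, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(buf);
    return host[5];
}
```

Unlike `__constant__` memory, a `__device__` variable is writable from kernels, so the same pattern also works for output buffers.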


2011-10-13 03:12:05 Yik

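There are a few approaches you can take. Which one fits depends on how you will use the data.

1. If the data is constant and the threads within a block read the same location, use __constant__ memory so that the read requests are broadcast.
2. If your access pattern touches the neighbors of a given position, or is random (not coalesced), then I recommend using texture memory.
3. If you need to read and write the data and you know the size of your array, define it as __device__ blah[size] in your kernel.

For example: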

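#include <cuda_runtime.h>

#define C_SIZE 16384    // constant memory is 64 KB, i.e. at most 16384 ints
#define G_SIZE 1048576

__constant__ int c_blah[C_SIZE]; // constant memory
__device__ int g_blah[G_SIZE];   // global memory
// legacy texture reference; it must be declared at file scope
// (this API is deprecated and was removed in CUDA 12)
texture<int, 1, cudaReadModeElementType> tref;

__global__ void myKernel() {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx >= G_SIZE) return;
    // get data from constant memory (index wrapped to stay in bounds)
    int c = c_blah[idx % C_SIZE];
    // get data from global memory
    int g = g_blah[idx];
    // get data from texture memory
    int t = tex1Dfetch(tref, idx);
    // operate
    g_blah[idx] = c + g + t;
}

// host arrays, initialize them as you want;
// static so that they do not overflow the stack
static int c_h_blah[C_SIZE];
static int g_h_blah[G_SIZE];
static int t_h_blah[G_SIZE];

int main() {
    // copy from host to constant memory
    cudaMemcpyToSymbol(c_blah, c_h_blah, C_SIZE * sizeof(int), 0, cudaMemcpyHostToDevice);
    // copy from host to the __device__ array
    cudaMemcpyToSymbol(g_blah, g_h_blah, G_SIZE * sizeof(int), 0, cudaMemcpyHostToDevice);
    // a texture binds to device memory, so copy the host array to the GPU first
    int *d_t_blah;
    cudaMalloc(&d_t_blah, G_SIZE * sizeof(int));
    cudaMemcpy(d_t_blah, t_h_blah, G_SIZE * sizeof(int), cudaMemcpyHostToDevice);
    // bind the texture to the device array
    cudaBindTexture(0, tref, d_t_blah, G_SIZE * sizeof(int));
    // call your kernel
    dim3 dimBlock(256);
    dim3 dimGrid((G_SIZE + dimBlock.x - 1) / dimBlock.x);
    myKernel<<<dimGrid, dimBlock>>>();
    // copy result from GPU to CPU memory
    cudaMemcpyFromSymbol(g_h_blah, g_blah, G_SIZE * sizeof(int), 0, cudaMemcpyDeviceToHost);
    cudaUnbindTexture(tref);
    cudaFree(d_t_blah);
    return 0;
}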

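You can use all three arrays inside the kernel without passing any parameter to it. Note that this is only a usage example and not an optimized use of the memory hierarchy; in particular, using constant memory this way is not recommended.

Hope this helps.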


2011-10-13 07:35:24 pQB

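Method 3 does not work. A __device__ declaration inside a function body is illegal in CUDA. – talonmies

@talonmies But we can declare __device__ at the global scope of the file, right? – pQB

Yes, that will work, but it will produce compiler warnings on compute 1.x targets. – talonmies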
