pycuda seems nondeterministic

I have a strange problem with CUDA: pycuda seems to behave nondeterministically.

Consider the following kernel:

#include <stdio.h> 

#define OUTPUT_SIZE   26 

typedef $PRECISION REAL; 

extern "C"  
{ 
    __global__ void test_coeff (REAL* results) 
    { 
     int id  = blockDim.x * blockIdx.x + threadIdx.x; 

     int out_index = OUTPUT_SIZE * id; 
     for (int i=0; i<OUTPUT_SIZE; i++) 
     {    
      results[out_index+i]=id; 
      printf("q"); 
     } 
    } 
}

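When I compile and run this code (via pycuda), it works as expected. When I remove the printf, the results are strange: most of the array is filled correctly, but some of it looks completely random.

Here is the complete Python code: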

import numpy as np 
import string 

#pycuda stuff 
import pycuda.driver as drv 
import pycuda.autoinit 

from pycuda.compiler import SourceModule 

class MC: 

    cudacodetemplate = """ 
    #include <stdio.h> 

    #define OUTPUT_SIZE   26 

    typedef $PRECISION REAL; 

    extern "C"  
    { 
     __global__ void test_coeff (REAL* results) 
     { 
      int id  = blockDim.x * blockIdx.x + threadIdx.x; 

      int out_index = OUTPUT_SIZE * id; 
      for (int i=0; i<OUTPUT_SIZE; i++) 
      {    
       results[out_index+i]=id; 
       //printf("q"); 
      } 
     } 
    } 
    """ 

    def __init__(self, size, prec = np.float32): 
     #800 meg should be enough . . . 
     drv.limit.MALLOC_HEAP_SIZE = 1024*1024*800 

     self.size  = size 
     self.prec  = prec 
     template  = string.Template(MC.cudacodetemplate) 
     self.cudacode = template.substitute(PRECISION = 'float' if prec==np.float32 else 'double') 

     #self.module  = pycuda.compiler.SourceModule(self.cudacode, no_extern_c=True, options=['--ptxas-options=-v']) 
     self.module  = SourceModule(self.cudacode, no_extern_c=True) 

    def test(self, out_size): 
     #try to precalc the co-efficients for just the elements of the vector that changes 
     test = np.zeros((128, out_size*(2**self.size)), dtype=self.prec) 
     test2 = np.zeros((128, out_size*(2**self.size)), dtype=self.prec) 

     test_coeff = self.module.get_function ('test_coeff') 
     test_coeff(drv.Out(test), block=(2**self.size,1,1), grid=(128, 1)) 
     test_coeff(drv.Out(test2), block=(2**self.size,1,1), grid=(128, 1)) 
     error = (test-test2) 
     return error 

if __name__ == '__main__': 
    p1 = MC (5, np.float64) 
    err = p1.test(26) 
    print(err.max())
    print(err.min())

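Basically, with the printf in the kernel, err is 0. Without it, the script prints some random error (on my machine, around 2452 for the max and -2583 for the min).

I don't know why.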
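Comparing two GPU runs against each other only shows that they differ, not which one is wrong. Since the kernel's intended output is fully determined (thread `id` writes the value `id` into 26 consecutive slots), a host-side reference can be built with NumPy alone. The following is a minimal sketch under that assumption; the names `expected_results` and `bad_thread_ids` are hypothetical helpers, not part of pycuda, and the launch shape (128 blocks of 32 threads) is taken from the script above.

```python
import numpy as np

OUTPUT_SIZE = 26        # matches the #define in the kernel
THREADS_PER_BLOCK = 32  # 2**5, as in MC(5, ...)
NUM_BLOCKS = 128        # grid=(128, 1) in the script


def expected_results(dtype=np.float64):
    """Values the kernel is meant to write: thread `id` fills
    results[26*id : 26*id + 26] with `id`."""
    ids = np.arange(NUM_BLOCKS * THREADS_PER_BLOCK)
    flat = np.repeat(ids, OUTPUT_SIZE).astype(dtype)
    # Same shape as the host arrays `test` and `test2` in the script.
    return flat.reshape(NUM_BLOCKS, OUTPUT_SIZE * THREADS_PER_BLOCK)


def bad_thread_ids(result):
    """Global thread ids whose output slots differ from the reference."""
    mismatch = (result != expected_results(result.dtype)).ravel()
    # Each thread owns OUTPUT_SIZE consecutive flat indices.
    return np.unique(np.flatnonzero(mismatch) // OUTPUT_SIZE)
```

Calling `bad_thread_ids(test)` and `bad_thread_ids(test2)` after the two launches would show whether the corrupted entries cluster in particular threads or blocks, and whether they move between runs.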

I am running CUDA 4.2 with pycuda 2012.2 (64-bit Windows 7) on a GeForce 570.


2012-10-07 user1726633

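Sorry, but I can't reproduce this on a 64-bit Linux host with a GTX 670 using CUDA 4.2. Both the single and double precision versions pass every time I run them with the kernel as you posted it. – talonmies

I think something is wrong with my hardware, although I'm not sure why all the other CUDA programs in the 4.2 GPU SDK work fine. I'll try running this on the same hardware under Linux, and then try different hardware under Windows and see... – user1726633

I don't know pycuda, but in C/C++ you can't use the 'printf' function in '__global__' or '__device__' code. Is that possible in pycuda? – szamil

This is most likely due to compiler optimization. You are setting a block of memory of length OUTPUT_SIZE to id, a value that is constant across the loop. In my experience, the compiler will optimize that into a memcpy or whathaveyou unless something else is going on in the loop, i.e. your print statement. Furthermore, if you don't use that block of memory, the compiler may optimize away the whole loop. Try fiddling with your optimization level and see whether you get different results.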
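One way to try the suggestion about optimization levels from pycuda is to pass extra flags through the `options` argument of `SourceModule`, in the same way the commented-out line in the question passes `--ptxas-options=-v`. The sketch below assumes that forwarding `-O<level>` to ptxas is the relevant knob; `ptxas_opt_options` and `compile_at_level` are hypothetical helper names, and whether a different level changes the behaviour described in the question is untested here.

```python
def ptxas_opt_options(opt_level):
    """Build an nvcc option list that forwards -O<level> to ptxas."""
    # Assumption: levels 0..3 are accepted by ptxas.
    if opt_level not in (0, 1, 2, 3):
        raise ValueError("opt_level must be 0, 1, 2 or 3")
    return ["--ptxas-options=-O%d" % opt_level]


def compile_at_level(source, opt_level):
    """Compile CUDA source with pycuda at the given ptxas optimization level."""
    # Imported here so the helper above can be used without a GPU present.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    from pycuda.compiler import SourceModule

    return SourceModule(source, no_extern_c=True,
                        options=ptxas_opt_options(opt_level))
```

In the question's class, `self.module = compile_at_level(self.cudacode, 0)` would replace the existing `SourceModule(...)` call, so the test can be rerun at each level and the errors compared.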


2013-06-25 18:24:16 Ethereal
