NVIDIA NVCC changes a compile-time constant when using template trait types

I'm seeing strange behaviour from NVIDIA NVCC (tested with CUDA 4.0 and 4.1) when using C++ templates. I've reduced it to a simple example that demonstrates the behaviour.

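This has already been filed as a bug report. I'm posting it here anyway, because this site is becoming an increasingly reliable source of bugs and fixes. I'll keep this page updated.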

The code:

#include"stdio.h" 

#define PETE_DEVICE __device__ 

template<class T, int N> class ILattice; 
template<class T>   class IScalar; 
template<class T, int IL> struct AddILattice {}; 

template<class T> 
PETE_DEVICE 
void printType() { 
    printf("%s\n",__PRETTY_FUNCTION__); 
} 

template<class T> class IScalar { 
    T F; 
}; 

template<class T, int N> class ILattice { 
    T F[N]; 
}; 

template<class T, int N> 
struct AddILattice<IScalar<T> , N> { 
    typedef ILattice< T , N > Type_t; 
}; 

#define IL 16 

__global__ void kernel() 
{ 
    printf("IL=%d\n",IL); // Here IL==16 

    typedef typename AddILattice<IScalar<float> ,IL>::Type_t Tnew; 

    // This still works fine. Output: 
    // void printType() [with T = ILattice<float, 16>] 
    // 
    printType<Tnew>(); 

    // Now problems begin: Output: 
    // T=4 Tnew=0 IL=64 
    // Here IL should still be 16 
    // sizeof(Tnew) should be 16*sizeof(float) 
    // 
    printf("T=%d Tnew=%d IL=%d\n",sizeof(IScalar<float>),sizeof(Tnew),IL); 
} 

int main() 
{ 
    dim3 blocksPerGrid(1 , 1 , 1); 
    dim3 threadsPerBlock(1 , 1, 1); 
    kernel<<< blocksPerGrid , threadsPerBlock , 48*1024 >>>(); 

    cudaDeviceSynchronize(); 
    cudaError_t kernel_call = cudaGetLastError(); 
    printf("call: %s\n",cudaGetErrorString(kernel_call)); 

}

Any idea why the compiler changes IL from 16 to 64?


2012-05-03 ritter

Isn't std::cout available on your system? – PlasmaHH

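Maybe it's because you are using the wrong printf conversion. %d means "print an int", but sizeof does not yield an int; it yields a size_t. Add the size_t length modifier (and make the conversion unsigned), i.e. use %zu instead of %d.

Because the arguments go through a var-args list, printf has no way of knowing which types were really passed, so no type conversion takes place; the only type information it has comes from the format string. The arguments you pass therefore have to match the conversions exactly. On a system where size_t has the same size as int (many 32-bit systems, for example), your code happens to work. You can't rely on that, though, and using the correct conversion will help.

(So the compiler is not changing your constant; you are just printing it wrongly.)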
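As a minimal host-side sketch of the fix described above, the following formats the three values from the question once with %zu and once with explicit casts to int. The helper names describe_sizes and describe_sizes_cast are hypothetical, IL is a const int instead of a macro, and the templates are declared as structs purely for brevity (sizeof is unaffected). It assumes a C++11 standard library and a 4-byte float. Whether device-side printf in a given CUDA version accepts %zu is not confirmed here, so the cast variant is probably the more conservative choice inside a kernel.

```cpp
#include <cstdio>
#include <string>

// Same shapes as in the question; struct is used only for brevity.
template<class T> struct IScalar { T F; };
template<class T, int N> struct ILattice { T F[N]; };

template<class T, int N> struct AddILattice {};
template<class T, int N>
struct AddILattice<IScalar<T>, N> {
    typedef ILattice<T, N> Type_t;
};

const int IL = 16;  // const int instead of the question's #define

// Hypothetical helper: every conversion matches its argument type.
std::string describe_sizes() {
    typedef AddILattice<IScalar<float>, IL>::Type_t Tnew;
    char buf[64];
    // %zu is the conversion for size_t, %d the one for int.
    std::snprintf(buf, sizeof(buf), "T=%zu Tnew=%zu IL=%d",
                  sizeof(IScalar<float>), sizeof(Tnew), IL);
    return std::string(buf);
}

// Hypothetical alternative: keep %d but convert the size_t values to int.
std::string describe_sizes_cast() {
    typedef AddILattice<IScalar<float>, IL>::Type_t Tnew;
    char buf[64];
    std::snprintf(buf, sizeof(buf), "T=%d Tnew=%d IL=%d",
                  static_cast<int>(sizeof(IScalar<float>)),
                  static_cast<int>(sizeof(Tnew)), IL);
    return std::string(buf);
}
```

With a 4-byte float, both helpers should return "T=4 Tnew=64 IL=16", which is what the question expected to see.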


2012-05-03 09:54:19 flolo

That would also explain why sizeof(Tnew) comes out as 0 ... – PlasmaHH
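To see how a size mismatch alone could produce exactly "T=4 Tnew=0 IL=64", here is a rough model, not a description of how NVCC's device printf is actually implemented. It assumes the two sizeof results are stored as 8-byte values followed by a 4-byte int in one contiguous buffer, that the machine is little-endian, and that the %d conversions each consume 4 bytes from that buffer. The function name misread_as_ints is hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical model of three %d conversions reading arguments
// that were stored as two 8-byte values and one 4-byte value.
std::string misread_as_ints() {
    unsigned char buffer[24] = {0};

    const std::uint64_t size_scalar  = 4;   // sizeof(IScalar<float>)
    const std::uint64_t size_lattice = 64;  // sizeof(ILattice<float, 16>)
    const std::int32_t  il           = 16;  // the constant IL

    // Store the arguments back to back, as assumed above.
    std::memcpy(buffer + 0,  &size_scalar,  sizeof(size_scalar));
    std::memcpy(buffer + 8,  &size_lattice, sizeof(size_lattice));
    std::memcpy(buffer + 16, &il,           sizeof(il));

    // Read back three 4-byte ints, as three %d conversions would.
    std::int32_t seen[3];
    std::memcpy(seen, buffer, sizeof(seen));

    char out[64];
    std::snprintf(out, sizeof(out), "T=%d Tnew=%d IL=%d",
                  static_cast<int>(seen[0]),
                  static_cast<int>(seen[1]),
                  static_cast<int>(seen[2]));
    return std::string(out);
}
```

Under those assumptions the function returns "T=4 Tnew=0 IL=64": the 0 is the upper half of the first 8-byte value, and the 64 printed as IL is really sizeof(Tnew). The actual IL value of 16 is never read.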

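Sir, the bit representations of signed and unsigned integers are identical in the range 0..std::numeric_limits<int>::max(). Only above that range is the leading bit interpreted as subtracting the whole. So in the lower range (which is where I am), using %d is perfectly fine. For pedantic purposes you are right, but that doesn't explain the behaviour. – ritter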

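@Frank: Signed versus unsigned is not the problem. The problem is int versus size_t (it is not about the d or i conversion; it is the z in front of it). – flolo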
