2013-03-14 94 views

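C++ AMP crashing on hardware (GeForce GTX 660)

I'm having a problem with some C++ AMP code I'm writing. I've included a sample. It runs fine on the emulated accelerator, but on my hardware (Windows 7, NVIDIA GeForce GTX 660, latest drivers) it crashes the display driver, and I can't see anything wrong with my code.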

Is there a problem with my code, or is this a hardware/driver/compiler issue?

#include "stdafx.h" 

#include <vector> 
#include <iostream> 
#include <amp.h> 

int _tmain(int argc, _TCHAR* argv[]) 
{ 
    // Prints "NVIDIA GeForce GTX 660" 
    concurrency::accelerator_view target_view = concurrency::accelerator().create_view(); 
    std::wcout << target_view.accelerator.description << std::endl; 

    // lower numbers do not cause the issue 
    const int x = 2000; 
    const int y = 30000; 

    // 1d array for storing result 
    std::vector<unsigned int> resultVector(y); 
    Concurrency::array_view<unsigned int, 1> resultsArrayView(resultVector.size(), resultVector); 

    // 2d array for data for processing 
    std::vector<unsigned int> dataVector(x * y); 
    concurrency::array_view<unsigned int, 2> dataArrayView(y, x, dataVector); 
    parallel_for_each(
     // Define the compute domain, which is the set of threads that are created. 
     resultsArrayView.extent, 
     // Define the code to run on each thread on the accelerator. 
     [=](concurrency::index<1> idx) restrict(amp) 
    { 
     concurrency::array_view<unsigned int, 1> buffer = dataArrayView[idx[0]]; 
     unsigned int bufferSize = buffer.get_extent().size(); 

     // needs both loops to cause crash 
     for (unsigned int outer = 0; outer < bufferSize; outer++) 
     { 
      for (unsigned int i = 0; i < bufferSize; i++) 
      { 
       // works without this line, also if I change to buffer[0] it works? 
       dataArrayView[idx[0]][0] = 0; 
      } 
     } 
     // works without this line 
     resultsArrayView[0] = 0; 
    }); 

    std::cout << "crash on next line" << std::endl; 
    resultsArrayView.synchronize(); 
    std::cout << "will never reach me" << std::endl; 

    system("PAUSE"); 
    return 0; 
} 

Answer


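Most likely your computation exceeds the permitted quantum time (2 seconds by default). Once that limit is passed, the operating system steps in and forcibly restarts the GPU; this is called Timeout Detection and Recovery (TDR). The software adapter (the reference device) does not have TDR enabled, which is why the same computation can run past the permitted quantum time there without crashing.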

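Does your computation really need 30000 threads (variable y), each performing 2000 * 2000 (x * x) loop iterations? You could split the computation into chunks so that each chunk takes less than 2 seconds to compute. You could also consider disabling TDR or extending the permitted quantum time to suit your needs.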

I strongly recommend reading the blog post on handling TDRs in C++ AMP, which explains TDR in detail: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/07/handling-tdrs-in-c-amp.aspx

There is also a separate blog post on how to disable TDR on Windows 8: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/06/disabling-tdr-on-windows-8-for-your-c-amp-algorithms.aspx


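Thank you very much, I was starting to lose my mind over this. I never knew TDR existed. I've updated the setting and now it works. Thanks for the amazing answer! – 2013-03-15 19:17:11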