2017-03-31 40 views
3

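I installed the TensorFlow 1.0.1 GPU version on my MacBook Pro, which has a GeForce GT 750M. I also installed CUDA 8.0.71 and cuDNN 5.1. I am running TF code that works fine with the CPU-only TensorFlow build, but with the GPU version I get this error (once in a while it works too): could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR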

name: GeForce GT 750M 
major: 3 minor: 0 memoryClockRate (GHz) 0.9255 
pciBusID 0000:01:00.0 
Total memory: 2.00GiB 
Free memory: 67.48MiB 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0) 
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 67.48M (70754304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY 
Training... 

E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR 
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM 
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
Abort trap: 6 

What is going on here? Is this a bug in TensorFlow? Please help.

Here is the GPU memory usage while I run the Python code:

Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 83.477 of 2047.6 MB (i.e. 4.08%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 83.477 of 2047.6 MB (i.e. 4.08%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 83.477 of 2047.6 MB (i.e. 4.08%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 1.1016 of 2047.6 MB (i.e. 0.0538%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 1.1016 of 2047.6 MB (i.e. 0.0538%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 1.1016 of 2047.6 MB (i.e. 0.0538%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 1.1016 of 2047.6 MB (i.e. 0.0538%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 91.477 of 2047.6 MB (i.e. 4.47%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 22.852 of 2047.6 MB (i.e. 1.12%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 22.852 of 2047.6 MB (i.e. 1.12%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 36.121 of 2047.6 MB (i.e. 1.76%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 71.477 of 2047.6 MB (i.e. 3.49%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 67.477 of 2047.6 MB (i.e. 3.3%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 67.477 of 2047.6 MB (i.e. 3.3%) Free 
MacBook-Pro:cuda-smi-master xxxxxx$ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 67.477 of 2047.6 MB (i.e. 3.3%) Free 
+0

Please post your NVIDIA GPU utilization and memory figures. My guess is that you have run out of GPU memory. –

+0

How do I check that? Thanks – Shimano

+0

On Linux I use 'nvidia-smi', but it does not exist on macOS. Try this: https://github.com/phvu/cuda-smi –
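
For anyone who wants to read these numbers from a script rather than by eye, here is a minimal sketch that parses one line of cuda-smi output in the format pasted in the question. The function names and the 10% threshold are assumptions made for illustration; they are not part of cuda-smi.

```python
import re

# Matches one line of cuda-smi output in the format shown in the question, e.g.
# "Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 83.477 of 2047.6 MB (i.e. 4.08%) Free"
LINE_RE = re.compile(
    r"Device (?P<index>\d+) \[[^\]]*\]: (?P<name>.+?) \(CC [\d.]+\): "
    r"(?P<free>[\d.]+) of (?P<total>[\d.]+) MB"
)


def parse_cuda_smi_line(line):
    """Return (device_index, name, free_mb, total_mb), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return (
        int(match.group("index")),
        match.group("name"),
        float(match.group("free")),
        float(match.group("total")),
    )


def is_low_on_memory(free_mb, total_mb, threshold=0.10):
    """True when less than `threshold` (an arbitrary, assumed cutoff) of device memory is free."""
    return free_mb / total_mb < threshold


if __name__ == "__main__":
    sample = ("Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): "
              "83.477 of 2047.6 MB (i.e. 4.08%) Free")
    index, name, free_mb, total_mb = parse_cuda_smi_line(sample)
    print(index, name, free_mb, total_mb, is_low_on_memory(free_mb, total_mb))
```

Shell prompt lines such as the ./cuda-smi invocations do not match the pattern and come back as None, so the pasted terminal output can be fed through line by line.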

Answers

2

I got the same error and solved the problem. My system configuration is as follows:

  • OS: Ubuntu 14.04
  • GPU: GTX 1050Ti
  • NVIDIA driver: 375.66
  • TensorFlow: 1.3.0
  • cuDNN: 6.0.21 (cudnn-8.0-linux-x64-v6.0.deb)
  • CUDA: 8.0.61
  • Keras: 2.0.8

Here is how I solved the problem:

  1. I copied the cuDNN files to the appropriate locations (/usr/local/cuda/include and /usr/local/cuda/lib64).
  2. I set the environment variables:

    * export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64" 
    * export CUDA_HOME=/usr/local/cuda 
    
  3. I also ran sudo ldconfig -v so that the runtime linker's cache of shared libraries was refreshed.
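
A quick way to double-check the setup described in these steps is a small script like the one below. It is only a sketch: the helper names are made up for illustration, and it assumes the cuDNN library files start with "libcudnn" and live under $CUDA_HOME/lib64, as in the steps above.

```python
import os


def missing_library_dirs(required_dirs, ld_library_path):
    """Return the entries of required_dirs that are absent from an LD_LIBRARY_PATH-style string."""
    present = {os.path.normpath(p) for p in ld_library_path.split(":") if p}
    return [d for d in required_dirs if os.path.normpath(d) not in present]


def find_cudnn_files(lib_dir):
    """List file names in lib_dir starting with 'libcudnn' (empty if the directory is missing)."""
    if not os.path.isdir(lib_dir):
        return []
    return sorted(f for f in os.listdir(lib_dir) if f.startswith("libcudnn"))


if __name__ == "__main__":
    # Fall back to the path used in the answer when CUDA_HOME is not set.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    lib_dir = os.path.join(cuda_home, "lib64")
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    print("missing from LD_LIBRARY_PATH:", missing_library_dirs([lib_dir], ld_path))
    print("cuDNN libraries found:", find_cudnn_files(lib_dir))
```

An empty "missing" list together with a non-empty list of cuDNN files suggests the copy and the exports took effect in the current shell; it does not prove that TensorFlow will load them.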

I hope these steps also help anyone who is about to go crazy over this.

0

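That sounds odd. Try restarting your computer and re-running the model. If the model then runs fine, the problem lies in GPU memory allocation and in how TensorFlow manages the available memory. On Windows 10 I had two terminals open, and closing one of them solved my problem. There may be lingering (zombie) threads that are still holding memory.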
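
If memory allocation is indeed the cause, one thing to try is limiting how much GPU memory TensorFlow claims. The sketch below assumes the TensorFlow 1.x API (tf.ConfigProto with gpu_options.allow_growth and gpu_options.per_process_gpu_memory_fraction); the helper names and the 0.9 headroom factor are assumptions for illustration, and with only a few dozen MB free, as in the question's log, this may still not be enough.

```python
def safe_memory_fraction(free_mb, total_mb, headroom=0.9):
    """Fraction of total GPU memory to request: the free share scaled down by `headroom`."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    fraction = (free_mb / total_mb) * headroom
    # Clamp to the valid range for a memory fraction.
    return max(0.0, min(1.0, fraction))


def make_session(fraction):
    """Create a TF 1.x session that grows GPU memory on demand, capped at `fraction`."""
    import tensorflow as tf  # imported lazily; assumes a TensorFlow 1.x install

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # allocate as needed, not all up front
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    return tf.Session(config=config)


if __name__ == "__main__":
    # Example numbers only: 1024 MB free on a 2048 MB card.
    print(safe_memory_fraction(1024.0, 2048.0))
```

The free and total figures can come from whatever tool reports them on your platform (nvidia-smi on Linux, cuda-smi on macOS).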

0

I managed to get it working by deleting the .nv folder in my home folder:

sudo rm -rf ~/.nv/
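
A more cautious variant is to move the folder aside instead of deleting it, so it can be restored if nothing changes. This is a different technique from the command above, sketched under the assumption that the folder is a cache that will be recreated when needed; the function name and the ".bak" suffix are made up for illustration.

```python
import os
import shutil


def move_cache_aside(cache_dir, backup_suffix=".bak"):
    """Rename cache_dir to cache_dir + backup_suffix; return the new path, or None if absent."""
    cache_dir = os.path.expanduser(cache_dir).rstrip(os.sep)
    if not os.path.isdir(cache_dir):
        return None
    backup = cache_dir + backup_suffix
    # Drop an older backup so the rename below can succeed.
    if os.path.isdir(backup):
        shutil.rmtree(backup)
    elif os.path.exists(backup):
        os.remove(backup)
    os.rename(cache_dir, backup)
    return backup


if __name__ == "__main__":
    print(move_cache_aside("~/.nv"))
```

If the folder belongs to your own user, this should not need sudo; if an earlier sudo run left it owned by root, the rename will fail with a permission error.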