CUDA runtime GPU initialization with Theano

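I want to parallelize my neural network across two GPUs, following https://github.com/uoguelph-mlrg/theano_multi_gpu. I have all the dependencies installed, but CUDA runtime initialization fails with the following message: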

ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device 0 failed: 
cublasCreate() returned this error 'the CUDA Runtime initialization failed' 
Error when trying to find the memory information on the GPU: invalid device ordinal 
Error allocating 24 bytes of device memory (invalid device ordinal). Driver report 0 bytes free and 0 bytes total 
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed: 
CudaNdarray_ZEROS: allocation failed. 
Process Process-1: 
Traceback (most recent call last):
  File "/opt/share/Python-2.7.9/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/share/Python-2.7.9/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/u/bsankara/nt/Git-nt/nt/train_attention.py", line 171, in launch_train
    clip_c=1.)
  File "/u/bsankara/nt/Git-nt/nt/nt.py", line 1616, in train
    import theano.sandbox.cuda
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/__init__.py", line 98, in <module>
    theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/tests/test_driver.py", line 30, in test_nvidia_driver1
    A = cuda.shared_constructor(a)
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/var.py", line 181, in float32_shared_constructor
    enable_cuda=False)
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py", line 389, in use
    cuda_ndarray.cuda_ndarray.CudaNdarray.zeros((2, 3))
RuntimeError: ('CudaNdarray_ZEROS: allocation failed.', 'You asked to force this device and it failed. No fallback to the cpu or other gpu device.')

The relevant part of the code is below; the error is triggered when 'import theano.sandbox.cuda' is executed:

import os
from multiprocessing import Queue
import zmq
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

def train(private_args, process_env, <some other args>):
    if process_env is not None:
        os.environ = process_env

    #### 
    # pycuda and zmq environment 

    drv.init() 
    dev = drv.Device(private_args['ind_gpu']) 
    ctx = dev.make_context() 
    sock = zmq.Context().socket(zmq.PAIR) 

    if private_args['flag_client']:
        sock.connect('tcp://localhost:5000')
    else:
        sock.bind('tcp://*:5000')

    #### 
    # import theano stuffs 
    import theano.sandbox.cuda 
    theano.sandbox.cuda.use(private_args['gpu']) 

    import theano 
    import theano.tensor as tensor 
    from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams 
    import theano.misc.pycuda_init 
    import theano.misc.pycuda_utils 
...

And here is where I launch the train function as two processes:

import os
from multiprocessing import Process, Queue

def launch_train(curr_args, process_env, curr_queue, oth_queue):
    trainerr, validerr, testerr = train(private_args=curr_args,
                                        process_env=process_env,
                                        ...)

process1_env = os.environ.copy() 
process1_env['THEANO_FLAGS'] = "cuda.root=/opt/share/cuda-7.0,device=gpu0,floatX=float32,on_unused_input=ignore,optimizer=fast_run,exception_verbosity=high,compiledir=/u/bsankara/.theano/NT_multi_GPU1" 
process2_env = os.environ.copy() 
process2_env['THEANO_FLAGS'] = "cuda.root=/opt/share/cuda-7.0,device=gpu1,floatX=float32,on_unused_input=ignore,optimizer=fast_run,exception_verbosity=high,compiledir=/u/bsankara/.theano/NT_multi_GPU2" 

p = Process(target=launch_train,
            args=(p_args, process1_env, queue_p, queue_q))
q = Process(target=launch_train,
            args=(q_args, process2_env, queue_q, queue_p))

p.start() 
q.start() 
p.join() 
q.join()
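The two THEANO_FLAGS strings above differ only in the device index and the compile directory. As a sketch, they could be generated by one function so the two workers cannot drift apart. The helper name make_worker_env and its parameters are hypothetical (not part of Theano); the flag values are copied from the launch code:

```python
import os


def make_worker_env(gpu_index, compiledir_root,
                    cuda_root="/opt/share/cuda-7.0", base_env=None):
    """Hypothetical helper: return a copy of the environment with a
    THEANO_FLAGS string for one worker bound to gpu<gpu_index>.

    The compile directory gets suffix gpu_index + 1, matching the
    NT_multi_GPU1 / NT_multi_GPU2 naming in the launch code."""
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    flags = [
        "cuda.root=%s" % cuda_root,
        "device=gpu%d" % gpu_index,
        "floatX=float32",
        "on_unused_input=ignore",
        "optimizer=fast_run",
        "exception_verbosity=high",
        "compiledir=%s%d" % (compiledir_root, gpu_index + 1),
    ]
    env["THEANO_FLAGS"] = ",".join(flags)
    return env
```

With this, process1_env would be make_worker_env(0, "/u/bsankara/.theano/NT_multi_GPU") and process2_env would be make_worker_env(1, "/u/bsankara/.theano/NT_multi_GPU"). Separate compile directories per worker are kept, as in the original launch code.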

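However, if I initialize the GPU interactively in Python, the import statements seem to work. I executed the first 20 lines of train() interactively, and it worked fine there, correctly assigning me to gpu0 as requested.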


2015-09-24 baskaran

I tried some debugging with pdb. It seems to fail in /opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py, inside 'def use(device, force=False, default_to_move_computation_to_gpu=True, move_shared_float32_to_gpu=True, enable_cuda=True, test_driver=True):'. Specifically, it crashes at the call 'gpu_init(device)'. 'device' has the value 0 (derived from 'gpu0'), and it fails with the message: RuntimeError: "cublasCreate() returned this error 'the CUDA Runtime initialization failed'" – baskaran
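The comment above notes that the device string 'gpu0' reaches gpu_init as the ordinal 0. That mapping can be sketched as follows. gpu_ordinal is a hypothetical helper written for illustration (Python 2/3 compatible); it is not Theano's own parsing code:

```python
import re


def gpu_ordinal(device):
    """Hypothetical helper: map a Theano-style device string to a CUDA
    device ordinal. 'gpu0' -> 0, 'gpu1' -> 1, plain 'gpu' -> None
    (meaning: no specific device requested)."""
    match = re.match(r"gpu(\d*)\Z", device)
    if match is None:
        raise ValueError("not a GPU device string: %r" % (device,))
    digits = match.group(1)
    return int(digits) if digits else None
```

Because 0 is a valid ordinal on a two-GPU machine, the 'invalid device ordinal' message in the log is not explained by this mapping alone, which is why the debugging continued below.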

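Does the 'dual_mlp.py' code (in the GitHub repo you linked to) run without modification? Have you tried going back to the original/official documentation on this topic (https://github.com/Theano/Theano/wiki/Using-Multiple-GPUs)? –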

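@Daniel, the official documentation and dual_mlp.py use the same approach. Both start subprocesses and then import 'theano.sandbox.cuda' to bind each one to a GPU. AFAIK the only difference is that dual_mlp.py uses PyCUDA functions for GPU-to-GPU transfers, to avoid the latency of tunneling through host memory, whereas the official documentation suggests using multiprocessing queues. I haven't tried running dual_mlp.py myself, but I have corresponded privately with one of the authors, who said it works for them. I will check this. – baskaran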
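The queue-based approach mentioned above moves data between the two processes through host memory. A minimal sketch of such an exchange, assuming each worker holds its parameters as a plain list of floats and averages them with its peer. The function names and the averaging step are illustrative assumptions, not code from either repository, and the sketch assumes a worker writes to its first queue and reads from its second:

```python
import multiprocessing as mp


def exchange_and_average(local, send_q, recv_q, timeout=30.0):
    """Send our parameters to the peer, receive the peer's, return the mean.

    Both workers put before they get, and Queue.put does not block on an
    unbounded queue, so neither worker waits on the other to send first.
    The lists are pickled and passed through a pipe, i.e. host memory."""
    send_q.put(list(local))
    remote = recv_q.get(timeout=timeout)
    return [(a + b) / 2.0 for a, b in zip(local, remote)]


def worker(rank, local, send_q, recv_q, result_q):
    result_q.put((rank, exchange_and_average(local, send_q, recv_q)))


def run_pair(params_p, params_q):
    """Start two workers cross-wired like the launch code: p gets
    (queue_p, queue_q), q gets (queue_q, queue_p).
    Returns {rank: averaged parameters}."""
    queue_p, queue_q, results = mp.Queue(), mp.Queue(), mp.Queue()
    procs = [
        mp.Process(target=worker, args=(0, params_p, queue_p, queue_q, results)),
        mp.Process(target=worker, args=(1, params_q, queue_q, queue_p, results)),
    ]
    for proc in procs:
        proc.start()
    # Drain results before join() so no child blocks flushing its queue.
    out = dict(results.get(timeout=60) for _ in procs)
    for proc in procs:
        proc.join()
    return out
```

For example, calling run_pair([1.0, 2.0], [3.0, 4.0]) from under an if __name__ == "__main__": guard should return {0: [2.0, 3.0], 1: [2.0, 3.0]}. The PyCUDA route in dual_mlp.py avoids exactly this pickling round trip through the host.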

After digging further with pdb, the original poster found the problem.

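Basically, Theano and PyCUDA were both competing to initialize the GPU, which caused the failure. The solution is to import theano first, which acquires a GPU and initializes its CUDA context, and then attach PyCUDA to that existing context instead of creating a new one. So the import part of the train function looks like this: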

def train(private_args, process_env, <some other args>):
    if process_env is not None:
        os.environ = process_env

    ####
    # import theano related
    # We need global imports and so we make them as such
    # (the global declaration makes these assignments module-level)
    global theano, tensor
    theano = __import__('theano')
    _t_tensor = __import__('theano', globals(), locals(), ['tensor'], -1)
    tensor = _t_tensor.tensor

    import theano.sandbox.cuda 
    import theano.misc.pycuda_utils 

    #### 
    # pycuda and zmq environment 
    import zmq 
    import pycuda.driver as drv 
    import pycuda.gpuarray as gpuarray 

    drv.init() 
    # Attach the existing context (already initialized by theano import statement) 
    ctx = drv.Context.attach() 
    sock = zmq.Context().socket(zmq.PAIR) 

    if private_args['flag_client']:
        sock.connect('tcp://localhost:5000')
    else:
        sock.bind('tcp://*:5000')
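The __import__ calls above are meant to make theano and tensor available outside train(). Inside a function, however, a plain assignment such as theano = __import__('theano') creates a local name unless that name is declared global. A sketch of an alternative that binds the modules into the module namespace explicitly, using the standard importlib.import_module (the helper import_globally is hypothetical):

```python
import importlib


def import_globally(namespace, **modules):
    """Hypothetical helper: import each dotted module path and bind it
    under the given alias in `namespace` (pass globals() to create
    module-level names).

    importlib.import_module('pkg.sub') imports the parent package first
    and returns the submodule itself, so tensor='theano.tensor' binds
    the theano.tensor module, not the top-level theano package."""
    for alias, dotted_path in modules.items():
        namespace[alias] = importlib.import_module(dotted_path)
```

Inside train() this would be import_globally(globals(), theano='theano', tensor='theano.tensor'), still placed before drv.init() and drv.Context.attach() so that Theano creates the CUDA context first and PyCUDA only attaches to it.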

[This answer was added as a community wiki entry from edits made by the OP, in an attempt to get this question off the unanswered list.]


2016-06-23 09:52:09 talonmies
