2011-08-15 152 views
1

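Is calling cudaDeviceReset() after a computation the normal way to use the GPU from Matlab?

I can't use the GPU computing support in the latest version of Matlab because my GPU doesn't support Compute Capability 1.3+, and I don't want to pay a lot of money for Accelereyes Jacket just to use simple Cuda functions like cudaMemGetInfo() or my own simple Cuda kernels. Does Matlab cause Cuda to leak memory because of CUcontext caching?

I've run into some very frustrating behavior when calling Cuda from Matlab. In Visual Studio 2008 I wrote a simple DLL that uses the standard MEX interface to run a single Cuda query: how much RAM is available on the device (Listing 1).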

// cudaMemoryCheck.cpp : Defines the exported functions for the DLL application. 

#include <mex.h> 
#include <cuda.h> 
#include <driver_types.h> 
#include <cuda_runtime_api.h> 

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) 
{ 
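    size_t free = 0, total = 0; 
    // Note: result is stored but never checked in this listing. 
    cudaError_t result = cudaMemGetInfo(&free, &total); 

    // Cast so the arguments match %lu however size_t is defined on the build target. 
    mexPrintf("free memory in bytes %lu (%lu MB), total memory in bytes %lu (%lu MB). ", 
     (unsigned long)free, (unsigned long)(free/1024/1024), 
     (unsigned long)total, (unsigned long)(total/1024/1024)); 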

    if(total > 0) 
     mexPrintf("%2.2f%% free\n", (100.0*free)/total); 
    else 
     mexPrintf("\n"); 

    // this is the critical line! 
    cudaDeviceReset(); 
} 

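I built the project as a Win32 DLL (Release mode), exporting mexFunction through a DEF file, and renamed the DLL's file extension to .mexw32.

When I run cudaMemoryCheck from Matlab with cudaDeviceReset() commented out, I find that my GPU leaks memory. Here is my trivial Matlab code (Listing 2):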

addpath('C:\Users\admin\Documents\Visual Studio 2008\Projects\cudaMemoryCheck\Release') 

for i=1:20 
    clear mex 
    cudaMemoryCheck; 
end 

Running this in Matlab with the cudaDeviceReset() call in place, I see:

free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free 

The output from Matlab is very different when cudaDeviceReset() is commented out:

free memory in bytes 37019648 (35 MB), total memory in bytes 244776960 (233 MB). 15.12% free 
free memory in bytes 25092096 (23 MB), total memory in bytes 244776960 (233 MB). 10.25% free 
free memory in bytes 13549568 (12 MB), total memory in bytes 244776960 (233 MB). 5.54% free 
free memory in bytes 12107776 (11 MB), total memory in bytes 244776960 (233 MB). 4.95% free 
free memory in bytes 8568832 (8 MB), total memory in bytes 244776960 (233 MB). 3.50% free 
free memory in bytes 9617408 (9 MB), total memory in bytes 244776960 (233 MB). 3.93% free 
free memory in bytes 6078464 (5 MB), total memory in bytes 244776960 (233 MB). 2.48% free 
free memory in bytes 8044544 (7 MB), total memory in bytes 244776960 (233 MB). 3.29% free 
free memory in bytes 5816320 (5 MB), total memory in bytes 244776960 (233 MB). 2.38% free 
free memory in bytes 7520256 (7 MB), total memory in bytes 244776960 (233 MB). 3.07% free 
free memory in bytes 8830976 (8 MB), total memory in bytes 244776960 (233 MB). 3.61% free 
free memory in bytes 5292032 (5 MB), total memory in bytes 244776960 (233 MB). 2.16% free 
free memory in bytes 3407872 (3 MB), total memory in bytes 244776960 (233 MB). 1.39% free 
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB). 
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB). 
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB). 
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB). 
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB). 
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB). 
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB). 

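So I conclude that, even though my MEX function never allocates memory on the GPU, the Cuda runtime API creates a new CUcontext every time the MEX function runs, and those contexts are never cleaned up until I either close Matlab or call cudaDeviceReset(). Eventually the GPU runs out of memory even though I haven't allocated anything!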
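One detail worth noting about the second run: Listing 1 stores the return value of cudaMemGetInfo() in result but never inspects it, so the trailing lines that report 0 bytes total are consistent with the call failing rather than with a real reading. Below is a minimal sketch of a report helper that makes that case explicit. The name memoryReport and its ok parameter are hypothetical (they are not part of the original listing); ok is assumed to stand in for (result == cudaSuccess).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of the original listing.
// `ok` is assumed to be (result == cudaSuccess) from cudaMemGetInfo().
std::string memoryReport(bool ok, std::size_t freeBytes, std::size_t totalBytes)
{
    // A total of 0 bytes cannot be a real reading, so treat it as a failure too.
    if (!ok || totalBytes == 0)
        return "cudaMemGetInfo failed: free/total values are not valid\n";

    const unsigned long long mb = 1024ULL * 1024ULL;
    char buf[200];

    // Cast to unsigned long long so the arguments match %llu
    // on both 32-bit and 64-bit builds.
    std::snprintf(buf, sizeof buf,
                  "free memory in bytes %llu (%llu MB), "
                  "total memory in bytes %llu (%llu MB). %2.2f%% free\n",
                  static_cast<unsigned long long>(freeBytes),
                  static_cast<unsigned long long>(freeBytes / mb),
                  static_cast<unsigned long long>(totalBytes),
                  static_cast<unsigned long long>(totalBytes / mb),
                  (100.0 * freeBytes) / totalBytes);
    return std::string(buf);
}
```

Inside mexFunction this could be used as mexPrintf("%s", memoryReport(result == cudaSuccess, free, total).c_str()); so that a failed query prints an explicit message instead of zeros.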

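I don't like using cudaDeviceReset(). The API documentation says that "the function cudaDeviceReset() will immediately deinitialize the context for the calling thread's current device" and that "it is the caller's responsibility to ensure that the device is not being accessed by any other host threads in the process when this function is called." In other words, calling cudaDeviceReset() can kill other GPU computations immediately and without warning. I haven't found any documentation saying that calling cudaDeviceReset() routinely is normal, so I would rather not do it. I will accept any answer that shows that using cudaDeviceReset() is normal and required.

Version information: NVIDIA GPU Computing Toolkit 4.0, Matlab 7.8.0 (R2009a, 32-bit), Windows 7 Enterprise SP1 (64-bit), Nvidia Quadro NVS 420 (latest Nvidia driver, 270.81).

I can also reproduce the problem on Windows XP (32-bit, SP3) with a GeForce 8400 GS, Matlab, Visual Studio, and the GPU Computing Toolkit.

Output of deviceQuery.exe: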

deviceQuery.exe Starting... 

CUDA Device Query (Runtime API) version (CUDART static linking) 

Found 2 CUDA Capable device(s) 

Device 0: "Quadro NVS 420" 
    CUDA Driver Version/Runtime Version   4.0/4.0 
    CUDA Capability Major/Minor version number: 1.1 
    Total amount of global memory:     233 MBytes (244776960 bytes) 
    (1) Multiprocessors x (8) CUDA Cores/MP:  8 CUDA Cores 
    GPU Clock Speed:        1.40 GHz 
    Memory Clock rate:        700.00 Mhz 
    Memory Bus Width:        64-bit 
    Max Texture Dimension Size (x,y,z)    1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048) 
    Max Layered Texture Size (dim) x layers  1D=(8192) x 512, 2D=(8192,8192) x 512 
    Total amount of constant memory:    65536 bytes 
    Total amount of shared memory per block:  16384 bytes 
    Total number of registers available per block: 8192 
    Warp size:          32 
    Maximum number of threads per block:   512 
    Maximum sizes of each dimension of a block: 512 x 512 x 64 
    Maximum sizes of each dimension of a grid:  65535 x 65535 x 1 
    Maximum memory pitch:       2147483647 bytes 
    Texture alignment:        256 bytes 
    Concurrent copy and execution:     No with 0 copy engine(s) 
    Run time limit on kernels:      Yes 
    Integrated GPU sharing Host Memory:   No 
    Support host page-locked memory mapping:  Yes 
    Concurrent kernel execution:     No 
    Alignment requirement for Surfaces:   Yes 
    Device has ECC support enabled:    No 
    Device is using TCC driver mode:    No 
    Device supports Unified Addressing (UVA):  No 
    Device PCI Bus ID/PCI location ID:   3/0 
    Compute Mode: 
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > 

Device 1: "Quadro NVS 420" 
    CUDA Driver Version/Runtime Version   4.0/4.0 
    CUDA Capability Major/Minor version number: 1.1 
    Total amount of global memory:     234 MBytes (244908032 bytes) 
    (1) Multiprocessors x (8) CUDA Cores/MP:  8 CUDA Cores 
    GPU Clock Speed:        1.40 GHz 
    Memory Clock rate:        700.00 Mhz 
    Memory Bus Width:        64-bit 
    Max Texture Dimension Size (x,y,z)    1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048) 
    Max Layered Texture Size (dim) x layers  1D=(8192) x 512, 2D=(8192,8192) x 512 
    Total amount of constant memory:    65536 bytes 
    Total amount of shared memory per block:  16384 bytes 
    Total number of registers available per block: 8192 
    Warp size:          32 
    Maximum number of threads per block:   512 
    Maximum sizes of each dimension of a block: 512 x 512 x 64 
    Maximum sizes of each dimension of a grid:  65535 x 65535 x 1 
    Maximum memory pitch:       2147483647 bytes 
    Texture alignment:        256 bytes 
    Concurrent copy and execution:     No with 0 copy engine(s) 
    Run time limit on kernels:      Yes 
    Integrated GPU sharing Host Memory:   No 
    Support host page-locked memory mapping:  Yes 
    Concurrent kernel execution:     No 
    Alignment requirement for Surfaces:   Yes 
    Device has ECC support enabled:    No 
    Device is using TCC driver mode:    No 
    Device supports Unified Addressing (UVA):  No 
    Device PCI Bus ID/PCI location ID:   4/0 
    Compute Mode: 
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > 

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 2, Device = Quadro NVS 420, Device = Quadro NVS 420 

Answers

1

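I don't think you should need to use cudaDeviceReset. What happens if you omit the call to clear mex? Why are you doing that in the first place? It causes MATLAB to unload your MEX file, and I suspect that is the source of the memory leak.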

+0

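Not calling clear mex does eliminate the memory leak, but it doesn't tell me why Matlab is keeping the cuContexts open. They should be destroyed when the DLL is unloaded! Even using mexAtExit doesn't fix it. It looks like the Matlab process itself has to exit to destroy them, which is frustrating. – user244795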

+2

Have you tried running 'version -modules' to see which DLLs are still in memory after calling 'clear mex'? – Edric

+0

@Edric: +1 for a useful (undocumented) feature, thanks for sharing this tip. It may be useful for this [other question](http://stackoverflow.com/questions/7012408/mex-function-not-updated-after-recompile) – Amro