2012-06-27

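Can someone explain this valgrind error with Open MPI?

My basic question is about how suppression files work in valgrind. I have read a lot of documentation that points to using the following with Open MPI versions > 1.5 (mine is 1.6):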

mpirun -np 2 valgrind --suppressions=/usr/share/openmpi/openmpi-valgrind.supp --track-origins=yes ./myprog 

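However, when I run the program like this, valgrind reports 600 errors! The errors I get are these two, over and over. With my current understanding of valgrind and MPI, I do not know how to interpret either of them.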

==8821== Address 0xad5e4d7 is 87 bytes inside a block of size 128 alloc'd 
==8821== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) 
==8821== by 0x6348C52: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0) 
==8821== by 0x6349AF1: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0) 
==8821== by 0x6349B81: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0) 
==8821== by 0x7DA5B9C: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so) 
==8821== by 0x7DA52F4: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so) 
==8821== by 0x5082AF2: ??? (in /usr/lib/openmpi/lib/libmpi.so.0.0.2) 
==8821== by 0x50A33FA: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.0.0.2) 
==8821== by 0x408AB5: main (test_send-receive.cpp:8) 
==8821== Uninitialised value was created by a heap allocation 
==8821== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) 
==8821== by 0x635FE2B: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0) 
==8821== by 0x6360634: opal_ifcount (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0) 
==8821== by 0x81B36AA: ??? (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so) 
==8821== by 0x5C01EE2: mca_oob_base_init (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0) 
==8821== by 0x7FA97FB: ??? (in /usr/lib/openmpi/lib/openmpi/mca_rml_oob.so) 
==8821== by 0x5C083E4: orte_rml_base_select (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0) 
==8821== by 0x5BF5EC4: orte_ess_base_app_setup (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0) 
==8821== by 0x7BA1EAE: ??? (in /usr/lib/openmpi/lib/openmpi/mca_ess_env.so) 
==8821== by 0x5BDDB72: orte_init (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0) 
==8821== by 0x50822E0: ??? (in /usr/lib/openmpi/lib/libmpi.so.0.0.2) 
==8821== by 0x50A33FA: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.0.0.2) 

The code that produces these errors is:

#include <mpi.h>
#include <cstdio>

int main(int argc, char *argv[]) {
    /* init MPI */
    MPI_Init(&argc, &argv);

    int myid;
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {
        /* rank 0 fills a buffer and sends it to rank 1 with tag 1 */
        double *d = new double[10];
        for (int i = 0; i < 10; i++) {
            d[i] = i + 1.0;
        }
        MPI_Send(d, 10, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
        delete[] d;
    } else {
        /* every other rank receives 10 doubles from rank 0 and prints them */
        MPI_Status status;
        double *c = new double[10];
        MPI_Recv(c, 10, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

        for (int i = 0; i < 10; i++) {
            printf("%f\n", c[i]);
        }
        delete[] c;
    }

    MPI_Finalize();
    return 0;
}

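Also, the code runs fine and prints the expected results. Am I misunderstanding how the data is sent over the network, or is something else going on here that I do not understand?

Sorry about the length of the post; you guys rock for even reading this.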
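On the suppression-file part of the question, it may help to see what an entry in such a file looks like. The fragment below is only a sketch: the entry name is made up, and the error kind (`Memcheck:Cond`) is an assumption, because the first line of each report, which names the kind of error, is not included in the output quoted above. The `obj:` and `fun:` frames are taken from the quoted stack traces, and `...` is valgrind's wildcard for any number of frames.

```
{
   hypothetical_openmpi_init_entry
   Memcheck:Cond
   ...
   obj:*/libopen-pal.so*
   ...
   fun:PMPI_Init
}
```

Rather than writing entries by hand, adding `--gen-suppressions=all` to the valgrind command line makes valgrind print a ready-made entry for every error it reports. Comparing those entries with the ones in openmpi-valgrind.supp should show why the shipped file does not match these particular reports.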

Do the valgrind errors occur when the code takes the myid == 0 path, the "else" path, or in both cases?

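Also, you are not checking the return value of the MPI_Recv() call (or the "status" variable) to see whether MPI_Recv() succeeded. So it is possible that MPI_Recv() fails for some reason and therefore writes no data into the (c) array, which would cause uninitialized-memory read errors in the printf() calls that follow. Just a guess.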

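@JeremyFriesner, testing the return value is **not necessary** unless the communicator's error handler has been changed. The default error handler aborts the application if an operation returns anything other than MPI_SUCCESS (this does not hold for MPI I/O operations, though).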
