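I'm writing software that uses a message queue, and I have a problem: a message queue receive error.
The main process creates 16 children (with fork), and each child sends a message to the next child. Each child then waits to receive its own message. (Child "0" sends a message to child "1", ..., child "15" sends a message to child "0".)
It works fine most of the time, but sometimes something strange happens: a process never receives its message, even though the corresponding child sent it. I'd say it happens about once for every 10 successful runs.
I've been able to write a short piece of code that reproduces the error: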
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <termios.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct buf
{
    long mtype;
    int data[32];
};

int main(int argc, char** argv)
{
    int son = 0;   // fork() result: 0 in a child, the child's PID in the parent
    int pid = 0;   // index of this child in the ring (0..15), not a real PID
    struct buf msgbuf;
    key_t key;

    key = ftok(argv[0], 'O');
    int qid = msgget(key, IPC_CREAT | 0666);
    if(qid < 0)
    {
        printf("Error\n");
        return -1;
    }

    //Creates 16 sons
    for(int i = 0; i < 16; i++)
    {
        pid = i;
        son = fork();
        if(son == 0)
            break;
    }

    if(son == 0)
    {
        // Message types must be > 0, so child "pid" listens on type pid + 1
        // and writes to the type of the next child in the ring.
        msgbuf.mtype = ((pid + 1) % 16) + 1;
        for(int i = 0; i < 32; i++)
            msgbuf.data[i] = pid;

        printf("Writing %d\n", ((pid + 1) % 16) + 1);
        msgsnd(qid, &msgbuf, 32 * sizeof(int), IPC_NOWAIT);

        printf("Waiting for %d\n", pid + 1);
        msgrcv(qid, &msgbuf, 32 * sizeof(int), pid + 1, 0);
        printf("Got %d\n", (int)msgbuf.mtype);
    }

    // Every process (parent and children) reaches this point.
    sleep(3);
    printf("----- END -----\n");
    msgctl(qid, IPC_RMID, NULL);
    return 0;
}
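To rule out an off-by-one in the message types, the addressing used above can be checked in isolation. This is only a sketch: `send_type`, `recv_type` and `ring_is_consistent` are hypothetical helpers that copy the two expressions from the program; they are not part of it.

```c
/* Hypothetical helpers mirroring the type arithmetic of the program above. */

/* Type that child `pid` writes: the receive type of the next child in a
 * ring of n children. */
long send_type(int pid, int n)
{
    return ((pid + 1) % n) + 1;
}

/* Type that child `pid` waits for. Message types must be > 0, hence the +1. */
long recv_type(int pid)
{
    return pid + 1;
}

/* Returns 1 if every child's receive type is written by exactly one child,
 * 0 otherwise. */
int ring_is_consistent(int n)
{
    for (int r = 0; r < n; r++)
    {
        int senders = 0;
        for (int s = 0; s < n; s++)
            if (send_type(s, n) == recv_type(r))
                senders++;
        if (senders != 1)
            return 0;
    }
    return 1;
}
```

For n = 16 every type from 1 to 16 has exactly one writer and one reader, so the addressing itself does not seem to be what loses the message.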
So the expected behavior is something like this:
Writing 2
Writing 3
Waiting for 1
Waiting for 2
Got 2
Writing 4
Waiting for 3
Got 3
Writing 5
Waiting for 4
Got 4
Writing 6
Waiting for 5
Got 5
Writing 7
Waiting for 6
Got 6
Writing 8
Waiting for 7
Got 7
Writing 9
Waiting for 8
Got 8
Writing 10
Waiting for 9
Got 9
Writing 11
Waiting for 10
Got 10
Writing 12
Waiting for 11
Got 11
Writing 13
Waiting for 12
Got 12
Writing 14
Waiting for 13
Got 13
Writing 15
Waiting for 14
Got 14
Writing 16
Waiting for 15
Got 15
Writing 1
Waiting for 16
Got 16
Got 1
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
But sometimes I get something like this:
Writing 2
Writing 3
Waiting for 1
Waiting for 2
Got 2
Writing 4
Waiting for 3
Got 3
Writing 5
Waiting for 4
Got 4
Writing 6
Waiting for 5
Got 5
Writing 7
Waiting for 6
Got 6
Writing 9
Waiting for 8
Writing 8
Waiting for 7
Got 7
Got 8
Writing 10
Waiting for 9
Got 9
Writing 11
Waiting for 10
Got 10
Writing 12
Waiting for 11
Got 11
Writing 13
Writing 14
Waiting for 12
Waiting for 13
Got 12
Writing 15
Waiting for 14
Got 14
Writing 16
Waiting for 15
Got 15
Writing 1
Waiting for 16
Got 16
Got 1
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
Got 14
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
----- END -----
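As you can see, message "14" is never received, and after 3 seconds the code removes the queue, which produces a spurious "Got 14".
In my real code, I use semaphores to make sure the program exits only after everyone has received their message. That means a deadlock occurs: the message is never received, so the semaphore is never "unlocked". So it's not because the sleep is too short or anything like that, and it's not because I remove the queue afterwards.
But don't forget, most of the time it works fine! I don't understand why a child sometimes never receives its message.
Can you help me?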
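The late "Got 14" line can be told apart from a real receive by checking return values, which the test program does not do. Below is a minimal sketch, assuming the same `struct buf` as above; `send_checked` and `recv_checked` are hypothetical wrapper names, and only the standard `msgsnd`, `msgrcv`, `strerror` and `errno` facilities are used. A failed `msgrcv` returns -1 and does not deliver a message, so the late line appears to show the type the child itself had stored in `msgbuf` before sending.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct buf
{
    long mtype;
    int data[32];
};

/* Hypothetical helper: send one message of the given type without blocking,
 * with every data slot set to `value`. Returns 0 on success, -1 on failure
 * (for example EAGAIN if the queue is full, since IPC_NOWAIT is set). */
int send_checked(int qid, long type, int value)
{
    struct buf m;
    m.mtype = type;
    for (int i = 0; i < 32; i++)
        m.data[i] = value;
    if (msgsnd(qid, &m, sizeof m.data, IPC_NOWAIT) == -1)
    {
        fprintf(stderr, "msgsnd(type %ld) failed: %s\n", type, strerror(errno));
        return -1;
    }
    return 0;
}

/* Hypothetical helper: receive one message of the given type; `flags` is
 * passed straight to msgrcv (0 to block, IPC_NOWAIT to poll).
 * Returns 0 on success, -1 on failure; on failure the caller should not
 * treat *out as a received message. */
int recv_checked(int qid, long type, struct buf *out, int flags)
{
    if (msgrcv(qid, out, sizeof out->data, type, flags) == -1)
    {
        fprintf(stderr, "msgrcv(type %ld) failed: %s\n", type, strerror(errno));
        return -1;
    }
    return 0;
}
```

In the child, the receive step would then become something like `if (recv_checked(qid, pid + 1, &msgbuf, 0) == 0) printf("Got %d\n", (int)msgbuf.mtype);`, so that a receive interrupted by the queue removal reports an error instead of printing "Got". Checking the `msgsnd` result the same way would also show whether the non-blocking send ever fails.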