2015-11-22

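sem_wait does not unblock with EINTR

I am new to semaphores and want to add multithreading to my program, but I cannot get around the following problem: sem_wait() should be interrupted by a signal and fail with EINTR as long as I did not set the SA_RESTART flag. I send a SIGUSR1 to a worker thread that is blocked in sem_wait(); the thread receives the signal and the handler runs, but then it goes back to blocking, so I never get a return code of -1 with errno = EINTR. However, if I then call sem_post from the main thread, the worker unblocks with errno set to EINTR but an RC of 0. I am totally puzzled by this behavior. Is this some odd NetBSD implementation detail, or am I doing something wrong? According to the man page, sem_wait conforms to POSIX.1 (ISO/IEC 9945-1:1996). A simple example: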

#include <stdio.h> 
#include <stdlib.h> 
#include <errno.h> 
#include <signal.h> 
#include <pthread.h> 
#include <semaphore.h> 

typedef struct workQueue_s 
{ 
    int full; 
    int empty; 
    sem_t work; 
    int sock_c[10]; 
} workQueue_t; 

void signal_handler(int sig) 
{ 
    switch(sig) 
    { 
     case SIGUSR1: 
     printf("Signal: I am pthread %p\n", pthread_self()); 
     break; 
    } 
} 

extern int errno; 
workQueue_t queue; 
pthread_t workerbees[8]; 

void *BeeWork(void *t) 
{ 
    int RC; 
    pthread_t tid; 
    struct sigaction sa; 
    sa.sa_handler = signal_handler; 
    sigaction(SIGUSR1, &sa, NULL); 

    printf("Bee: I am pthread %p\n", pthread_self()); 
    RC = sem_wait(&queue.work); 
    printf("Bee: got RC = %d and errno = %d\n", RC, errno); 

    RC = sem_wait(&queue.work); 
    printf("Bee: got RC = %d and errno = %d\n", RC, errno); 
    pthread_exit((void *) t); 
} 

int main() 
{ 
    int RC; 
    long tid = 0; 
    pthread_attr_t attr; 
    pthread_attr_init(&attr); 
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); 

    queue.full = 0; 
    queue.empty = 0; 
    sem_init(&queue.work, 0, 0); 

    printf("I am pthread %p\n", pthread_self()); 
    pthread_create(&workerbees[tid], &attr, BeeWork, (void *) tid); 
    pthread_attr_destroy(&attr); 

    sleep(2); 
    sem_post(&queue.work); 
    sleep(2); 
    pthread_kill(workerbees[tid], SIGUSR1); 
    sleep(2); 

    // Remove this and sem_wait will stay blocked 
    sem_post(&queue.work); 
    sleep(2); 
    return(0); 
} 

I know that printf does not belong in a signal handler, but just for the heck of it: if I remove it, I get the same results.

These are the results without the final sem_post:

I am pthread 0x7f7fffc00000 
Bee: I am pthread 0x7f7ff6c00000 
Bee: got RC = 0 and errno = 0 
Signal: I am pthread 0x7f7ff6c00000 

And with the final sem_post:

I am pthread 0x7f7fffc00000 
Bee: I am pthread 0x7f7ff6c00000 
Bee: got RC = 0 and errno = 0 
Signal: I am pthread 0x7f7ff6c00000 
Bee: got RC = 0 and errno = 4 

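I know that I do not really need to unblock and could simply exit from main, but I want to see it working anyway. I am using sem_wait because I want to keep the worker threads alive and, as soon as there is a new client connection from Postfix, wake the longest-waiting worker from the main thread with sem_post. I do not want to call pthread_create every time, because I will receive several calls per second and I do not want to lose speed and leave Postfix unable to respond to new smtpd clients. This is a policy daemon for Postfix, and the server is busy.

Am I missing something here? Or is NetBSD just broken?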
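
The worker-pool pattern described above (workers sleeping in sem_wait, the main thread handing over a socket and calling sem_post) could be sketched roughly as follows. This is only an illustration under assumptions: the names `work_queue`, `queue_init`, `queue_push` and `queue_pop` are hypothetical and not from the program in the question, and the ring buffer reuses the 10-slot `sock_c` array size from the question. The retry loop in `queue_pop` treats EINTR as "wait again" rather than as an error.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define QUEUE_SLOTS 10          /* same size as sock_c[10] in the question */

/* Hypothetical queue type for illustration only. */
typedef struct {
    int sock_c[QUEUE_SLOTS];    /* ring buffer of client sockets */
    int head, tail, count;
    pthread_mutex_t lock;       /* protects head, tail and count */
    sem_t work;                 /* counts sockets waiting in the queue */
} work_queue;

int queue_init(work_queue *q)
{
    q->head = q->tail = q->count = 0;
    if (pthread_mutex_init(&q->lock, NULL) != 0)
        return -1;
    return sem_init(&q->work, 0, 0);
}

/* Main thread: enqueue a socket and wake one worker; -1 if the queue is full. */
int queue_push(work_queue *q, int sock)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_SLOTS) {
        q->sock_c[q->tail] = sock;
        q->tail = (q->tail + 1) % QUEUE_SLOTS;
        q->count++;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);

    if (rc == 0)
        sem_post(&q->work);     /* one post per queued socket */
    return rc;
}

/* Worker thread: block until a socket is available, then dequeue it. */
int queue_pop(work_queue *q)
{
    int sock;

    while (sem_wait(&q->work) == -1) {
        if (errno != EINTR)
            return -1;          /* a real error */
        /* interrupted by a signal handler: simply wait again */
    }

    pthread_mutex_lock(&q->lock);
    sock = q->sock_c[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return sock;
}
```

Each worker would call `queue_pop` in a loop, and the accept loop in the main thread would call `queue_push` for every new connection.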

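Does this also happen if you use sigaction correctly? Right now you are passing a lot of garbage to sigaction(), and you may well be getting the SA_RESTART flag set. You absolutely need to initialize your 'struct sigaction sa;', either with 'struct sigaction sa = {0};' or with 'memset(&sa, 0, sizeof sa);' – nos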

Thanks for the hint, but I get the same result... – Saskia

It works fine at least on NetBSD 7.0 amd64, where I get 'Bee: got RC = -1 and errno = 4'. (Note that you should remove 'extern int errno'; declaring errno like that is wrong in a multithreaded program.) – nos

Answers

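My post is about the behavior on Linux, but I think you may be seeing something similar, or at least I hope it helps. If not, let me know and I will delete this useless 'noise'.

I tried to reproduce your setup and was surprised to see exactly what you describe. Digging deeper showed me that something more subtle is going on; if you look with strace, you will see something like: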

[pid 6984] futex(0x6020e8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> 
[pid 6983] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 
[pid 6983] rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 
[pid 6983] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 
[pid 6983] nanosleep({2, 0}, 0x7fffe5794a70) = 0 
[pid 6983] tgkill(6983, 6984, SIGUSR1 <unfinished ...> 
[pid 6984] <... futex resumed>)  = ? ERESTARTSYS (To be restarted if SA_RESTART is set) 
[pid 6983] <... tgkill resumed>)  = 0 
[pid 6984] --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=6983, si_uid=500} --- 
[pid 6983] rt_sigprocmask(SIG_BLOCK, [CHLD], <unfinished ...> 
[pid 6984] rt_sigreturn(<unfinished ...> 
[pid 6983] <... rt_sigprocmask resumed> [], 8) = 0 
[pid 6984] <... rt_sigreturn resumed>) = -1 EINTR (Interrupted system call) 

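Look at the lines with ERESTARTSYS and EINTR: the system call reported as interrupted is actually 'rt_sigreturn resumed', not futex (the system call underlying sem_wait) as you would expect. I must say I was puzzled, but reading the man page gave some interesting clues (man 7 signal):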

If a blocked call to one of the following interfaces is interrupted by 
    a signal handler, then the call will be automatically restarted after 
    the signal handler returns if the SA_RESTART flag was used; otherwise 
    the call will fail with the error EINTR: 
[...] 

     * futex(2) FUTEX_WAIT (since Linux 2.6.22; beforehand, always 
     failed with EINTR). 

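So I guess you have a kernel with similar behavior (see the NetBSD documentation?), and what you observe is the system call being restarted automatically without you getting any chance to see it.

That said, I completely removed the sem_post() calls from your program and only sent the signal to 'break' the sem_wait(). Looking at strace (filtered on the bee thread), I see: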

[pid 8309] futex(0x7fffc0470990, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> 
[pid 8309] <... futex resumed>)  = ? ERESTARTSYS (To be restarted if SA_RESTART is set) 
[pid 8309] --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=8308, si_uid=500} --- 
[pid 8309] rt_sigreturn()    = -1 EINTR (Interrupted system call) 
[pid 8309] madvise(0x7fd5f6019000, 8368128, MADV_DONTNEED) = 0 
[pid 8309] _exit(0) 

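I must say I have not mastered the details, but the kernel seems to figure out what I am trying to do and makes the whole thing behave correctly: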

Bee: got RC = -1 and errno = Interrupted system call 

Thanks for your help. I only have ktrace; when I remove the last sem_post and make the last sleep a bit longer, I get: 10631 1 a.out RET __nanosleep50 0 10631 1 a.out CALL exit(0) 10631 2 a.out RET ___lwp_park50 -1 errno 4 Interrupted system call – Saskia

Thank you for your answer, OznOg. If I remove the last sem_post and make the last sleep a bit longer, I get this with ktrace:

PSIG SIGUSR1 caught handler=0x40035c mask=(): code=SI_LWP sent by pid=10631, uid=0) 
CALL write(1,0x7f7ff7e04000,0x24) 
GIO fd 1 wrote 36 bytes "Signal: I am pthread 0x7f7ff7800000\n" 
RET write 36/0x24 
CALL setcontext(0x7f7ff7bff970) 
RET setcontext JUSTRETURN 
CALL ___lwp_park50(0,0,0x7f7ff7e01100,0x7f7ff7e01100) 
RET __nanosleep50 0 
CALL exit(0) 
RET ___lwp_park50 -1 errno 4 Interrupted system call 

It looks like sem_wait can only return because of an exit or a sem_post....
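
If sem_post is the only thing that reliably wakes the worker on this system, one possible workaround is to stop relying on EINTR and ask the worker to stop with a flag followed by a sem_post. The sketch below shows that idea under stated assumptions: it is not the code from the question, the names `pending`, `stop_requested`, `jobs_done` and `run_and_stop` are hypothetical, and it handles a single worker (with several workers, one extra sem_post per worker would be needed).

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

static sem_t work;
static atomic_int pending;          /* jobs posted but not yet taken */
static atomic_int stop_requested;   /* set by main before the final post */
static atomic_int jobs_done;

static void *bee(void *arg)
{
    (void)arg;
    for (;;) {
        if (sem_wait(&work) == -1) {
            if (errno == EINTR)
                continue;           /* interrupted: wait again */
            break;                  /* real error */
        }
        if (atomic_load(&pending) > 0) {
            atomic_fetch_sub(&pending, 1);
            atomic_fetch_add(&jobs_done, 1);   /* stand-in for real work */
        } else if (atomic_load(&stop_requested)) {
            break;                  /* woken only to shut down */
        }
    }
    return NULL;
}

/* Post `jobs` jobs, then stop the worker; returns the number of jobs done. */
int run_and_stop(int jobs)
{
    pthread_t tid;

    atomic_store(&pending, 0);
    atomic_store(&stop_requested, 0);
    atomic_store(&jobs_done, 0);
    if (sem_init(&work, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, bee, NULL) != 0)
        return -1;

    for (int i = 0; i < jobs; i++) {
        atomic_fetch_add(&pending, 1);
        sem_post(&work);
    }

    atomic_store(&stop_requested, 1);
    sem_post(&work);                /* wake the worker so it sees the flag */

    pthread_join(tid, NULL);
    sem_destroy(&work);
    return atomic_load(&jobs_done);
}
```

Because the flag is only acted on once no jobs are pending, the worker finishes queued work before it exits.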