Flushing the kernel's TCP buffer for `MSG_MORE`-flagged packets

The man page for send() mentions the MSG_MORE flag, which is said to behave like TCP_CORK. I have a wrapper function around send():

int SocketConnection_Write(SocketConnection *this, void *buf, int len) { 
    errno = 0; 

    int sent = send(this->fd, buf, len, MSG_NOSIGNAL); 

    if (errno == EPIPE || errno == ENOTCONN) {
        throw(exc, &SocketConnection_NotConnectedException);
    } else if (errno == ECONNRESET) {
        throw(exc, &SocketConnection_ConnectionResetException);
    } else if (sent != len) {
        throw(exc, &SocketConnection_LengthMismatchException);
    }

    return sent; 
}

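Assuming I want to use the kernel's buffering, I could go with TCP_CORK: enable it whenever necessary and then disable it to flush the buffer. On the other hand, that requires an additional system call. Using MSG_MORE therefore seems more appropriate to me. I would simply change the send() line above to: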

int sent = send(this->fd, buf, len, MSG_NOSIGNAL | MSG_MORE);
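A slightly fuller sketch of that idea, for illustration only: the helper name `send_all` and its `more` parameter are hypothetical and not part of the wrapper above. It assumes Linux (MSG_MORE and MSG_NOSIGNAL) and lets the caller decide per call whether more data follows.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send all `len` bytes from `buf`.
 * If `more` is non-zero, MSG_MORE hints to the kernel that further data
 * will follow, so it may hold back a partial segment.
 * Returns 0 on success, -1 on error (errno is set by send()). */
int send_all(int fd, const void *buf, size_t len, int more)
{
    const char *p = buf;
    int flags = MSG_NOSIGNAL | (more ? MSG_MORE : 0);

    while (len > 0) {
        ssize_t n = send(fd, p, len, flags);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before sending: retry */
            return -1;
        }
        p += n;                 /* advance past the bytes already queued */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would pass `more = 1` for a header and `more = 0` for the final part of a message, for example `send_all(fd, hdr, hdr_len, 1)` followed by `send_all(fd, body, body_len, 0)`.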

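According to lwm.net, packets are flushed automatically once they are large enough:

If an application sets that option on a socket, the kernel will not send out short packets. Instead, it will wait until enough data has shown up to fill a maximum-size packet, then send it. When TCP_CORK is turned off, any remaining data will go out on the wire.

But that section only refers to TCP_CORK. So what is the proper way to flush MSG_MORE packets?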

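I can only think of two possibilities:

Call send() with an empty buffer and without MSG_MORE set
Re-apply the TCP_CORK option, as described on this page

Unfortunately, the whole topic is very poorly documented and I couldn't find much on the Internet.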

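I am also wondering how to check that everything works as expected. Running the server through strace is obviously not an option. So would the simplest way be to use netcat and then look at its strace output? Or does the kernel handle traffic sent over the loopback interface differently?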

2010-03-30 user206268

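sendfile() retains the `MSG_MORE` flag. When sendfile() returns, the buffer will be flushed. – user206268 2010-03-31 14:20:40

I had a look at the kernel source, and both assumptions seem to be true. The following code is an extract from net/ipv4/tcp.c (2.6.33.1).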

static inline void tcp_push(struct sock *sk, int flags, int mss_now,
                            int nonagle)
{
    struct tcp_sock *tp = tcp_sk(sk);

    if (tcp_send_head(sk)) {
        struct sk_buff *skb = tcp_write_queue_tail(sk);
        if (!(flags & MSG_MORE) || forced_push(tp))
            tcp_mark_push(tp, skb);
        tcp_mark_urg(tp, flags, skb);
        __tcp_push_pending_frames(sk, mss_now,
                                  (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
    }
}

So if the MSG_MORE flag is not set, the pending frames will definitely be flushed. But this is only the case when the buffer is not empty:

static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset, 
      size_t psize, int flags) 
{ 
(...) 
    ssize_t copied; 
(...) 
    copied = 0; 

    while (psize > 0) { 
(...) 
        if (forced_push(tp)) {
            tcp_mark_push(tp, skb);
            __tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
        } else if (skb == tcp_send_head(sk))
            tcp_push_one(sk, mss_now);
        continue;

wait_for_sndbuf:
        set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
wait_for_memory:
        if (copied)
            tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);

        if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
            goto do_error;

        mss_now = tcp_send_mss(sk, &size_goal, flags);
    }

out:
    if (copied)
        tcp_push(sk, flags, mss_now, tp->nonagle);
    return copied;

do_error:
    if (copied)
        goto out;
out_err: 
    return sk_stream_error(sk, flags, err); 
}

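The body of the while loop is never executed, because psize is not greater than 0. Then, in the out section, there is another chance for tcp_push() to be called, but because copied still has its initial value of 0, that call is skipped as well.

So sending a packet with a length of 0 will never result in a flush.

The next theory was to re-apply TCP_CORK. Let's take a look at the code first: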

static int do_tcp_setsockopt(struct sock *sk, int level, 
     int optname, char __user *optval, unsigned int optlen) 
{ 

(...) 

    switch (optname) { 
(...) 

    case TCP_NODELAY:
        if (val) {
            /* TCP_NODELAY is weaker than TCP_CORK, so that
             * this option on corked socket is remembered, but
             * it is not activated until cork is cleared.
             *
             * However, when TCP_NODELAY is set we make
             * an explicit push, which overrides even TCP_CORK
             * for currently queued segments.
             */
            tp->nonagle |= TCP_NAGLE_OFF|TCP_NAGLE_PUSH;
            tcp_push_pending_frames(sk);
        } else {
            tp->nonagle &= ~TCP_NAGLE_OFF;
        }
        break;

    case TCP_CORK:
        /* When set indicates to always queue non-full frames.
         * Later the user clears this option and we transmit
         * any pending partial frames in the queue. This is
         * meant to be used alongside sendfile() to get properly
         * filled frames when the user (for example) must write
         * out headers with a write() call first and then use
         * sendfile to send out the data parts.
         *
         * TCP_CORK can be set together with TCP_NODELAY and it is
         * stronger than TCP_NODELAY.
         */
        if (val) {
            tp->nonagle |= TCP_NAGLE_CORK;
        } else {
            tp->nonagle &= ~TCP_NAGLE_CORK;
            if (tp->nonagle&TCP_NAGLE_OFF)
                tp->nonagle |= TCP_NAGLE_PUSH;
            tcp_push_pending_frames(sk);
        }
        break;
(...)

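As you can see, there are two ways to flush. You can either set TCP_NODELAY to 1 or TCP_CORK to 0. Luckily, neither of them checks whether the flag is already set. Thus, my initial plan to re-apply the TCP_CORK flag can be optimized to just disabling it, even if it is currently not set.

I hope this helps someone with similar issues.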
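The flush path described here can be wrapped in a small helper. This is a minimal sketch, assuming Linux (TCP_CORK is Linux-specific); the function name `tcp_flush` is made up for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: push out data held back by MSG_MORE by clearing
 * TCP_CORK, regardless of whether the option was set beforehand.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()). */
int tcp_flush(int fd)
{
    int off = 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
}
```

Note that this still costs one extra system call per flush. Per the tcp_push() code quoted earlier, a final non-empty send() without MSG_MORE also marks the pending data for pushing, which avoids the extra call when the last chunk is known in advance.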

2010-03-31 15:16:00 user206268

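Thanks for your research. Very helpful. – bvanderveen 2010-08-28 20:48:45

That's a lot of research... all I can offer is this empirical note after the fact:

When you send a bunch of packets with MSG_MORE set, followed by a packet without MSG_MORE, the whole lot goes out. It works a treat for something like this: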

for (i=0; i<mg_live.length; i++) { 
     // [...] 
     if ((n = pth_send(sock, query, len, MSG_MORE | MSG_NOSIGNAL)) < len) { 
      printf("error writing to socket (sent %i bytes of %i)\n", n, len); 
      exit(1); 
     } 
    } 
    } 

    pth_send(sock, "END\n", 4, MSG_NOSIGNAL);

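That is, it works when you are sending all the packets at once and have a clearly defined end... and you are only using one socket.

If you try to write to another socket in the middle of the loop above, you may find that Linux releases the previously held packets. At least that appears to be the trouble I am having right now. But it might be an easy solution for you.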
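The pattern described in this answer can be written as a reusable function. This is only a sketch under stated assumptions: `struct chunk` and `send_batch` are hypothetical names, plain send() stands in for pth_send(), and a short write is treated as an error, as in the loop above.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical description of one piece of a batch. */
struct chunk {
    const void *data;
    size_t len;
};

/* Send every chunk on the same socket, flagging all but the last one with
 * MSG_MORE so that the final send() releases whatever has been held back.
 * Returns 0 on success, -1 on an error or a short write. */
int send_batch(int fd, const struct chunk *chunks, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int flags = MSG_NOSIGNAL;
        if (i + 1 < count)
            flags |= MSG_MORE;  /* more chunks follow this one */

        ssize_t n = send(fd, chunks[i].data, chunks[i].len, flags);
        if (n < 0 || (size_t)n != chunks[i].len)
            return -1;
    }
    return 0;
}
```

Keeping the whole batch inside one call makes it harder to interleave writes to other sockets by accident, which is the situation this answer reports trouble with.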

2011-08-24 17:29:38 Orwellophile
