2017-09-19 93 views
6

This question has been asked before on the Internet, but I could not find a good answer. Why does the Linux kernel have both `struct sock` and `struct socket`?

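The Linux kernel networking stack has two structures, `struct sock` and `struct socket`.

The two structures are mostly linked, but they seem to have slightly different lifetimes. You can find the sk through sock->sk, or find the sock through sk->sk_socket.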
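
To make the two back pointers concrete, here is a minimal user-space sketch. The types `demo_socket` and `demo_sock` and the helpers `demo_graft()` and `demo_orphan()` are hypothetical stand-ins, not the kernel definitions; they are only loosely modelled on what `sock_graft()` and `sock_orphan()` do in include/net/sock.h, and they assume that the lower-level object may outlive the upper one.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins; these are NOT the kernel structures. */
struct demo_sock;

struct demo_socket {
    int type;                       /* e.g. stream or datagram */
    struct demo_sock *sk;           /* mirrors socket->sk */
};

struct demo_sock {
    int state;                      /* protocol state */
    struct demo_socket *sk_socket;  /* mirrors sk->sk_socket */
};

/* Link the two objects in both directions (hypothetical helper). */
void demo_graft(struct demo_sock *sk, struct demo_socket *parent)
{
    sk->sk_socket = parent;
    parent->sk = sk;
}

/* Break the link; sk can keep existing without its socket (hypothetical helper). */
void demo_orphan(struct demo_sock *sk)
{
    if (sk->sk_socket != NULL)
        sk->sk_socket->sk = NULL;
    sk->sk_socket = NULL;
}
```

After `demo_graft(&sk, &sock)` each object can reach the other; after `demo_orphan(&sk)` both pointers are NULL while `sk` itself is still valid, which is one way the lifetimes can differ.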

Why are there two structures to store information about a socket? Suppose I need to add a new field: when should I add it to struct socket, and when to struct sock?

UPDATE: Note that I am referring to the struct socket in include/linux/net.h of the Linux source, which is only for code inside the kernel, not to /usr/include/sys/socket.h, which is for user space.

+0

Well: there is an inside one and an outside one. (A similar thing happens with struct stat.) – wildplasser

Answers

4

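struct socket seems to be the higher-level interface used for system calls (which is why it also has a pointer to the struct file that represents the file descriptor).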

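struct sock is the in-kernel implementation for AF_INET sockets (there is also struct unix_sock for AF_UNIX sockets, which is derived from it). It can be used both by the kernel and by user space (through struct socket).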
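
The "derived from" relationship can be sketched in plain C by embedding the generic object as the first member of the family-specific one and casting between the two. The names `toy_sock`, `toy_unix_sock`, `toy_unix_sk()` and `toy_unix_set_path()` and the `path` field are hypothetical and only illustrate the idea; this is a sketch of the pattern, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the generic part (not the kernel's struct sock). */
struct toy_sock {
    int family;
    int state;
};

/* Toy stand-in for a family-specific object that "derives" from toy_sock. */
struct toy_unix_sock {
    struct toy_sock sk;   /* must stay the first member */
    char path[108];       /* family-specific field (hypothetical) */
};

/* The downcast below is only valid because sk sits at offset 0. */
_Static_assert(offsetof(struct toy_unix_sock, sk) == 0,
               "sk must be the first member");

/* Recover the family-specific object from a generic pointer. */
struct toy_unix_sock *toy_unix_sk(struct toy_sock *sk)
{
    return (struct toy_unix_sock *)sk;
}

/* Store a path, always leaving the buffer NUL-terminated. */
void toy_unix_set_path(struct toy_unix_sock *u, const char *path)
{
    strncpy(u->path, path, sizeof(u->path) - 1);
    u->path[sizeof(u->path) - 1] = '\0';
}
```

Generic code only ever sees a `struct toy_sock *`, while family-specific code calls `toy_unix_sk()` to get its own fields back. That keeps the generic structure free of fields that only one family needs.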

Both of them were added to Linux 1.0 back in 1993, and I doubt you will find a document that spells out the initial design decisions.

+0

Okay, but why didn't the kernel developers put all the fields of 'struct sock' and 'struct unix_sock' directly into 'struct socket'? – user1202136

+1

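@user1202136: Because that would contradict [decomposition](https://en.wikipedia.org/wiki/Decomposition_(computer_science)), which makes things simpler. Also note that 'AF_INET' and 'AF_UNIX' are only 2 of the 43 (see 'AF_MAX') available socket types. 'struct socket' is created by 'socket()' before the lower-level implementation structure is created. – myaut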
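
The ordering described in this comment (generic object first, family-specific part second) can be sketched with a per-family dispatch table. Everything named `toy_*` or `TOY_*` below is hypothetical; it is a simplified illustration, loosely modelled on how the kernel looks up a per-family create callback, and not the real code path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical family numbers for this sketch only. */
#define TOY_AF_UNIX 1
#define TOY_AF_INET 2
#define TOY_AF_MAX  3

/* Generic upper-level object; sk points at the family-specific part. */
struct toy_socket {
    int type;
    void *sk;
};

typedef int (*toy_create_fn)(struct toy_socket *sock);

/* Placeholders standing in for family-specific lower-level objects. */
static int toy_unix_impl;
static int toy_inet_impl;

int toy_unix_create(struct toy_socket *sock) { sock->sk = &toy_unix_impl; return 0; }
int toy_inet_create(struct toy_socket *sock) { sock->sk = &toy_inet_impl; return 0; }

/* One create callback per registered family; unregistered slots stay NULL. */
static const toy_create_fn toy_families[TOY_AF_MAX] = {
    [TOY_AF_UNIX] = toy_unix_create,
    [TOY_AF_INET] = toy_inet_create,
};

/* Set up the generic object first, then let the family attach its part. */
int toy_socket_create(int family, int type, struct toy_socket *sock)
{
    if (family < 0 || family >= TOY_AF_MAX || toy_families[family] == NULL)
        return -1;              /* unknown or unregistered family */
    sock->type = type;
    sock->sk = NULL;            /* the generic object exists before its lower half */
    return toy_families[family](sock);
}
```

With this split, supporting another family means adding one table entry and one create function; the generic structure does not change.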

0

"The two structures are mostly linked -" Not sure what you mean.

I think you can find the answer if you look at the source files for these structures:

socket -> linux-src/include/linux/net.h 
sock -> linux-src/include/net/sock.h 

socket

* NET  An implementation of the SOCKET network access protocol. 
*  This is the master header file for the Linux NET layer, 
*  or, in plain English: the networking handling part of the 
*  kernel. 

sock

* INET  An implementation of the TCP/IP protocol suite for the LINUX 
*  operating system. INET is implemented using the BSD Socket 
*  interface as the means of communication with the user level. 

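These structures are different and represent different abstractions of a socket.

The different kinds of sockets are covered in this answer: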

Unix vs BSD vs TCP vs Internet sockets?

Where to define an additional field depends on your intent. Please describe your task.

Please look at the source code.

linux-src/include/linux/net.h

/* 
* NET  An implementation of the SOCKET network access protocol. 
*  This is the master header file for the Linux NET layer, 
*  or, in plain English: the networking handling part of the 
*  kernel. 
* 
* Version: @(#)net.h 1.0.3 05/25/93 
* 
* Authors: Orest Zborowski, <[email protected]> 
*  Ross Biro 
*  Fred N. van Kempen, <[email protected]> 
* 
*  This program is free software; you can redistribute it and/or 
*  modify it under the terms of the GNU General Public License 
*  as published by the Free Software Foundation; either version 
*  2 of the License, or (at your option) any later version. 
*/ 
..... 
..... 
..... 
/** 
* struct socket - general BSD socket 
* @state: socket state (%SS_CONNECTED, etc) 
* @type: socket type (%SOCK_STREAM, etc) 
* @flags: socket flags (%SOCK_NOSPACE, etc) 
* @ops: protocol specific socket operations 
* @file: File back pointer for gc 
* @sk: internal networking protocol agnostic socket representation 
* @wq: wait queue for several uses 
*/ 
struct socket { 
    socket_state  state; 

    kmemcheck_bitfield_begin(type); 
    short   type; 
    kmemcheck_bitfield_end(type); 

    unsigned long  flags; 

    struct socket_wq __rcu *wq; 

    struct file  *file; 
    struct sock  *sk; 
    const struct proto_ops *ops; 
}; 

linux-src/include/net/sock.h

/* 
* INET  An implementation of the TCP/IP protocol suite for the LINUX 
*  operating system. INET is implemented using the BSD Socket 
*  interface as the means of communication with the user level. 
* 
*  Definitions for the AF_INET socket handler. 
* 
* Version: @(#)sock.h 1.0.4 05/13/93 
* 
* Authors: Ross Biro 
*  Fred N. van Kempen, <[email protected]> 
*  Corey Minyard <[email protected]> 
*  Florian La Roche <[email protected]> 
* 
* Fixes: 
*  Alan Cox : Volatiles in skbuff pointers. See 
*     skbuff comments. May be overdone, 
*     better to prove they can be removed 
*     than the reverse. 
*  Alan Cox : Added a zapped field for tcp to note 
*     a socket is reset and must stay shut up 
*  Alan Cox : New fields for options 
* Pauline Middelink : identd support 
*  Alan Cox : Eliminate low level recv/recvfrom 
*  David S. Miller : New socket lookup architecture. 
*    Steve Whitehouse:  Default routines for sock_ops 
*    Arnaldo C. Melo : removed net_pinfo, tp_pinfo and made 
*       protinfo be just a void pointer, as the 
*       protocol specific parts were moved to 
*       respective headers and ipv4/v6, etc now 
*       use private slabcaches for its socks 
*    Pedro Hortas : New flags field for socket options 
* 
* 
*  This program is free software; you can redistribute it and/or 
*  modify it under the terms of the GNU General Public License 
*  as published by the Free Software Foundation; either version 
*  2 of the License, or (at your option) any later version. 
*/ 
.... 
.... 
.... 
/** 
    * struct sock - network layer representation of sockets 
    * @__sk_common: shared layout with inet_timewait_sock 
    * @sk_shutdown: mask of %SEND_SHUTDOWN and/or %RCV_SHUTDOWN 
    * @sk_userlocks: %SO_SNDBUF and %SO_RCVBUF settings 
    * @sk_lock: synchronizer 
    * @sk_kern_sock: True if sock is using kernel lock classes 
    * @sk_rcvbuf: size of receive buffer in bytes 
    * @sk_wq: sock wait queue and async head 
    * @sk_rx_dst: receive input route used by early demux 
    * @sk_dst_cache: destination cache 
    * @sk_dst_pending_confirm: need to confirm neighbour 
    * @sk_policy: flow policy 
    * @sk_receive_queue: incoming packets 
    * @sk_wmem_alloc: transmit queue bytes committed 
    * @sk_tsq_flags: TCP Small Queues flags 
    * @sk_write_queue: Packet sending queue 
    * @sk_omem_alloc: "o" is "option" or "other" 
    * @sk_wmem_queued: persistent queue size 
    * @sk_forward_alloc: space allocated forward 
    * @sk_napi_id: id of the last napi context to receive data for sk 
    * @sk_ll_usec: usecs to busypoll when there is no data 
    * @sk_allocation: allocation mode 
    * @sk_pacing_rate: Pacing rate (if supported by transport/packet scheduler) 
    * @sk_pacing_status: Pacing status (requested, handled by sch_fq) 
    * @sk_max_pacing_rate: Maximum pacing rate (%SO_MAX_PACING_RATE) 
    * @sk_sndbuf: size of send buffer in bytes 
    * @__sk_flags_offset: empty field used to determine location of bitfield 
    * @sk_padding: unused element for alignment 
    * @sk_no_check_tx: %SO_NO_CHECK setting, set checksum in TX packets 
    * @sk_no_check_rx: allow zero checksum in RX packets 
    * @sk_route_caps: route capabilities (e.g. %NETIF_F_TSO) 
    * @sk_route_nocaps: forbidden route capabilities (e.g NETIF_F_GSO_MASK) 
    * @sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4) 
    * @sk_gso_max_size: Maximum GSO segment size to build 
    * @sk_gso_max_segs: Maximum number of GSO segments 
    * @sk_lingertime: %SO_LINGER l_linger setting 
    * @sk_backlog: always used with the per-socket spinlock held 
    * @sk_callback_lock: used with the callbacks in the end of this struct 
    * @sk_error_queue: rarely used 
    * @sk_prot_creator: sk_prot of original sock creator (see ipv6_setsockopt, 
    *   IPV6_ADDRFORM for instance) 
    * @sk_err: last error 
    * @sk_err_soft: errors that don't cause failure but are the cause of a 
    *   persistent failure not just 'timed out' 
    * @sk_drops: raw/udp drops counter 
    * @sk_ack_backlog: current listen backlog 
    * @sk_max_ack_backlog: listen backlog set in listen() 
    * @sk_uid: user id of owner 
    * @sk_priority: %SO_PRIORITY setting 
    * @sk_type: socket type (%SOCK_STREAM, etc) 
    * @sk_protocol: which protocol this socket belongs in this network family 
    * @sk_peer_pid: &struct pid for this socket's peer 
    * @sk_peer_cred: %SO_PEERCRED setting 
    * @sk_rcvlowat: %SO_RCVLOWAT setting 
    * @sk_rcvtimeo: %SO_RCVTIMEO setting 
    * @sk_sndtimeo: %SO_SNDTIMEO setting 
    * @sk_txhash: computed flow hash for use on transmit 
    * @sk_filter: socket filtering instructions 
    * @sk_timer: sock cleanup timer 
    * @sk_stamp: time stamp of last packet received 
    * @sk_tsflags: SO_TIMESTAMPING socket options 
    * @sk_tskey: counter to disambiguate concurrent tstamp requests 
    * @sk_zckey: counter to order MSG_ZEROCOPY notifications 
    * @sk_socket: Identd and reporting IO signals 
    * @sk_user_data: RPC layer private data 
    * @sk_frag: cached page frag 
    * @sk_peek_off: current peek_offset value 
    * @sk_send_head: front of stuff to transmit 
    * @sk_security: used by security modules 
    * @sk_mark: generic packet mark 
    * @sk_cgrp_data: cgroup data for this cgroup 
    * @sk_memcg: this socket's memory cgroup association 
    * @sk_write_pending: a write to stream socket waits to start 
    * @sk_state_change: callback to indicate change in the state of the sock 
    * @sk_data_ready: callback to indicate there is data to be processed 
    * @sk_write_space: callback to indicate there is bf sending space available 
    * @sk_error_report: callback to indicate errors (e.g. %MSG_ERRQUEUE) 
    * @sk_backlog_rcv: callback to process the backlog 
    * @sk_destruct: called at sock freeing time, i.e. when all refcnt == 0 
    * @sk_reuseport_cb: reuseport group container 
    * @sk_rcu: used during RCU grace period 
    */ 
struct sock { 
    /* 
    * Now struct inet_timewait_sock also uses sock_common, so please just 
    * don't add nothing before this first member (__sk_common) --acme 
    */ 
    struct sock_common __sk_common; 
#define sk_node   __sk_common.skc_node 
#define sk_nulls_node  __sk_common.skc_nulls_node 
#define sk_refcnt  __sk_common.skc_refcnt 
#define sk_tx_queue_mapping __sk_common.skc_tx_queue_mapping 

#define sk_dontcopy_begin __sk_common.skc_dontcopy_begin 
#define sk_dontcopy_end  __sk_common.skc_dontcopy_end 
#define sk_hash   __sk_common.skc_hash 
#define sk_portpair  __sk_common.skc_portpair 
#define sk_num   __sk_common.skc_num 
#define sk_dport  __sk_common.skc_dport 
#define sk_addrpair  __sk_common.skc_addrpair 
#define sk_daddr  __sk_common.skc_daddr 
#define sk_rcv_saddr  __sk_common.skc_rcv_saddr 
#define sk_family  __sk_common.skc_family 
#define sk_state  __sk_common.skc_state 
#define sk_reuse  __sk_common.skc_reuse 
#define sk_reuseport  __sk_common.skc_reuseport 
#define sk_ipv6only  __sk_common.skc_ipv6only 
#define sk_net_refcnt  __sk_common.skc_net_refcnt 
#define sk_bound_dev_if  __sk_common.skc_bound_dev_if 
#define sk_bind_node  __sk_common.skc_bind_node 
#define sk_prot   __sk_common.skc_prot 
#define sk_net   __sk_common.skc_net 
#define sk_v6_daddr  __sk_common.skc_v6_daddr 
#define sk_v6_rcv_saddr __sk_common.skc_v6_rcv_saddr 
#define sk_cookie  __sk_common.skc_cookie 
#define sk_incoming_cpu  __sk_common.skc_incoming_cpu 
#define sk_flags  __sk_common.skc_flags 
#define sk_rxhash  __sk_common.skc_rxhash 

    socket_lock_t  sk_lock; 
    atomic_t  sk_drops; 
    int   sk_rcvlowat; 
    struct sk_buff_head sk_error_queue; 
    struct sk_buff_head sk_receive_queue; 
    /* 
    * The backlog queue is special, it is always used with 
    * the per-socket spinlock held and requires low latency 
    * access. Therefore we special case it's implementation. 
    * Note : rmem_alloc is in this structure to fill a hole 
    * on 64bit arches, not because its logically part of 
    * backlog. 
    */ 
    struct { 
     atomic_t rmem_alloc; 
     int  len; 
     struct sk_buff *head; 
     struct sk_buff *tail; 
    } sk_backlog; 
#define sk_rmem_alloc sk_backlog.rmem_alloc 

    int   sk_forward_alloc; 
#ifdef CONFIG_NET_RX_BUSY_POLL 
    unsigned int  sk_ll_usec; 
    /* ===== mostly read cache line ===== */ 
    unsigned int  sk_napi_id; 
#endif 
    int   sk_rcvbuf; 

    struct sk_filter __rcu *sk_filter; 
    union { 
     struct socket_wq __rcu *sk_wq; 
     struct socket_wq *sk_wq_raw; 
    }; 
#ifdef CONFIG_XFRM 
    struct xfrm_policy __rcu *sk_policy[2]; 
#endif 
    struct dst_entry *sk_rx_dst; 
    struct dst_entry __rcu *sk_dst_cache; 
    atomic_t  sk_omem_alloc; 
    int   sk_sndbuf; 

    /* ===== cache line for TX ===== */ 
    int   sk_wmem_queued; 
    refcount_t  sk_wmem_alloc; 
    unsigned long  sk_tsq_flags; 
    struct sk_buff  *sk_send_head; 
    struct sk_buff_head sk_write_queue; 
    __s32   sk_peek_off; 
    int   sk_write_pending; 
    __u32   sk_dst_pending_confirm; 
    u32   sk_pacing_status; /* see enum sk_pacing */ 
    long   sk_sndtimeo; 
    struct timer_list sk_timer; 
    __u32   sk_priority; 
    __u32   sk_mark; 
    u32   sk_pacing_rate; /* bytes per second */ 
    u32   sk_max_pacing_rate; 
    struct page_frag sk_frag; 
    netdev_features_t sk_route_caps; 
    netdev_features_t sk_route_nocaps; 
    int   sk_gso_type; 
    unsigned int  sk_gso_max_size; 
    gfp_t   sk_allocation; 
    __u32   sk_txhash; 

    /* 
    * Because of non atomicity rules, all 
    * changes are protected by socket lock. 
    */ 
    unsigned int  __sk_flags_offset[0]; 
#ifdef __BIG_ENDIAN_BITFIELD 
#define SK_FL_PROTO_SHIFT 16 
#define SK_FL_PROTO_MASK 0x00ff0000 

#define SK_FL_TYPE_SHIFT 0 
#define SK_FL_TYPE_MASK 0x0000ffff 
#else 
#define SK_FL_PROTO_SHIFT 8 
#define SK_FL_PROTO_MASK 0x0000ff00 

#define SK_FL_TYPE_SHIFT 16 
#define SK_FL_TYPE_MASK 0xffff0000 
#endif 

    kmemcheck_bitfield_begin(flags); 
    unsigned int  sk_padding : 1, 
       sk_kern_sock : 1, 
       sk_no_check_tx : 1, 
       sk_no_check_rx : 1, 
       sk_userlocks : 4, 
       sk_protocol : 8, 
       sk_type  : 16; 
#define SK_PROTOCOL_MAX U8_MAX 
    kmemcheck_bitfield_end(flags); 

    u16   sk_gso_max_segs; 
    unsigned long   sk_lingertime; 
    struct proto  *sk_prot_creator; 
    rwlock_t  sk_callback_lock; 
    int   sk_err, 
       sk_err_soft; 
    u32   sk_ack_backlog; 
    u32   sk_max_ack_backlog; 
    kuid_t   sk_uid; 
    struct pid  *sk_peer_pid; 
    const struct cred *sk_peer_cred; 
    long   sk_rcvtimeo; 
    ktime_t   sk_stamp; 
    u16   sk_tsflags; 
    u8   sk_shutdown; 
    u32   sk_tskey; 
    atomic_t  sk_zckey; 
    struct socket  *sk_socket; 
    void   *sk_user_data; 
#ifdef CONFIG_SECURITY 
    void   *sk_security; 
#endif 
    struct sock_cgroup_data sk_cgrp_data; 
    struct mem_cgroup *sk_memcg; 
    void   (*sk_state_change)(struct sock *sk); 
    void   (*sk_data_ready)(struct sock *sk); 
    void   (*sk_write_space)(struct sock *sk); 
    void   (*sk_error_report)(struct sock *sk); 
    int   (*sk_backlog_rcv)(struct sock *sk, 
          struct sk_buff *skb); 
    void     (*sk_destruct)(struct sock *sk); 
    struct sock_reuseport __rcu *sk_reuseport_cb; 
    struct rcu_head  sk_rcu; 
}; 
+0

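Copy-pasted source code without any explanation really doesn't help. I don't have a specific task; I just want to understand the logic behind it. For example, why is 'sk_uid' in 'struct sock' rather than in 'struct socket'? – user1202136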
