Message-ID: <eea2a2c3-79dc-131c-4ef5-ee027b30b701@gmail.com>
Date: Wed, 22 Apr 2020 20:01:10 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Cambda Zhu <cambda@...ux.alibaba.com>,
netdev <netdev@...r.kernel.org>
Cc: Dust Li <dust.li@...ux.alibaba.com>,
Tony Lu <tonylu@...ux.alibaba.com>
Subject: Re: [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt

On 4/21/20 5:17 AM, Cambda Zhu wrote:
> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
> option has same behavior as TCP_LINGER2, except the tp->linger2 value
> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
> with CAP_NET_ADMIN.
>
> As a server, different sockets may need different FIN-WAIT timeout and
> in most cases the system default value will be used. The timeout can
> be adjusted by setting TCP_LINGER2 but cannot be greater than the
> system default value. If one socket needs a timeout greater than the
> default, we have to adjust the sysctl which affects all sockets using
> the system default value. And if we want to adjust it for just one
> socket and keep the original value for others, all the other sockets
> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
> set greater tp->linger2 than the default for one socket and keep
> the sysctl_tcp_fin_timeout unchanged.
>
> Signed-off-by: Cambda Zhu <cambda@...ux.alibaba.com>
> ---
> include/uapi/linux/capability.h | 1 +
> include/uapi/linux/tcp.h | 1 +
> net/ipv4/tcp.c | 9 +++++++++
> 3 files changed, 11 insertions(+)
>
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index 272dc69fa080..0e30c9756a04 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
> /* Allow multicasting */
> /* Allow read/write of device-specific registers */
> /* Allow activation of ATM control sockets */
> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>
> #define CAP_NET_ADMIN 12
>
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index f2acb2566333..e21e0ce98ca1 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -128,6 +128,7 @@ enum {
> #define TCP_CM_INQ TCP_INQ
>
> #define TCP_TX_DELAY 37 /* delay outgoing packets by XX usec */
> +#define TCP_FORCE_LINGER2 38 /* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>
>
> #define TCP_REPAIR_ON 1
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 6d87de434377..898a675d863e 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3149,6 +3149,15 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>  			tcp_enable_tx_delay();
>  		tp->tcp_tx_delay = val;
>  		break;
> +	case TCP_FORCE_LINGER2:
> +		if (val < 0)
> +			tp->linger2 = -1;
> +		else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
> +			 !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
> +			tp->linger2 = 0;
> +		else
> +			tp->linger2 = val * HZ;

This multiply could overflow.

Since tp->linger2 is an int, and a negative value has a specific meaning,
you should probably add some sanity checks.

Even though the old TCP_LINGER2 silently stored a 0,
a new option should perhaps return an error if val * HZ would overflow.

> +		break;
>  	default:
>  		err = -ENOPROTOOPT;
>  		break;
>
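
For illustration, the sanity check suggested in the comment on val * HZ
could look roughly like the sketch below. This is a user-space sketch under
stated assumptions, not kernel code: the helper name linger2_secs_to_ticks
and its hz parameter (standing in for the kernel's HZ) are hypothetical, and
returning -EINVAL is only one possible choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical helper (not part of the patch): convert a timeout given in
 * seconds to ticks, refusing values whose product with hz would not fit
 * in an int.
 */
static int linger2_secs_to_ticks(int val, int hz, int *ticks)
{
	assert(hz > 0);

	if (val < 0) {
		/* Mirror the patch: any negative input is stored as -1. */
		*ticks = -1;
		return 0;
	}

	/* Compare before multiplying, so the overflow never happens. */
	if (val > INT_MAX / hz)
		return -EINVAL;

	*ticks = val * hz;
	return 0;
}
```

The check divides INT_MAX by hz instead of testing the product afterwards,
because signed overflow in C is undefined and cannot be detected reliably
once it has occurred.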