[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87r0k82tbi.fsf@cloudflare.com>
Date: Wed, 29 Nov 2023 21:17:25 +0100
From: Jakub Sitnicki <jakub@...udflare.com>
To: David Laight <David.Laight@...LAB.COM>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, 'Jakub Kicinski'
<kuba@...nel.org>, "David S. Miller" <davem@...emloft.net>, Stephen
Hemminger <stephen@...workplumber.org>, Eric Dumazet
<edumazet@...gle.com>, 'David Ahern' <dsahern@...nel.org>, Paolo Abeni
<pabeni@...hat.com>
Subject: Re: [PATCH net-next] ipv4: Use READ/WRITE_ONCE() for IP
local_port_range
Hey David,
On Wed, Nov 29, 2023 at 07:26 PM GMT, David Laight wrote:
> Commit 227b60f5102cd added a seqlock to ensure that the low and high
> port numbers were always updated together.
> This is overkill because the two 16bit port numbers can be held in
> a u32 and read/written in a single instruction.
>
> More recently 91d0b78c5177f added support for finer per-socket limits.
> The user-supplied value is 'high << 16 | low' but they are held
> separately and the socket options protected by the socket lock.
>
> Use a u32 containing 'high << 16 | low' for both the 'net' and 'sk'
> fields and use READ_ONCE()/WRITE_ONCE() to ensure both values are
> always updated together.
>
> Change (the now trival) inet_get_local_port_range() to a static inline
> to optimise the calling code.
> (In particular avoiding returning integers by reference.)
>
> Signed-off-by: David Laight <david.laight@...lab.com>
> ---
Regarding the per-socket changes - we don't expect contention on sock
lock between inet_stream_connect / __inet_bind, where we grab it and
eventually call inet_sk_get_local_port_range, and sockopt handlers, do
we?
The motivation is not super clear for me for that of the changes.
> include/net/inet_sock.h | 5 +----
> include/net/ip.h | 7 ++++++-
> include/net/netns/ipv4.h | 3 +--
> net/ipv4/af_inet.c | 4 +---
> net/ipv4/inet_connection_sock.c | 29 ++++++++++------------------
> net/ipv4/ip_sockglue.c | 34 ++++++++++++++++-----------------
> net/ipv4/sysctl_net_ipv4.c | 12 ++++--------
> 7 files changed, 40 insertions(+), 54 deletions(-)
>
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index 74db6d97cae1..ebf71410aa2b 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -234,10 +234,7 @@ struct inet_sock {
> int uc_index;
> int mc_index;
> __be32 mc_addr;
> - struct {
> - __u16 lo;
> - __u16 hi;
> - } local_port_range;
> + u32 local_port_range;
Nit: This field would benefit from a similar comment as you have added to
local_ports.range ("/* high << 16 | low */"), now that it is no longer
obvious how to interpret the contents.
>
> struct ip_mc_socklist __rcu *mc_list;
> struct inet_cork_full cork;
[...]
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 394a498c2823..1a45d41f8b39 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -117,34 +117,25 @@ bool inet_rcv_saddr_any(const struct sock *sk)
[...]
> void inet_sk_get_local_port_range(const struct sock *sk, int *low, int *high)
> {
> const struct inet_sock *inet = inet_sk(sk);
> const struct net *net = sock_net(sk);
> int lo, hi, sk_lo, sk_hi;
> + u32 sk_range;
>
> inet_get_local_port_range(net, &lo, &hi);
>
> - sk_lo = inet->local_port_range.lo;
> - sk_hi = inet->local_port_range.hi;
> + sk_range = READ_ONCE(inet->local_port_range);
> + if (unlikely(sk_range)) {
> + sk_lo = sk_range & 0xffff;
> + sk_hi = sk_range >> 16;
>
> - if (unlikely(lo <= sk_lo && sk_lo <= hi))
> - lo = sk_lo;
> - if (unlikely(lo <= sk_hi && sk_hi <= hi))
> - hi = sk_hi;
> + if (unlikely(lo <= sk_lo && sk_lo <= hi))
> + lo = sk_lo;
> + if (unlikely(lo <= sk_hi && sk_hi <= hi))
> + hi = sk_hi;
> + }
Actually when we know that sk_range is set, the above two branches
become likely. It will be usually so that the set per-socket port range
narrows down the per-netns port range.
These checks exist only in case the per-netns port range has been
reconfigured after per-socket port range has been set. The per-netns one
always takes precedence.
>
> *low = lo;
> *high = hi;
[...]
Powered by blists - more mailing lists