Message-ID: <87r0k82tbi.fsf@cloudflare.com>
Date: Wed, 29 Nov 2023 21:17:25 +0100
From: Jakub Sitnicki <jakub@...udflare.com>
To: David Laight <David.Laight@...LAB.COM>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, 'Jakub Kicinski'
 <kuba@...nel.org>, "David S. Miller" <davem@...emloft.net>, Stephen
 Hemminger <stephen@...workplumber.org>, Eric Dumazet
 <edumazet@...gle.com>, 'David Ahern' <dsahern@...nel.org>, Paolo Abeni
 <pabeni@...hat.com>
Subject: Re: [PATCH net-next] ipv4: Use READ/WRITE_ONCE() for IP
 local_port_range

Hey David,

On Wed, Nov 29, 2023 at 07:26 PM GMT, David Laight wrote:
> Commit 227b60f5102cd added a seqlock to ensure that the low and high
> port numbers were always updated together.
> This is overkill because the two 16bit port numbers can be held in
> a u32 and read/written in a single instruction.
>
> More recently 91d0b78c5177f added support for finer per-socket limits.
> The user-supplied value is 'high << 16 | low' but they are held
> separately and the socket options protected by the socket lock.
>
> Use a u32 containing 'high << 16 | low' for both the 'net' and 'sk'
> fields and use READ_ONCE()/WRITE_ONCE() to ensure both values are
> always updated together.
>
> Change (the now trival) inet_get_local_port_range() to a static inline
> to optimise the calling code.
> (In particular avoiding returning integers by reference.)
>
> Signed-off-by: David Laight <david.laight@...lab.com>
> ---

Regarding the per-socket changes - we don't expect contention on sock
lock between inet_stream_connect / __inet_bind, where we grab it and
eventually call inet_sk_get_local_port_range, and sockopt handlers, do
we?

The motivation for that part of the changes is not super clear to me.

>  include/net/inet_sock.h         |  5 +----
>  include/net/ip.h                |  7 ++++++-
>  include/net/netns/ipv4.h        |  3 +--
>  net/ipv4/af_inet.c              |  4 +---
>  net/ipv4/inet_connection_sock.c | 29 ++++++++++------------------
>  net/ipv4/ip_sockglue.c          | 34 ++++++++++++++++-----------------
>  net/ipv4/sysctl_net_ipv4.c      | 12 ++++--------
>  7 files changed, 40 insertions(+), 54 deletions(-)
>
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index 74db6d97cae1..ebf71410aa2b 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -234,10 +234,7 @@ struct inet_sock {
>  	int			uc_index;
>  	int			mc_index;
>  	__be32			mc_addr;
> -	struct {
> -		__u16 lo;
> -		__u16 hi;
> -	}			local_port_range;
> +	u32			local_port_range;

Nit: This field would benefit from a comment similar to the one you added
to local_ports.range ("/* high << 16 | low */"), now that it is no longer
obvious how to interpret the contents.
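
To illustrate why the comment matters, here is a minimal userspace sketch of
the 'high << 16 | low' encoding. The helper names are made up for this
example and are not part of the patch; only the bit layout comes from the
commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack two 16-bit ports as 'high << 16 | low'. */
static uint32_t pack_port_range(unsigned int lo, unsigned int hi)
{
	/* Both ports must fit in 16 bits for the encoding to be lossless. */
	assert(lo <= 0xffff && hi <= 0xffff);
	return ((uint32_t)hi << 16) | (uint32_t)lo;
}

/* Hypothetical helpers: recover each half from the packed value. */
static unsigned int port_range_lo(uint32_t range)
{
	return range & 0xffff;
}

static unsigned int port_range_hi(uint32_t range)
{
	return range >> 16;
}
```

Without a comment on the field, a reader cannot tell which half holds the
low port, and a packed value of 0 reads as "no range set".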

>  
>  	struct ip_mc_socklist __rcu	*mc_list;
>  	struct inet_cork_full	cork;

[...]

> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 394a498c2823..1a45d41f8b39 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -117,34 +117,25 @@ bool inet_rcv_saddr_any(const struct sock *sk)

[...]

>  void inet_sk_get_local_port_range(const struct sock *sk, int *low, int *high)
>  {
>  	const struct inet_sock *inet = inet_sk(sk);
>  	const struct net *net = sock_net(sk);
>  	int lo, hi, sk_lo, sk_hi;
> +	u32 sk_range;
>  
>  	inet_get_local_port_range(net, &lo, &hi);
>  
> -	sk_lo = inet->local_port_range.lo;
> -	sk_hi = inet->local_port_range.hi;
> +	sk_range = READ_ONCE(inet->local_port_range);
> +	if (unlikely(sk_range)) {
> +		sk_lo = sk_range & 0xffff;
> +		sk_hi = sk_range >> 16;
>  
> -	if (unlikely(lo <= sk_lo && sk_lo <= hi))
> -		lo = sk_lo;
> -	if (unlikely(lo <= sk_hi && sk_hi <= hi))
> -		hi = sk_hi;
> +		if (unlikely(lo <= sk_lo && sk_lo <= hi))
> +			lo = sk_lo;
> +		if (unlikely(lo <= sk_hi && sk_hi <= hi))
> +			hi = sk_hi;
> +	}

Actually, once we know that sk_range is set, the two branches above
become likely. Usually the per-socket port range, when set, narrows
down the per-netns port range.

These checks exist only in case the per-netns port range has been
reconfigured after the per-socket port range was set. The per-netns one
always takes precedence.
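
For reference, the semantics described above can be sketched in userspace
roughly as follows. The struct and function names are hypothetical; the
logic is meant to mirror the patched hunk, with the unlikely() hints left
out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for a [lo, hi] port range. */
struct port_range {
	int lo;
	int hi;
};

/*
 * Hypothetical helper: start from the per-netns range and apply the
 * per-socket bounds only where they fall inside it. sk_range is
 * 'high << 16 | low', and 0 means no per-socket range is set.
 */
static struct port_range effective_port_range(struct port_range net,
					      uint32_t sk_range)
{
	struct port_range out = net;

	assert(net.lo <= net.hi);

	if (sk_range) {
		int sk_lo = sk_range & 0xffff;
		int sk_hi = sk_range >> 16;

		/* A bound outside the current range is ignored. */
		if (out.lo <= sk_lo && sk_lo <= out.hi)
			out.lo = sk_lo;
		if (out.lo <= sk_hi && sk_hi <= out.hi)
			out.hi = sk_hi;
	}

	return out;
}
```

With a per-netns range of 32768-60999 and a per-socket range of 40000-41000,
both checks pass and the result is 40000-41000. If the per-netns range is
later changed to 50000-60999, neither per-socket bound fits and the per-netns
range is used as is.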

>  
>  	*low = lo;
>  	*high = hi;

[...]
