Date: Fri, 08 Mar 2024 09:37:14 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Eric Dumazet <edumazet@...gle.com>, "David S . Miller"
 <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, eric.dumazet@...il.com,
 Martin KaFai Lau <kafai@...com>, Joe Stringer <joe@...d.net.nz>,
 Alexei Starovoitov <ast@...nel.org>,
 Willem de Bruijn <willemdebruijn.kernel@...il.com>,
 Kuniyuki Iwashima <kuniyu@...zon.com>
Subject: Re: [PATCH net-next] udp: no longer touch sk->sk_refcnt in early
 demux

On Thu, 2024-03-07 at 22:00 +0000, Eric Dumazet wrote:
> After commits ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")
> and 7ae215d23c12 ("bpf: Don't refcount LISTEN sockets in sk_assign()")
> UDP early demux no longer needs to grab a refcount on the UDP socket.
> 
> This saves two atomic operations per incoming packet for connected
> sockets.

This reminds me of an old series:

https://lore.kernel.org/netdev/cover.1506114055.git.pabeni@redhat.com/

and I'm wondering if we could reconsider that approach.

> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Cc: Martin KaFai Lau <kafai@...com>
> Cc: Joe Stringer <joe@...d.net.nz>
> Cc: Alexei Starovoitov <ast@...nel.org>
> Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>
> Cc: Kuniyuki Iwashima <kuniyu@...zon.com>
> ---
>  net/ipv4/udp.c | 5 +++--
>  net/ipv6/udp.c | 5 +++--
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index a8acea17b4e5344d022ae8f8eb674d1a36f8035a..e43ad1d846bdc2ddf5767606b78bbd055f692aa8 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2570,11 +2570,12 @@ int udp_v4_early_demux(struct sk_buff *skb)
>  					     uh->source, iph->saddr, dif, sdif);
>  	}
>  
> -	if (!sk || !refcount_inc_not_zero(&sk->sk_refcnt))
> +	if (!sk)
>  		return 0;
>  
>  	skb->sk = sk;
> -	skb->destructor = sock_efree;
> +	DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
> +	skb->destructor = sock_pfree;

I *think* that the skb may escape the current RCU section if, e.g., it
matches an nf dup target in the input tables.

Back then I tried to implement some debug infra to track such accesses:

https://lore.kernel.org/lkml/cover.1507294365.git.pabeni@redhat.com/

which was buggy (prone to false negatives). I think it can be improved
into something more reliable; perhaps I should revamp it?

I'm also wondering whether the DEBUG_NET_WARN_ON_ONCE is worthwhile at
all: the sk is a hashed UDP socket, so it is a full sock and has the
SOCK_RCU_FREE bit set.

Perhaps we could use a simple 'noop' destructor as in:

https://lore.kernel.org/netdev/b16163e3a4fa4d772edeabd8743acb4a07206bb9.1506114055.git.pabeni@redhat.com/


Thanks!

Paolo

