netdev - Re: [PATCH net-next 2/2] datagram, udp: Set local address and rehash socket atomically against lookup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20241218172110.12c4016a@elisabeth>
Date: Wed, 18 Dec 2024 17:21:10 +0100
From: Stefano Brivio <sbrivio@...hat.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>, Eric Dumazet
 <edumazet@...gle.com>, netdev@...r.kernel.org, Kuniyuki Iwashima
 <kuniyu@...zon.com>, Mike Manning <mvrmanning@...il.com>, David Gibson
 <david@...son.dropbear.id.au>, Paul Holzinger <pholzing@...hat.com>, Philo
 Lu <lulie@...ux.alibaba.com>, Cambda Zhu <cambda@...ux.alibaba.com>, Fred
 Chen <fred.cc@...baba-inc.com>, Yubing Qiu
 <yubing.qiuyubing@...baba-inc.com>
Subject: Re: [PATCH net-next 2/2] datagram, udp: Set local address and
 rehash socket atomically against lookup

On Fri, 6 Dec 2024 14:35:35 +0100
Stefano Brivio <sbrivio@...hat.com> wrote:

> On Fri, 6 Dec 2024 13:36:47 +0100
> Paolo Abeni <pabeni@...hat.com> wrote:
> 
> > On 12/6/24 11:50, Stefano Brivio wrote:  
> > > On Thu, 5 Dec 2024 17:53:33 +0100 Paolo Abeni <pabeni@...hat.com> wrote:    
> > >> I'm wondering if the issue could be solved (almost) entirely in the
> > >> rehash callback?!? if the rehash happens on connect and the the socket
> > >> does not have hash4 yet (it's not a reconnect) do the l4 hashing before
> > >> everything else.    
> > > 
> > > So, yes, that's actually the first thing I tried: do the hashing (any
> > > hash) before setting the address (I guess that's what you mean by
> > > "everything else").
> > > 
> > > If you take this series, and drop the changes in __udp4_lib_lookup(), I
> > > guess that would match what you suggest.    
> > 
> > I mean something slightly different. Just to explain the idea something
> > alike the following (completely untested):
> > 
> > ---
> > diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
> > index cc6d0bd7b0a9..e9cc6edbcdc6 100644
> > --- a/net/ipv4/datagram.c
> > +++ b/net/ipv4/datagram.c
> > @@ -61,6 +61,10 @@ int __ip4_datagram_connect(struct sock *sk, struct
> > sockaddr *uaddr, int addr_len
> >  		err = -EACCES;
> >  		goto out;
> >  	}
> > +
> > +	sk->sk_state = TCP_ESTABLISHED;
> > +	inet->inet_daddr = fl4->daddr;
> > +	inet->inet_dport = usin->sin_port;
> >  	if (!inet->inet_saddr)
> >  		inet->inet_saddr = fl4->saddr;	/* Update source address */
> >  	if (!inet->inet_rcv_saddr) {
> > @@ -68,10 +72,7 @@ int __ip4_datagram_connect(struct sock *sk, struct
> > sockaddr *uaddr, int addr_len
> >  		if (sk->sk_prot->rehash)
> >  			sk->sk_prot->rehash(sk);
> >  	}
> > -	inet->inet_daddr = fl4->daddr;
> > -	inet->inet_dport = usin->sin_port;
> >  	reuseport_has_conns_set(sk);
> > -	sk->sk_state = TCP_ESTABLISHED;
> >  	sk_set_txhash(sk);
> >  	atomic_set(&inet->inet_id, get_random_u16());
> > 
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 6a01905d379f..c6c58b0a6b7b 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -2194,6 +2194,21 @@ void udp_lib_rehash(struct sock *sk, u16 newhash,
> > u16 newhash4)
> >  			if (rcu_access_pointer(sk->sk_reuseport_cb))
> >  				reuseport_detach_sock(sk);
> > 
> > +			if (sk->sk_state == TCP_ESTABLISHED && !udp_hashed4(sk)) {
> > +				struct udp_hslot * hslot4 = udp_hashslot4(udptable, newhash4);
> > +
> > +				udp_sk(sk)->udp_lrpa_hash = newhash4;
> > +				spin_lock(&hslot4->lock);
> > +				hlist_nulls_add_head_rcu(&udp_sk(sk)->udp_lrpa_node,
> > +							 &hslot4->nulls_head);
> > +				hslot4->count++;
> > +				spin_unlock(&hslot4->lock);
> > +
> > +				spin_lock(&hslot2->lock);
> > +				udp_hash4_inc(hslot2);
> > +				spin_unlock(&hslot2->lock);
> > +			}
> > +
> >  			if (hslot2 != nhslot2) {
> >  				spin_lock(&hslot2->lock);
> >  				hlist_del_init_rcu(&udp_sk(sk)->udp_portaddr_node);
> > ---
> > 
> > Basically the idea is to leverage the hash4 - which should be not yet
> > initialized when rehash is invoked due to connect().  
> 
> That assumption seems to be correct from my tests.

...but that doesn't help in a general case, because we don't have a
wildcard lookup for four-tuple hashes, more on that below.

> > In such a case, before touching hash{,2}, do hash4.  
> 
> Brilliant, thanks. I'll give that a try.

It sounded like a nice idea and I actually tried quite hard, but it
can't work (so I'm posting a different/simpler fix), mostly for three
reasons (plus a bunch that would require sparse but doable changes):

1. we can't use four-tuple hashes on CONFIG_BASE_SMALL=y, and it would
   be rather weird to leave it unfixed in that case (and, worse, to have
   substantially different behaviours depending on CONFIG_BASE_SMALL).

   At the same time, I see your point about it (from review to v4 of
   the four-tuple hash series), and I don't feel like it's worth adding
   it back also for CONFIG_BASE_SMALL.

   I tried adding some special handling based on a similar concept that
   wouldn't make struct udp_table bigger, but it's strictly more
   complicated than the other fix I'm posting.

2. hash4_cnt is stored in the secondary hash slot, and I see why, but
   that means that if the secondary hash doesn't match, we'll also fail
   the lookup based on four-tuple hash.

   We could introduce a special case in the lookup, perhaps as fallback
   only, ignoring the result of udp_has_hash4(), but it looks rather
   convoluted (especially compared to the fix I'm posting)

3. we would need another version of udp{4,6}_lib_lookup4() (or a branch
   inside it), handling wildcard lookups like udp{4,6}_lib_lookup2()
   does, and then call udp{4,6}_lib_lookup4() a second time with
   INADDR_ANY / &in6addr_any, because we don't know if the receive
   address changed yet, as we're performing the lookup.

So, instead, I'm resorting to the primary hash, as fallback only. If
what we need is a hash that doesn't include the address, such as an
"uninitialised" four-tuple hash, we can as well use the original hash
that doesn't include addresses by design.

-- 
Stefano