[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20241218172110.12c4016a@elisabeth>
Date: Wed, 18 Dec 2024 17:21:10 +0100
From: Stefano Brivio <sbrivio@...hat.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>, Eric Dumazet
<edumazet@...gle.com>, netdev@...r.kernel.org, Kuniyuki Iwashima
<kuniyu@...zon.com>, Mike Manning <mvrmanning@...il.com>, David Gibson
<david@...son.dropbear.id.au>, Paul Holzinger <pholzing@...hat.com>, Philo
Lu <lulie@...ux.alibaba.com>, Cambda Zhu <cambda@...ux.alibaba.com>, Fred
Chen <fred.cc@...baba-inc.com>, Yubing Qiu
<yubing.qiuyubing@...baba-inc.com>
Subject: Re: [PATCH net-next 2/2] datagram, udp: Set local address and
rehash socket atomically against lookup
On Fri, 6 Dec 2024 14:35:35 +0100
Stefano Brivio <sbrivio@...hat.com> wrote:
> On Fri, 6 Dec 2024 13:36:47 +0100
> Paolo Abeni <pabeni@...hat.com> wrote:
>
> > On 12/6/24 11:50, Stefano Brivio wrote:
> > > On Thu, 5 Dec 2024 17:53:33 +0100 Paolo Abeni <pabeni@...hat.com> wrote:
> > >> I'm wondering if the issue could be solved (almost) entirely in the
> > >> rehash callback?!? if the rehash happens on connect and the the socket
> > >> does not have hash4 yet (it's not a reconnect) do the l4 hashing before
> > >> everything else.
> > >
> > > So, yes, that's actually the first thing I tried: do the hashing (any
> > > hash) before setting the address (I guess that's what you mean by
> > > "everything else").
> > >
> > > If you take this series, and drop the changes in __udp4_lib_lookup(), I
> > > guess that would match what you suggest.
> >
> > I mean something slightly different. Just to explain the idea something
> > alike the following (completely untested):
> >
> > ---
> > diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
> > index cc6d0bd7b0a9..e9cc6edbcdc6 100644
> > --- a/net/ipv4/datagram.c
> > +++ b/net/ipv4/datagram.c
> > @@ -61,6 +61,10 @@ int __ip4_datagram_connect(struct sock *sk, struct
> > sockaddr *uaddr, int addr_len
> > err = -EACCES;
> > goto out;
> > }
> > +
> > + sk->sk_state = TCP_ESTABLISHED;
> > + inet->inet_daddr = fl4->daddr;
> > + inet->inet_dport = usin->sin_port;
> > if (!inet->inet_saddr)
> > inet->inet_saddr = fl4->saddr; /* Update source address */
> > if (!inet->inet_rcv_saddr) {
> > @@ -68,10 +72,7 @@ int __ip4_datagram_connect(struct sock *sk, struct
> > sockaddr *uaddr, int addr_len
> > if (sk->sk_prot->rehash)
> > sk->sk_prot->rehash(sk);
> > }
> > - inet->inet_daddr = fl4->daddr;
> > - inet->inet_dport = usin->sin_port;
> > reuseport_has_conns_set(sk);
> > - sk->sk_state = TCP_ESTABLISHED;
> > sk_set_txhash(sk);
> > atomic_set(&inet->inet_id, get_random_u16());
> >
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 6a01905d379f..c6c58b0a6b7b 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -2194,6 +2194,21 @@ void udp_lib_rehash(struct sock *sk, u16 newhash,
> > u16 newhash4)
> > if (rcu_access_pointer(sk->sk_reuseport_cb))
> > reuseport_detach_sock(sk);
> >
> > + if (sk->sk_state == TCP_ESTABLISHED && !udp_hashed4(sk)) {
> > + struct udp_hslot * hslot4 = udp_hashslot4(udptable, newhash4);
> > +
> > + udp_sk(sk)->udp_lrpa_hash = newhash4;
> > + spin_lock(&hslot4->lock);
> > + hlist_nulls_add_head_rcu(&udp_sk(sk)->udp_lrpa_node,
> > + &hslot4->nulls_head);
> > + hslot4->count++;
> > + spin_unlock(&hslot4->lock);
> > +
> > + spin_lock(&hslot2->lock);
> > + udp_hash4_inc(hslot2);
> > + spin_unlock(&hslot2->lock);
> > + }
> > +
> > if (hslot2 != nhslot2) {
> > spin_lock(&hslot2->lock);
> > hlist_del_init_rcu(&udp_sk(sk)->udp_portaddr_node);
> > ---
> >
> > Basically the idea is to leverage the hash4 - which should be not yet
> > initialized when rehash is invoked due to connect().
>
> That assumption seems to be correct from my tests.
...but that doesn't help in a general case, because we don't have a
wildcard lookup for four-tuple hashes, more on that below.
> > In such a case, before touching hash{,2}, do hash4.
>
> Brilliant, thanks. I'll give that a try.
It sounded like a nice idea and I actually tried quite hard, but it
can't work (so I'm posting a different/simpler fix), mostly for three
reasons (plus a bunch that would require sparse but doable changes):
1. we can't use four-tuple hashes on CONFIG_BASE_SMALL=y, and it would
be rather weird to leave it unfixed in that case (and, worse, to have
substantially different behaviours depending on CONFIG_BASE_SMALL).
At the same time, I see your point about it (from review to v4 of
the four-tuple hash series), and I don't feel like it's worth adding
it back also for CONFIG_BASE_SMALL.
I tried adding some special handling based on a similar concept that
wouldn't make struct udp_table bigger, but it's strictly more
complicated than the other fix I'm posting.
2. hash4_cnt is stored in the secondary hash slot, and I see why, but
that means that if the secondary hash doesn't match, we'll also fail
the lookup based on four-tuple hash.
We could introduce a special case in the lookup, perhaps as fallback
only, ignoring the result of udp_has_hash4(), but it looks rather
convoluted (especially compared to the fix I'm posting)
3. we would need another version of udp{4,6}_lib_lookup4() (or a branch
inside it), handling wildcard lookups like udp{4,6}_lib_lookup2()
does, and then call udp{4,6}_lib_lookup4() a second time with
INADDR_ANY / &in6addr_any, because we don't know if the receive
address changed yet, as we're performing the lookup.
So, instead, I'm resorting to the primary hash, as fallback only. If
what we need is a hash that doesn't include the address, such as an
"uninitialised" four-tuple hash, we can as well use the original hash
that doesn't include addresses by design.
--
Stefano
Powered by blists - more mailing lists