[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iLRDpAmaJVYCf+-F7mTTVkxSJMKfxZ+QhB8ATzYEi4X8g@mail.gmail.com>
Date: Wed, 26 Jul 2023 16:21:40 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: David Laight <David.Laight@...lab.com>
Cc: "willemdebruijn.kernel@...il.com" <willemdebruijn.kernel@...il.com>,
"davem@...emloft.net" <davem@...emloft.net>, "dsahern@...nel.org" <dsahern@...nel.org>,
"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH 2/2] Rescan the hash2 list if the hash chains have got cross-linked.
On Wed, Jul 26, 2023 at 4:13 PM David Laight <David.Laight@...lab.com> wrote:
>
> From: Eric Dumazet
> > Sent: 26 July 2023 14:37
> >
> > On Wed, Jul 26, 2023 at 2:06 PM David Laight <David.Laight@...lab.com> wrote:
> > >
> > > udp_lib_rehash() can get called at any time and will move a
> > > socket to a different hash2 chain.
> > > This can cause udp4_lib_lookup2() (processing incoming UDP) to
> > > fail to find a socket and an ICMP port unreachable be sent.
> > >
> > > Prior to ca065d0cf80fa the lookup used 'hlist_nulls' and checked
> > > that the 'end if list' marker was on the correct list.
> > >
> > > Rather than re-instate the 'nulls' list just check that the final
> > > socket is on the correct list.
> > >
> > > The cross-linking can definitely happen (see earlier issues with
> > > it looping forever because gcc cached the list head).
> > >
> > > Fixes: ca065d0cf80fa ("udp: no longer use SLAB_DESTROY_BY_RCU")
> > > Signed-off-by: David Laight <david.laight@...lab.com>
> > > ---
> >
> > Hi David, thanks a lot for the investigations.
> >
> > I do not think this is the proper fix.
> >
> > UDP rehash has always been buggy, because we lack an rcu grace period
> > between the removal of the socket
> > from the old hash bucket to the new one.
> >
> > We need to stuff a synchronize_rcu() somewhere in udp_lib_rehash(),
> > and that might not be easy [1]
> > and might add unexpected latency to some real time applications.
> > ([1] : Not sure if we are allowed to sleep in udp_lib_rehash())
>
> I'm also not sure that the callers would always like the potentially
> long rcu sleep.
>
> > Also note that adding a synchronize_rcu() would mean the socket would
> > not be found anyway by some incoming packets.
> >
> > I think that rehash is tricky to implement if you expect that all
> > incoming packets must find the socket, wherever it is located.
>
> I thought about something like the checks done for reading
> multi-word counters.
> Something like requiring the updater to increment a count on
> entry and exit and have the reader rescan with the lock held
> if the count is odd or changes.
>
Can you describe what user space operation is done by your precious application,
triggering a rehash in the first place ?
Maybe we can think of something less disruptive in the kernel.
(For instance, you could have a second socket, insert it in the new bucket,
then remove the old socket)
> The problem is that a single 'port unreachable' can be treated
> as a fatal error by the receiving application.
> So you really don't want to be sending them.
Well, if your application needs to run with old kernels, and or
transient netfilter changes (some firewall setups do not use
iptables-restore)
better be more resilient to transient ICMP messages anyway.
>
> >
> >
> > > net/ipv4/udp.c | 14 ++++++++++++++
> > > 1 file changed, 14 insertions(+)
> > >
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index ad64d6c4cd99..ed92ba7610b0 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -443,6 +443,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
> > > struct sk_buff *skb)
> > > {
> > > unsigned int hash2, slot2;
> > > + unsigned int hash2_rescan;
> > > struct udp_hslot *hslot2;
> > > struct sock *sk, *result;
> > > int score, badness;
> > > @@ -451,9 +452,12 @@ static struct sock *udp4_lib_lookup2(struct net *net,
> > > slot2 = hash2 & udptable->mask;
> > > hslot2 = &udptable->hash2[slot2];
> > >
> > > +rescan:
> > > + hash2_rescan = hash2;
> > > result = NULL;
> > > badness = 0;
> > > udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
> > > + hash2_rescan = udp_sk(sk)->udp_portaddr_hash;
> > > score = compute_score(sk, net, saddr, sport,
> > > daddr, hnum, dif, sdif);
> > > if (score > badness) {
> > > @@ -467,6 +471,16 @@ static struct sock *udp4_lib_lookup2(struct net *net,
> > > badness = score;
> > > }
> > > }
> > > +
> > > + /* udp sockets can get moved to a different hash chain.
> > > + * If the chains have got crossed then rescan.
> > > + */
> > > + if ((hash2_rescan & udptable->mask) != slot2) {
> >
> > This is only going to catch one of the possible cases.
> >
> > If we really want to add extra checks in this fast path, we would have
> > to check all found sockets,
> > not only the last one.
>
> I did think about that.
> Being hit by a single rehash is very unlikely.
> Being hit by two that also put you back onto the original
> chain really isn't going to happen.
>
> Putting the check inside the loop would save the test when the
> hash list is empty - probably common for the first lookup.
>
> The code in compute_score() is pretty horrid so maybe it
> wouldn't really be noticeable.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
Powered by blists - more mailing lists