Message-ID: <Z1JeePBN5f1YCmYd@zatzit>
Date: Fri, 6 Dec 2024 13:16:24 +1100
From: David Gibson <david@...son.dropbear.id.au>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Stefano Brivio <sbrivio@...hat.com>,
	Willem de Bruijn <willemdebruijn.kernel@...il.com>,
	netdev@...r.kernel.org, Kuniyuki Iwashima <kuniyu@...zon.com>,
	Mike Manning <mvrmanning@...il.com>,
	Paul Holzinger <pholzing@...hat.com>,
	Philo Lu <lulie@...ux.alibaba.com>,
	Cambda Zhu <cambda@...ux.alibaba.com>,
	Fred Chen <fred.cc@...baba-inc.com>,
	Yubing Qiu <yubing.qiuyubing@...baba-inc.com>
Subject: Re: [PATCH net-next 2/2] datagram, udp: Set local address and rehash
 socket atomically against lookup

On Thu, Dec 05, 2024 at 11:52:38PM +0100, Eric Dumazet wrote:
> On Thu, Dec 5, 2024 at 11:32 PM David Gibson
> <david@...son.dropbear.id.au> wrote:
> >
> > On Thu, Dec 05, 2024 at 05:35:52PM +0100, Eric Dumazet wrote:
> > > On Wed, Dec 4, 2024 at 11:12 PM Stefano Brivio <sbrivio@...hat.com> wrote:
> > [snip]
> > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > > index 6a01905d379f..8490408f6009 100644
> > > > --- a/net/ipv4/udp.c
> > > > +++ b/net/ipv4/udp.c
> > > > @@ -639,18 +639,21 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
> > > >                 int sdif, struct udp_table *udptable, struct sk_buff *skb)
> > > >  {
> > > >         unsigned short hnum = ntohs(dport);
> > > > -       struct udp_hslot *hslot2;
> > > > +       struct udp_hslot *hslot, *hslot2;
> > > >         struct sock *result, *sk;
> > > >         unsigned int hash2;
> > > >
> > > > +       hslot = udp_hashslot(udptable, net, hnum);
> > > > +       spin_lock_bh(&hslot->lock);
> > >
> > > This is not acceptable.
> > > UDP is best effort, packets can be dropped.
> > > Please fix user application expectations.
> >
> > The packets aren't merely dropped, they're rejected with an ICMP Port
> > Unreachable.
> 
> We made UDP stack scalable with RCU, it took years of work.
> 
> And this patch is bringing back the UDP stack to horrible performance
> from more than a decade ago.
> Everybody will go back to DPDK.

It's reasonable to be concerned about the performance impact.  But
this seems like premature hyperbole given no-one has numbers yet, or
has even suggested a specific benchmark to reveal the impact.

> I am pretty certain this can be solved without using a spinlock in the
> fast path.

Quite possibly.  But Stefano has tried, and it certainly wasn't
trivial.

> Think about UDP DNS/QUIC servers, using SO_REUSEPORT and receiving
> 10,000,000 packets per second....
> 
> Changing source address on an UDP socket is highly unusual, we are not
> going to slow down UDP for this case.

Changing in a general way is very rare; one specific case is not.
Every time you connect() a socket that wasn't previously bound to a
specific address you get an implicit source address change from
0.0.0.0 or :: to something that depends on the routing table.
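
For illustration, the implicit address change can be observed from
userspace by calling getsockname() before and after connect().  This is
a minimal sketch, not anything from the patch: the helper name
local_addr_across_connect() and the choice of peer port 9 are
assumptions, and it only covers IPv4.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): record the local address of a
 * UDP socket before and after connect().  Returns 0 on success, -1 on any
 * failure. */
int local_addr_across_connect(const char *peer_ip,
                              struct in_addr *before,
                              struct in_addr *after)
{
    struct sockaddr_in local, peer;
    socklen_t len;
    int ret = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    /* Bind to the wildcard address; port 0 lets the kernel pick a port. */
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = 0;
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0)
        goto out;

    len = sizeof(local);
    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0)
        goto out;
    *before = local.sin_addr;           /* still 0.0.0.0 here */

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(9);           /* arbitrary port, assumption */
    if (inet_pton(AF_INET, peer_ip, &peer.sin_addr) != 1)
        goto out;

    /* For UDP this sends nothing; it records the peer and picks a source
     * address from the routing table. */
    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0)
        goto out;

    len = sizeof(local);
    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0)
        goto out;
    *after = local.sin_addr;            /* now a specific address */
    ret = 0;
out:
    close(fd);
    return ret;
}
```

Called with a peer of "127.0.0.1", "before" should come back as the
wildcard address and "after" as a specific local address chosen by
routing.  The window between those two states is where the lookup race
discussed in this thread lives.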

> Application could instead open another socket, and would probably work
> on old linux versions.

Possibly there's a procedure that would work here, but it's not at all
obvious:

 * Clearly, you can't close the non-connected socket before opening
   the connected one - that just introduces a new much wider race.  It
   doesn't even get rid of the existing one, because unless you can
   independently predict what the correct bound address will be
   for a given peer address, the second socket will still have an
   address change when you connect().

 * So, you must create the connected socket before closing the
   unconnected one, meaning you have to use SO_REUSEADDR or
   SO_REUSEPORT whether or not you otherwise wanted to.

 * While both sockets are open, you need to handle the possibility
   that packets could be delivered to either one.  Doable, but a pain
   in the arse.

 * How do you know when the transition is completed and you can close
   the unconnected socket?  The fact that the rehashing has completed
   and all the necessary memory barriers passed isn't something
   userspace can directly discern.

> If the regression was recent, this would be considered as a normal regression,
> but apparently nobody noticed for 10 years. This should be saying something...

It does.  But so does the fact that it can be trivially reproduced.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
