Message-ID: <CANn89iJqLU6RuHgdbz3iGNL_K8XaPBYr3pWqQmgth2TFf14obg@mail.gmail.com>
Date: Fri, 6 Dec 2024 10:04:33 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: David Gibson <david@...son.dropbear.id.au>
Cc: Stefano Brivio <sbrivio@...hat.com>, Willem de Bruijn <willemdebruijn.kernel@...il.com>, 
	netdev@...r.kernel.org, Kuniyuki Iwashima <kuniyu@...zon.com>, 
	Mike Manning <mvrmanning@...il.com>, Paul Holzinger <pholzing@...hat.com>, 
	Philo Lu <lulie@...ux.alibaba.com>, Cambda Zhu <cambda@...ux.alibaba.com>, 
	Fred Chen <fred.cc@...baba-inc.com>, Yubing Qiu <yubing.qiuyubing@...baba-inc.com>
Subject: Re: [PATCH net-next 2/2] datagram, udp: Set local address and rehash
 socket atomically against lookup

On Fri, Dec 6, 2024 at 3:16 AM David Gibson <david@...son.dropbear.id.au> wrote:
>
> On Thu, Dec 05, 2024 at 11:52:38PM +0100, Eric Dumazet wrote:
> > On Thu, Dec 5, 2024 at 11:32 PM David Gibson
> > <david@...son.dropbear.id.au> wrote:
> > >
> > > On Thu, Dec 05, 2024 at 05:35:52PM +0100, Eric Dumazet wrote:
> > > > On Wed, Dec 4, 2024 at 11:12 PM Stefano Brivio <sbrivio@...hat.com> wrote:
> > > [snip]
> > > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > > > index 6a01905d379f..8490408f6009 100644
> > > > > --- a/net/ipv4/udp.c
> > > > > +++ b/net/ipv4/udp.c
> > > > > @@ -639,18 +639,21 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
> > > > >                 int sdif, struct udp_table *udptable, struct sk_buff *skb)
> > > > >  {
> > > > >         unsigned short hnum = ntohs(dport);
> > > > > -       struct udp_hslot *hslot2;
> > > > > +       struct udp_hslot *hslot, *hslot2;
> > > > >         struct sock *result, *sk;
> > > > >         unsigned int hash2;
> > > > >
> > > > > +       hslot = udp_hashslot(udptable, net, hnum);
> > > > > +       spin_lock_bh(&hslot->lock);
> > > >
> > > > This is not acceptable.
> > > > UDP is best effort, packets can be dropped.
> > > > Please fix user application expectations.
> > >
> > > The packets aren't merely dropped, they're rejected with an ICMP Port
> > > Unreachable.
> >
> > We made UDP stack scalable with RCU, it took years of work.
> >
> > And this patch is bringing back the UDP stack to horrible performance
> > from more than a decade ago.
> > Everybody will go back to DPDK.
>
> It's reasonable to be concerned about the performance impact.  But
> this seems like premature hyperbole given no-one has numbers yet, or
> has even suggested a specific benchmark to reveal the impact.
>
> > I am pretty certain this can be solved without using a spinlock in the
> > fast path.
>
> Quite possibly.  But Stefano has tried, and it certainly wasn't
> trivial.
>
> > Think about UDP DNS/QUIC servers, using SO_REUSEPORT and receiving
> > 10,000,000 packets per second....
> >
> > Changing source address on a UDP socket is highly unusual, we are not
> > going to slow down UDP for this case.
>
> Changing in a general way is very rare, one specific case is not.
> Every time you connect() a socket that wasn't previously bound to a
> specific address you get an implicit source address change from
> 0.0.0.0 or :: to something that depends on the routing table.
>
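The implicit address change described above can be observed from userspace. What follows is a minimal sketch, not code from the patch: `local_addr()` and `show_implicit_rebind()` are hypothetical helper names, and 127.0.0.1 is used as the peer on the assumption that the loopback interface is up. It binds a UDP socket to 0.0.0.0, calls connect(), and reads the local address with getsockname() before and after.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fetch the socket's current local IPv4 address
 * (host byte order) with getsockname(). Returns 0 on success, -1 on error. */
static int local_addr(int fd, uint32_t *addr)
{
	struct sockaddr_in sa;
	socklen_t len = sizeof(sa);

	if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
		return -1;
	*addr = ntohl(sa.sin_addr.s_addr);
	return 0;
}

/* Hypothetical demo: bind a UDP socket to 0.0.0.0 with an ephemeral port,
 * connect() it to 127.0.0.1:peer_port, and report the local address seen
 * before and after connect(). Returns 0 on success, -1 on error. */
int show_implicit_rebind(uint16_t peer_port, uint32_t *before, uint32_t *after)
{
	struct sockaddr_in any = { .sin_family = AF_INET };	/* 0.0.0.0:0 */
	struct sockaddr_in peer = { .sin_family = AF_INET,
				    .sin_port = htons(peer_port) };
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	if (bind(fd, (struct sockaddr *)&any, sizeof(any)) < 0 ||
	    local_addr(fd, before) < 0 ||
	    /* connect() on UDP sends nothing; it only fixes the peer and,
	     * for a wildcard-bound socket, picks a source address. */
	    connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0 ||
	    local_addr(fd, after) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}
```

With a loopback peer, `*before` should read back as INADDR_ANY and `*after` as INADDR_LOOPBACK. For a routed peer the address after connect() depends on the routing table, as noted above.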
> > Application could instead open another socket, and would probably work
> > on old Linux versions.
>
> Possibly there's a procedure that would work here, but it's not at all
> obvious:
>
>  * Clearly, you can't close the non-connected socket before opening
>    the connected one - that just introduces a new much wider race.  It
>    doesn't even get rid of the existing one, because unless you can
>    independently predict what the correct bound address will be
>    for a given peer address, the second socket will still have an
>    address change when you connect().
>

The order is kind of obvious.

The kernel does not have to deal with wrong application design.

>  * So, you must create the connected socket before closing the
>    unconnected one, meaning you have to use SO_REUSEADDR or
>    SO_REUSEPORT whether or not you otherwise wanted to.
>
>  * While both sockets are open, you need to handle the possibility
>    that packets could be delivered to either one.  Doable, but a pain
>    in the arse.

Given UDP does not have a proper listen() + accept() model, I am
afraid this is the only way.

You need to keep the generic UDP socket as a catch all, and deal with
packets received on it.

>
>  * How do you know when the transition is completed and you can close
>    the unconnected socket?  The fact that the rehashing has completed
>    and all the necessary memory barriers passed isn't something
>    userspace can directly discern.
>
> > If the regression was recent, this would be considered as a normal regression,
> > but apparently nobody noticed for 10 years. This should be saying something...
>
> It does.  But so does the fact that it can be trivially reproduced.

If a kernel fix is doable without making the UDP stack a complete no-go
for most of us, I will be happy to review it.
