Message-ID: <20160324165047.GA7585@1wt.eu>
Date: Thu, 24 Mar 2016 17:50:47 +0100
From: Willy Tarreau <w@....eu>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Tolga Ceylan <tolga.ceylan@...il.com>,
Tom Herbert <tom@...bertland.com>, cgallek@...gle.com,
Josh Snyder <josh@...e406.com>,
Aaron Conole <aconole@...heb.org>,
"David S. Miller" <davem@...emloft.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as
drain mode
On Thu, Mar 24, 2016 at 09:33:11AM -0700, Eric Dumazet wrote:
> > --- a/net/ipv4/inet_hashtables.c
> > +++ b/net/ipv4/inet_hashtables.c
> > @@ -189,6 +189,8 @@ static inline int compute_score(struct sock *sk, struct net *net,
> > return -1;
> > score += 4;
> > }
> > + if (sk->sk_reuseport)
> > + score++;
>
> This won't work with BPF
>
> > if (sk->sk_incoming_cpu == raw_smp_processor_id())
> > score++;
>
> This one does not work either with BPF
But this *is* in 4.5. Does this mean that this part doesn't work anymore, or
just that it's not usable in conjunction with BPF? If it's only the latter I'm
less worried, because it would mean that we have a solution for non-BPF-aware
applications and that BPF-aware applications can simply use BPF.
> The whole point of BPF was to avoid iterating through all sockets [1],
> and let user space use whatever selection logic it needs.
>
> [1] This was okay with up to 16 sockets. But with 128 it does not scale.
Indeed.
> If you really look at how BPF works, implementing another 'per listener' flag
> would break the BPF selection.
OK.
> You can certainly implement SO_REUSEPORT_LISTEN_OFF by loading an
> updated BPF program, so why should we add another way in the kernel to do
> the same thing, in a way that would not work in some cases?
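If I understand the suggestion, it would look more or less like the rough
sketch below (untested; NSOCKS and the position of the draining socket are
placeholders, and it assumes the SO_ATTACH_REUSEPORT_CBPF and SKF_AD_RANDOM
definitions from recent kernel headers are visible):

#include <linux/filter.h>      /* sock_filter, sock_fprog, BPF_*, SKF_AD_* */
#include <sys/socket.h>

#define NSOCKS 8               /* placeholder: sockets in the reuseport group,
                                  the draining one assumed at the last index */

static int drain_last_socket(int any_fd_in_group)
{
    struct sock_filter code[] = {
        /* A = 32-bit pseudo-random value (ancillary load) */
        BPF_STMT(BPF_LD  | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_RANDOM),
        /* A %= NSOCKS - 1: spread over every socket but the last */
        BPF_STMT(BPF_ALU | BPF_MOD | BPF_K, NSOCKS - 1),
        /* return A: index of the socket receiving the connection */
        BPF_STMT(BPF_RET | BPF_A, 0),
    };
    struct sock_fprog prog = {
        .len    = sizeof(code) / sizeof(code[0]),
        .filter = code,
    };

    /* attaching to one member is expected to update the whole group */
    return setsockopt(any_fd_in_group, SOL_SOCKET,
                      SO_ATTACH_REUSEPORT_CBPF, &prog, sizeof(prog));
}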
That said, I'm not trying to reimplement something that's already available,
but I'm confused by a few points:
- the code above already exists, and you mention it cannot be used with BPF
- for the vast majority of applications not using BPF, would the above *still*
work? (it worked in 4.4-rc at least; the plain SO_REUSEPORT pattern I have in
mind is sketched after this list)
- it seems to me that for BPF to be usable on a process shutting down, we'd
need some form of central knowledge if the goal is to redefine how to
distribute the load. In my case there are multiple independent processes
forked on startup, so it's unclear to me how each of them could reconfigure
BPF when shutting down without risking breaking the other ones.
- the doc leads me to believe that unsetting a BPF program would require
privileges, so that would not be compatible with a process which has already
dropped its privileges after startup, but I could be wrong.
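For reference, the non-BPF pattern I have in mind in the second point above is
just the usual per-worker listener, roughly like this (simplified sketch,
error handling mostly omitted):

#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* each forked worker calls this with the same port; the kernel then
 * spreads incoming connections across the resulting listeners */
static int open_reuseport_listener(uint16_t port)
{
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(port);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}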
Thanks for your help on this,
Willy