Message-ID: <20160324165047.GA7585@1wt.eu>
Date: Thu, 24 Mar 2016 17:50:47 +0100
From: Willy Tarreau <w@....eu>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Tolga Ceylan <tolga.ceylan@...il.com>,
Tom Herbert <tom@...bertland.com>, cgallek@...gle.com,
Josh Snyder <josh@...e406.com>,
Aaron Conole <aconole@...heb.org>,
"David S. Miller" <davem@...emloft.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as
drain mode
On Thu, Mar 24, 2016 at 09:33:11AM -0700, Eric Dumazet wrote:
> > --- a/net/ipv4/inet_hashtables.c
> > +++ b/net/ipv4/inet_hashtables.c
> > @@ -189,6 +189,8 @@ static inline int compute_score(struct sock *sk, struct net *net,
> > return -1;
> > score += 4;
> > }
> > + if (sk->sk_reuseport)
> > + score++;
>
> This won't work with BPF
>
> > if (sk->sk_incoming_cpu == raw_smp_processor_id())
> > score++;
>
> This one does not work either with BPF
But this *is* in 4.5. Does this mean that this part doesn't work anymore, or
just that it's not usable in conjunction with BPF? If it's only the latter I'm
less worried, because it would mean that we have a solution for non-BPF-aware
applications and that BPF-aware applications can simply use BPF.
> The whole point of BPF was to avoid iterating through all sockets [1],
> and let user space use whatever selection logic it needs.
>
> [1] This was okay with up to 16 sockets. But with 128 it does not scale.
Indeed.
> If you really look at how BPF works, implementing another 'per listener' flag
> would break the BPF selection.
OK.
> You can certainly implement SO_REUSEPORT_LISTEN_OFF by loading an
> updated BPF program, so why should we add another way in the kernel to do
> the same thing, in a way that would not work in some cases?
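If I understand the suggestion, it would look more or less like the rough
sketch below (untested; NSOCKS and the position of the draining socket are
placeholders, and it assumes the SO_ATTACH_REUSEPORT_CBPF and SKF_AD_RANDOM
definitions from recent kernel headers are visible):

#include <linux/filter.h>      /* sock_filter, sock_fprog, BPF_*, SKF_AD_* */
#include <sys/socket.h>

#define NSOCKS 8               /* placeholder: sockets in the reuseport group,
                                  the draining one assumed at the last index */

static int drain_last_socket(int any_fd_in_group)
{
    struct sock_filter code[] = {
        /* A = 32-bit pseudo-random value (ancillary load) */
        BPF_STMT(BPF_LD  | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_RANDOM),
        /* A %= NSOCKS - 1: spread over every socket but the last */
        BPF_STMT(BPF_ALU | BPF_MOD | BPF_K, NSOCKS - 1),
        /* return A: index of the socket receiving the connection */
        BPF_STMT(BPF_RET | BPF_A, 0),
    };
    struct sock_fprog prog = {
        .len    = sizeof(code) / sizeof(code[0]),
        .filter = code,
    };

    /* attaching to one member is expected to update the whole group */
    return setsockopt(any_fd_in_group, SOL_SOCKET,
                      SO_ATTACH_REUSEPORT_CBPF, &prog, sizeof(prog));
}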
That said, I'm not trying to reimplement something that's already available,
but I'm confused by a few points:
- the code above already exists, and you mention it cannot be used with BPF
- for the vast majority of applications not using BPF, would the above *still*
work? (it worked in 4.4-rc at least; the plain SO_REUSEPORT pattern I have in
mind is sketched after this list)
- it seems to me that for BPF to be usable on a process shutting down, we'd
need some form of central knowledge if the goal is to redefine how to
distribute the load. In my case there are multiple independent processes
forked on startup, so it's unclear to me how each of them could reconfigure
BPF when shutting down without risking breaking the other ones.
- the doc leads me to believe that unsetting a BPF program would require
privileges, so that would not be compatible with a process which has already
dropped its privileges after startup, but I could be wrong.
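For reference, the non-BPF pattern I have in mind in the second point above is
just the usual per-worker listener, roughly like this (simplified sketch,
error handling mostly omitted):

#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* each forked worker calls this with the same port; the kernel then
 * spreads incoming connections across the resulting listeners */
static int open_reuseport_listener(uint16_t port)
{
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(port);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}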
Thanks for your help on this,
Willy