Date:	Thu, 24 Mar 2016 16:30:53 +0100
From:	Willy Tarreau <w@....eu>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Tolga Ceylan <tolga.ceylan@...il.com>,
	Tom Herbert <tom@...bertland.com>, cgallek@...gle.com,
	Josh Snyder <josh@...e406.com>,
	Aaron Conole <aconole@...heb.org>,
	"David S. Miller" <davem@...emloft.net>,
	Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as
 drain mode

Hi Eric,

(I just lost my e-mail draft, so I'm trying not to forget some points)

On Thu, Mar 24, 2016 at 07:45:44AM -0700, Eric Dumazet wrote:
> On Thu, 2016-03-24 at 15:22 +0100, Willy Tarreau wrote:
> > Hi Eric,
> 
> > But that means that any software making use of SO_REUSEPORT needs to
> > also implement BPF on Linux to achieve the same as what it does on
> > other OSes? Also I found a case where a dying process would still
> > cause trouble in the accept queue; maybe it's not redistributed, I
> > don't remember. All I remember is that my traffic stopped after a
> > segfault of only one of them :-/ I'll have to dig a bit into this.
> 
> Hi Willy
> 
> Problem is: if we add a SO_REUSEPORT_LISTEN_OFF, this won't work with
> BPF.

I wasn't in favor of adding SO_REUSEPORT_LISTEN_OFF either. The idea was
instead just to modify the score in compute_score() so that a socket which
disables SO_REUSEPORT scores lower than one which still has it enabled. An
application wishing to terminate only has to clear the SO_REUSEPORT flag
and keep calling accept() until it reports EAGAIN (a minimal userspace
sketch of that drain loop follows the hunk below). The patch simply looked
like this (copy-pasted, hence space-mangled):

--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -189,6 +189,8 @@ static inline int compute_score(struct sock *sk, struct net *net,
                                return -1;
                        score += 4;
                }
+               if (sk->sk_reuseport)
+                       score++;
                if (sk->sk_incoming_cpu == raw_smp_processor_id())
                        score++;
        }
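
For illustration only: a minimal sketch of that drain loop on the
application side, assuming the compute_score() change above and a
non-blocking listening socket. The drain_and_close() helper name is just
made up for this example.

/* Illustrative sketch: drain a reuseport listener once the above
 * compute_score() change steers new connections to the other listeners.
 * Assumes lfd is a non-blocking listening socket.
 */
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

static void drain_and_close(int lfd)
{
        int off = 0;

        /* Step out of the load balancing: with the patch above, listeners
         * which still have SO_REUSEPORT set now score higher than us.
         */
        if (setsockopt(lfd, SOL_SOCKET, SO_REUSEPORT, &off, sizeof(off)) < 0)
                perror("setsockopt(SO_REUSEPORT)");

        /* Keep accepting whatever already sits in our queue; EAGAIN (or
         * EWOULDBLOCK) means the queue is empty and we can go away.
         */
        for (;;) {
                int cfd = accept(lfd, NULL, NULL);

                if (cfd < 0) {
                        if (errno == EINTR)
                                continue;
                        if (errno == EAGAIN || errno == EWOULDBLOCK)
                                break;
                        perror("accept");
                        break;
                }
                /* hand cfd off to the normal request path; closed here
                 * only to keep the sketch self-contained */
                close(cfd);
        }
        close(lfd);
}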

> BPF makes a decision without knowing individual listeners states.

But is that decision taken without considering compute_score()? The point
really was to be as unintrusive as possible while staying logical for the
application: "disable SO_REUSEPORT when you no longer want to take part in
the incoming load balancing".
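
For context only, and not something from this thread's patches: a rough
sketch of how a listener group attaches a classic BPF selector with
SO_ATTACH_REUSEPORT_CBPF (the mechanism merged shortly before this
discussion). The CPU-based selector and the NUM_LISTENERS constant are
purely illustrative assumptions; the point is that the program only
returns an index into the group and never sees per-listener state such as
a later-cleared SO_REUSEPORT flag.

/* Illustrative sketch: attach a classic BPF program that picks the
 * listener by current CPU. The program returns an index into the
 * reuseport group; it knows nothing about the individual sockets.
 */
#include <linux/filter.h>
#include <sys/socket.h>

#define NUM_LISTENERS 4         /* illustrative group size */

static int attach_cpu_selector(int lfd)
{
        struct sock_filter code[] = {
                /* A = current CPU */
                BPF_STMT(BPF_LD  | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_CPU),
                /* A = A % NUM_LISTENERS */
                BPF_STMT(BPF_ALU | BPF_MOD | BPF_K, NUM_LISTENERS),
                /* return A as the index of the chosen listener */
                BPF_STMT(BPF_RET | BPF_A, 0),
        };
        struct sock_fprog prog = {
                .len    = sizeof(code) / sizeof(code[0]),
                .filter = code,
        };

        return setsockopt(lfd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF,
                          &prog, sizeof(prog));
}

Typically a listener would call something like attach_cpu_selector() after
listen(); from then on, placement within the group follows the program's
return value, which is exactly why it cannot notice a listener trying to
drain.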

> Or we would need to extend BPF to access these kind of states.
> Doable certainly, but we need to be convinced it is necessary.

You know that I don't like complex designs to address simple issues when
they can be avoided :-)

> And yes, if a listener is closed while children are still in accept
> queue, we drop all the children, we do not care of redistributing them
> to another listeners. Really too complex to be worth it.

Forget this, I mixed up two issues here. Yes, I know that redistributing
is almost impossible; I read that code a year or so ago and realized how
complex it would be, without even guaranteeing a 100% success rate. I
wasn't talking about redistributing the accept queue, but about an issue
I ran into where, with 4 processes bound to the same port, if one dies,
its share of the incoming traffic is permanently lost. The only fix was
to restart the processes to create new listeners. But I don't remember
the conditions under which I hit this case, nor even whether it was on an
-rc kernel or a final one, so I'd prefer to discuss it only once I have
more details.

Cheers,
Willy

[Attachment: "0001-net-make-lingering-sockets-score-less-in-compute_sco.patch" (text/plain, 2649 bytes)]
