Message-ID: <1458888581.6473.30.camel@edumazet-glaptop3.roam.corp.google.com>
Date:	Thu, 24 Mar 2016 23:49:41 -0700
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Willy Tarreau <w@....eu>
Cc:	Tom Herbert <tom@...bertland.com>,
	Yann Ylavic <ylavic.dev@...il.com>,
	Linux Kernel Network Developers <netdev@...r.kernel.org>,
	Tolga Ceylan <tolga.ceylan@...il.com>,
	Craig Gallek <cgallek@...gle.com>,
	Josh Snyder <josh@...e406.com>,
	Aaron Conole <aconole@...heb.org>,
	"David S. Miller" <davem@...emloft.net>,
	Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as
 drain mode

On Fri, 2016-03-25 at 06:28 +0100, Willy Tarreau wrote:
> On Thu, Mar 24, 2016 at 04:54:03PM -0700, Tom Herbert wrote:
> > On Thu, Mar 24, 2016 at 4:40 PM, Yann Ylavic <ylavic.dev@...il.com> wrote:
> > > I'll learn how to do this to get the best performance from the
> > > server, but having to do so to work around what looks like a defect
> > > (for simple/default SMP configurations at least, no NUMA or clever
> > > CPU-affinity or queuing policy involved) seems odd in the first place.
> > >
> > I disagree with your assessment that there is a defect. SO_REUSEPORT
> > is designed to spread packets amongst _equivalent_ connections. In the
> > server draining case sockets are no longer equivalent, but that is a
> > special case.
> 
> I partially disagree with you here Tom. Initially SO_REUSEPORT was not
> used to spread packets but to allow soft restart in some applications.
> I've been using it since 2001 in haproxy on *BSD and Linux 2.2. It was
> removed during 2.3 and I used to keep a patch to reimplement it in 2.4
> (basically 2 or 3 lines, the infrastructure was still present), but the
> patch was not accepted. The same patch worked for 2.6 and 3.x, allowing
> me to continue to perform soft-restarts on Linux just like I used to do
> on *BSD. When SO_REUSEPORT was reimplemented in 3.9 with load balancing,
> I was happy because it at last allowed me to drop my patch and I got
> the extra benefit of better load balancing of incoming connections.
> 
> But the main use we have for it (at least historically) is for soft
> restarts, where one process replaces another one. Very few people use
> more than one process in our case.
> 
> However, given the benefits of the load spreading for extreme loads,
> I'm willing to find out how to achieve the same with BPF, but it's
> pretty clear that at this point I have no idea where to start, and
> for a single process replacing another, it looks quite complicated.
> 
> For me, quite frankly, the special case is the load balancing, which
> is a side benefit (and a nice one, don't get me wrong).
> 
> That's why I would have found it nice to "fix" the process replacement
> to avoid dropping incoming connections, though I don't want it to become
> a problem for future improvements on BPF. I don't think the two lines I
> proposed could become an issue but I'll live without them (or continue
> to apply this patch).
> 
> BTW, I have no problem with having to write a little bit of assembly for
> fast interfaces if it remains untouched for years; we already have a
> bit in haproxy. It's just a long-term investment.
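
For context, the soft-restart pattern described above needs no extra
kernel support on 3.9+: the new process simply binds the same port
with SO_REUSEPORT set while the old process is still accepting, then
the old one stops. A minimal sketch (IPv4 only, error handling
elided, not haproxy's actual code):

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

static int listen_reuseport(int port)
{
	int one = 1;
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	/* Allow several sockets (old and new process) on one port;
	 * must be set before bind(). */
	setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);

	bind(fd, (struct sockaddr *)&addr, sizeof(addr));
	listen(fd, SOMAXCONN);
	return fd;
}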

Everything is possible, but do not complain because BPF went into the
kernel before your changes.

Just rework your patch.

Supporting multiple SO_REUSEPORT groups on the same port should not be
too hard really. Making sure BPF still works for them is feasible.
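
For reference, per-group BPF selection already exists: since Linux 4.5
a classic BPF program attached with SO_ATTACH_REUSEPORT_CBPF chooses
which socket of the group gets each new connection. A minimal sketch,
assuming one listener per CPU joined the group in CPU order; the
program's return value indexes the group's sockets, and an
out-of-range value falls back to the default hash selection:

#include <linux/filter.h>
#include <sys/socket.h>

#ifndef SO_ATTACH_REUSEPORT_CBPF
#define SO_ATTACH_REUSEPORT_CBPF 51	/* not yet in older libc headers */
#endif

static void attach_cpu_steering(int fd)
{
	struct sock_filter code[] = {
		/* A = id of the CPU handling this packet */
		{ BPF_LD | BPF_W | BPF_ABS, 0, 0, SKF_AD_OFF + SKF_AD_CPU },
		/* return A: socket index == CPU id */
		{ BPF_RET | BPF_A, 0, 0, 0 },
	};
	struct sock_fprog prog = {
		.len = sizeof(code) / sizeof(code[0]),
		.filter = code,
	};

	/* Attached to one socket, applies to the whole group. */
	setsockopt(fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF,
		   &prog, sizeof(prog));
}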

But the semantics of the socket option would be really different.

You need to control not an individual listener, but a group of listeners.

Your dying haproxy would issue a single system call to tell the kernel:
my SO_REUSEPORT group should not accept new SYN packets, so that the new
haproxy can set up a working new SO_REUSEPORT group.
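
Under those semantics the handover could look like the sketch below.
Purely illustrative: the option name is taken from the patch in the
subject line, and the group-wide drain is only the suggestion above;
neither is a merged kernel API.

/* HYPOTHETICAL: group-wide drain, not implemented in any kernel. */
int one = 1;

/* Old process: one call quiesces its whole reuseport group, so the
 * kernel steers no new SYN packets to any of its listeners. */
setsockopt(old_fd, SOL_SOCKET, SO_REUSEPORT_LISTEN_OFF, &one, sizeof(one));

/* New process: binds the same port with SO_REUSEPORT, forming a
 * fresh group that now receives every new connection, while the
 * old process drains its accept queues and exits. */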



