Date:	Sun, 27 Feb 2011 06:02:05 -0500
From:	Thomas Graf <tgraf@...radead.org>
To:	Herbert Xu <herbert@...dor.apana.org.au>
Cc:	David Miller <davem@...emloft.net>, rick.jones2@...com,
	therbert@...gle.com, wsommerfeld@...gle.com,
	daniel.baluta@...il.com, netdev@...r.kernel.org
Subject: Re: SO_REUSEPORT - can it be done in kernel?

On Sat, Feb 26, 2011 at 08:57:18AM +0800, Herbert Xu wrote:
> I'm fairly certain the bottleneck is indeed in the kernel, and
> in the UDP stack in particular.
> 
> This is borne out by a test where I used two named worker threads,
> both working on the same socket.  Stracing shows that they're
> working flat out only doing sendmsg/recvmsg.
> 
> The result was that they obtained (in aggregate) half the throughput
> of a single worker thread.

I agree. This is the bottleneck I described, where the kernel is
not able to deliver enough queries for BIND to show its lock
contention issues.

But there is also the situation where netperf RR performance numbers
indicate a much higher kernel capability, yet BIND is not able to
deliver more even though CPU utilization is very low. This is the
situation where we see a large number of futex calls, indicating
lock contention due to too many queries on a single socket.
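For illustration, the shared-socket pattern that produces this contention can be sketched as follows. This is a hypothetical multi-threaded UDP echo worker, not BIND's actual code; the point is only that every thread funnels through the same socket:

```python
import socket
import threading

# One UDP socket shared by all worker threads. Every recvfrom/sendto
# goes through the same socket, so the threads serialize on shared
# state (kernel socket lock, plus any userspace mutex guarding the
# socket) rather than running independently.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))

def worker() -> None:
    while True:
        data, peer = sock.recvfrom(2048)  # all threads contend here
        sock.sendto(data, peer)           # echo back, netperf-RR style

for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()
```

Adding more threads to this loop does not scale past the single socket: past a point they mostly wait on one another rather than on the network, which is where the futex traffic shows up.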

> Which is why I'm quite skeptical about this REUSEPORT patch as
> IMHO the only reason it produces a great result is solely because
> it is allowing parallel sends going out.
> 
> Rather than modifying all UDP applications out there to fix what
> is fundamentally a kernel problem, I think what we should do is
> fix the UDP stack so that it actually scales.

I am not suggesting that this is the ultimate and final fix for this
problem. It fixes a symptom rather than the cause, but sometimes
being able to fix the symptom is really handy :-)

Adding SO_REUSEPORT does not prevent us from fixing the UDP stack
in the long run.
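For concreteness, the application-side change is small. A minimal sketch, assuming the semantics of the proposed patch (every worker binds its own socket to the same port with SO_REUSEPORT set, and the kernel spreads incoming datagrams across them):

```python
import socket

def bound_reuseport_socket(port: int) -> socket.socket:
    """Create a UDP socket with SO_REUSEPORT set and bind it to `port`.

    With SO_REUSEPORT, several sockets can bind the same address/port;
    each worker then owns a private socket (and a private socket lock)
    instead of all workers sharing one.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    return s

# One socket per worker instead of one shared socket:
first = bound_reuseport_socket(0)      # let the kernel pick a free port
port = first.getsockname()[1]
second = bound_reuseport_socket(port)  # second worker, same port
```

Without SO_REUSEPORT the second bind() would fail with EADDRINUSE; with it, each worker's recvfrom() runs against its own socket, so the per-socket serialization discussed above disappears from the application's side.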

> It isn't all that hard since the easy way would be to only take
> the lock if we're already corked or about to cork.
> 
> For the receive side we also don't need REUSEPORT as we can simply
> make our UDP stack multiqueue.

OK, it is not required and there is definitely a better way to fix
the kernel bottleneck in the long term. Even better.

I still suggest merging this patch as an immediate workaround until
we scale properly on a single socket, and also as a workaround for
applications which can't get rid of their per-socket mutex quickly.
