Message-ID: <20110228163742.GH9763@canuck.infradead.org>
Date: Mon, 28 Feb 2011 11:37:42 -0500
From: Thomas Graf <tgraf@...radead.org>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Herbert Xu <herbert@...dor.apana.org.au>,
David Miller <davem@...emloft.net>, rick.jones2@...com,
therbert@...gle.com, wsommerfeld@...gle.com,
daniel.baluta@...il.com, netdev@...r.kernel.org
Subject: Re: SO_REUSEPORT - can it be done in kernel?
On Mon, Feb 28, 2011 at 05:22:54PM +0100, Eric Dumazet wrote:
> On Monday, 28 February 2011 at 09:13 -0500, Thomas Graf wrote:
> > On Mon, Feb 28, 2011 at 07:36:59PM +0800, Herbert Xu wrote:
> > > But please do test them heavily, especially if you have an AMD
> > > NUMA machine as that's where scalability problems really show
> > > up. Intel tends to be a lot more forgiving. My last AMD machine
> > > blew up years ago :)
> >
> > This is just a preliminary test result and not 100% reliable
> > because halfway through the testing the machine reported memory
> > issues and disabled a DIMM before booting the tested kernels.
> >
> > Nevertheless, bind 9.7.3:
> >
> > 2.6.38-rc5+: 62kqps
> > 2.6.38-rc5+ w/ Herbert's patch: 442kqps
> >
> > This is on a 2 NUMA Intel Xeon X5560 @ 2.80GHz with 16 cores
> >
> > Again, this number is not 100% reliable but it clearly shows that
> > the concept of the patch is working very well.
> >
> > Will test Herbert's patch on the machine that did 650kqps with
> > SO_REUSEPORT and also on some AMD machines.
> > --
>
> I suspect your queryperf input file hits many zones ?
No, we use a simple example.com zone with host[1-4] A records
resolving to 10.[1-4].0.1
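For reference, a zone of roughly that shape would look like this (the TTL and the SOA/NS records are placeholder values, not from our actual setup):

```
$TTL 300
example.com.  IN SOA  ns1.example.com. admin.example.com. (
                      1 3600 900 604800 300 )
example.com.  IN NS   ns1.example.com.
host1         IN A    10.1.0.1
host2         IN A    10.2.0.1
host3         IN A    10.3.0.1
host4         IN A    10.4.0.1
```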
> With a single zone, my machine is able to give 250kqps: most of the
> time is consumed in bind code, dealing with rwlocks and false sharing
> things...
>
> (bind-9.7.2-P3)
> Using two remote machines to perform queries, on a bnx2x adapter with
> RSS enabled: two CPUs receive UDP frames for the same socket, so we
> also hit false sharing in the kernel receive path.
How do you measure the qps? The output of queryperf? That is not always
accurate. I run `rndc stats` twice and then calculate the qps from the
diff of the "queries resulted in successful answer" counter and the
timestamp diff.
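That calculation amounts to the following sketch (the counter values and timestamps here are made up for illustration; in practice they come from parsing the stats dump that `rndc stats` writes):

```python
# Sketch: derive qps from two snapshots of BIND's
# "queries resulted in successful answer" counter.

def qps(count1, t1, count2, t2):
    """Queries per second from two (counter value, timestamp) samples."""
    return (count2 - count1) / (t2 - t1)

# Hypothetical example: 2,220,000 additional successful answers
# over a 10-second window.
print(qps(1_000_000, 100.0, 3_220_000, 110.0))  # -> 222000.0
```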
The numbers differ a lot depending on the architecture we test on.
E.g. on a 12-core AMD with 2 NUMA nodes:

2.6.32:
  named -n 1:  37.0kqps
  named:        3.8kqps (yes, no joke, the socket receive buffer is
                always full and the kernel drops pkts)

2.6.38-rc5+ with Herbert's patches:
  named -n 1:  36.9kqps
  named:      222.0kqps
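For anyone following along: the SO_REUSEPORT pattern under discussion boils down to each worker binding its own UDP socket to the shared port, so the kernel can spread incoming queries across the sockets. A minimal sketch, assuming a kernel that supports the option (it was later merged in mainline Linux 3.9; at the time of this thread it required out-of-tree patches):

```python
import socket

def reuseport_socket(port):
    """Create a UDP socket with SO_REUSEPORT set, bound to 127.0.0.1:port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    return s

a = reuseport_socket(0)          # kernel picks a free port
port = a.getsockname()[1]
b = reuseport_socket(port)       # second bind to the same port succeeds
print(a.getsockname()[1] == b.getsockname()[1])  # -> True
```

Without SO_REUSEPORT the second bind() would fail with EADDRINUSE; with it, each worker process gets its own socket and its own receive buffer, which is exactly what avoids the full-buffer drops seen in the multi-process numbers above.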
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html