Message-ID: <20110228163742.GH9763@canuck.infradead.org>
Date: Mon, 28 Feb 2011 11:37:42 -0500
From: Thomas Graf <tgraf@...radead.org>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Herbert Xu <herbert@...dor.apana.org.au>,
David Miller <davem@...emloft.net>, rick.jones2@...com,
therbert@...gle.com, wsommerfeld@...gle.com,
daniel.baluta@...il.com, netdev@...r.kernel.org
Subject: Re: SO_REUSEPORT - can it be done in kernel?
On Mon, Feb 28, 2011 at 05:22:54PM +0100, Eric Dumazet wrote:
> On Monday, 28 February 2011 at 09:13 -0500, Thomas Graf wrote:
> > On Mon, Feb 28, 2011 at 07:36:59PM +0800, Herbert Xu wrote:
> > > But please do test them heavily, especially if you have an AMD
> > > NUMA machine as that's where scalability problems really show
> > > up. Intel tends to be a lot more forgiving. My last AMD machine
> > > blew up years ago :)
> >
> > This is just a preliminary test result and not 100% reliable
> > because halfway through the testing the machine reported memory
> > issues and disabled a DIMM before booting the tested kernels.
> >
> > Nevertheless, bind 9.7.3:
> >
> > 2.6.38-rc5+: 62kqps
> > 2.6.38-rc5+ w/ Herbert's patch: 442kqps
> >
> > This is on a 2 NUMA Intel Xeon X5560 @ 2.80GHz with 16 cores
> >
> > Again, this number is not 100% reliable but it clearly shows that
> > the concept of the patch is working very well.
> >
> > Will test Herbert's patch on the machine that did 650kqps with
> > SO_REUSEPORT and also on some AMD machines.
> > --
>
> I suspect your queryperf input file hits many zones ?
No, we use a simple example.com zone with host[1-4] A records
resolving to 10.[1-4].0.1
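For reference, a zone of roughly that shape would look like this (the TTL and the SOA/NS records are placeholder values, not from our actual setup):

```
$TTL 300
example.com.  IN SOA  ns1.example.com. admin.example.com. (
                      1 3600 900 604800 300 )
example.com.  IN NS   ns1.example.com.
host1         IN A    10.1.0.1
host2         IN A    10.2.0.1
host3         IN A    10.3.0.1
host4         IN A    10.4.0.1
```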
> With a single zone, my machine is able to give 250kqps: most of the
> time is consumed in bind code, dealing with rwlocks and false sharing
> things...
>
> (bind-9.7.2-P3)
> Using two remote machines to perform queries, on a bnx2x adapter with
> RSS enabled: two CPUs receive UDP frames for the same socket, so we
> also hit false sharing in the kernel receive path.
How do you measure the qps? The output of queryperf? That is not always
accurate. I run `rndc stats` twice and then calculate the qps from the
diff of the "queries resulted in successful answer" counter and the
timestamp diff.
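That calculation amounts to the following sketch (the counter values and timestamps here are made up for illustration; in practice they come from parsing the stats dump that `rndc stats` writes):

```python
# Sketch: derive qps from two snapshots of BIND's
# "queries resulted in successful answer" counter.

def qps(count1, t1, count2, t2):
    """Queries per second from two (counter value, timestamp) samples."""
    return (count2 - count1) / (t2 - t1)

# Hypothetical example: 2,220,000 additional successful answers
# over a 10-second window.
print(qps(1_000_000, 100.0, 3_220_000, 110.0))  # -> 222000.0
```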
The numbers differ a lot depending on the architecture we test on.
E.g. on a 12-core AMD with 2 NUMA nodes:

2.6.32:
  named -n 1:  37.0kqps
  named:        3.8kqps (yes, no joke, the socket receive buffer is
                always full and the kernel drops pkts)

2.6.38-rc5+ with Herbert's patches:
  named -n 1:  36.9kqps
  named:      222.0kqps
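For anyone following along: the SO_REUSEPORT pattern under discussion boils down to each worker binding its own UDP socket to the shared port, so the kernel can spread incoming queries across the sockets. A minimal sketch, assuming a kernel that supports the option (it was later merged in mainline Linux 3.9; at the time of this thread it required out-of-tree patches):

```python
import socket

def reuseport_socket(port):
    """Create a UDP socket with SO_REUSEPORT set, bound to 127.0.0.1:port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    return s

a = reuseport_socket(0)          # kernel picks a free port
port = a.getsockname()[1]
b = reuseport_socket(port)       # second bind to the same port succeeds
print(a.getsockname()[1] == b.getsockname()[1])  # -> True
```

Without SO_REUSEPORT the second bind() would fail with EADDRINUSE; with it, each worker process gets its own socket and its own receive buffer, which is exactly what avoids the full-buffer drops seen in the multi-process numbers above.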
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html