Message-ID: <alpine.DEB.2.00.0903301329290.21074@chino.kir.corp.google.com>
Date: Mon, 30 Mar 2009 13:38:24 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Mel Gorman <mel@....ul.ie>
cc: Pekka Enberg <penberg@...helsinki.fi>,
Christoph Lameter <cl@...ux-foundation.org>,
Nick Piggin <nickpiggin@...oo.com.au>,
Martin Bligh <mbligh@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [patch 1/3] slub: add per-cache slab thrash ratio

On Mon, 30 Mar 2009, Mel Gorman wrote:
> netperf and tbench will both pound the sl*b allocator far more than sysbench
> will in my opinion although I don't have figures on-hand to back that up. In
> the case of netperf, it might be particularly obvious if the client is on one
> CPU and the server on another because I believe that means all allocs happen
> on one CPU and all frees on another.
>

My results are for two 16-core 64G machines on the same rack, one running
netserver and the other running netperf.

> I have a vague concern that such a tunable needs to exist at all though
> and wonder what workloads it can hurt when set to 20 for example versus any
> other value.
>

The tunable needs to exist unless a counter proposal is made that fixes
this slub performance degradation compared to using slab. I'd be very
interested to hear other proposals on how to detect and remedy such
situations in the allocator without the addition of a tunable.

As I mentioned previously in response to Pekka, it won't cause a further
regression if sane SLAB_THRASHING_THRESHOLD and slab_thrash_ratio values
are chosen.  The rule, as implemented, is pretty simple: if a cpu slab can
only allocate 20% of its objects (the slab_thrash_ratio) three times in a
row (the SLAB_THRASHING_THRESHOLD), we pick a slab with more free objects
from the partial list while holding list_lock, rather than constantly
contending on the lock for refills that yield almost nothing.
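
To make the heuristic concrete, here is a small userspace sketch of the
idea; the struct fields, helper name, and numbers below are only
illustrative, not the actual patch:

/*
 * Standalone model of the thrash detection described above.  Field and
 * helper names are made up for illustration; in the patch this logic
 * would sit in the allocation slowpath.
 */
#include <stdbool.h>
#include <stdio.h>

#define SLAB_THRASHING_THRESHOLD 3	/* consecutive low-yield refills */

struct cache_model {
	unsigned int objects_per_slab;	/* objects in one slab */
	unsigned int slab_thrash_ratio;	/* percent, e.g. 20 */
	unsigned int thrash_count;	/* consecutive low-yield refills */
};

/*
 * Called when the cpu slab is exhausted and must be refilled: if it
 * handed out no more than slab_thrash_ratio percent of its objects
 * since the last refill, count that as a thrashing episode; a good
 * refill resets the streak.
 */
static bool cache_is_thrashing(struct cache_model *s,
			       unsigned int objects_allocated)
{
	unsigned int min_objects =
		s->objects_per_slab * s->slab_thrash_ratio / 100;

	if (objects_allocated <= min_objects)
		s->thrash_count++;
	else
		s->thrash_count = 0;

	/*
	 * Once the streak hits the threshold, the refill path would take
	 * list_lock once and pick the partial slab with the most free
	 * objects instead of the first one on the list.
	 */
	return s->thrash_count >= SLAB_THRASHING_THRESHOLD;
}

int main(void)
{
	struct cache_model s = { .objects_per_slab = 32,
				 .slab_thrash_ratio = 20 };
	unsigned int served[] = { 5, 4, 6, 30 };	/* objects per refill */

	for (unsigned int i = 0; i < sizeof(served) / sizeof(served[0]); i++)
		printf("refill %u: served %u -> %s\n", i, served[i],
		       cache_is_thrashing(&s, served[i]) ?
		       "thrashing, sort partial list" : "ok");
	return 0;
}

A cache that regularly gets good refills never reaches the threshold, so
it never takes the sorted-partial-list path.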

This heuristic is particularly important for the netperf benchmark
because the only cpu slabs that thrash are the ones with NUMA locality to
the cpu taking the networking interrupt (remote_node_defrag_ratio was
unchanged from its default, meaning we avoid remote node defragmentation
98% of the time).
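
For reference on the NUMA side, here is a similarly rough userspace model
of that gate; the check roughly mirrors the one in get_any_partial() in
mm/slub.c, but the random source and the ratio value used here are
stand-ins, not the kernel's defaults:

/*
 * Model of the remote_node_defrag_ratio gate: only a small fraction of
 * refills are allowed to go look at remote nodes' partial lists, so the
 * cpu slabs that thrash stay NUMA-local to the interrupt cpu.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Returns true when a refill may steal a partial slab from a remote
 * node.  A ratio around 20 (out of a 0..1023 sample) skips the remote
 * scan roughly 98% of the time, matching the behaviour described above;
 * the kernel's actual default and scaling depend on the version.
 */
static bool try_remote_defrag(unsigned int remote_node_defrag_ratio)
{
	if (!remote_node_defrag_ratio ||
	    (unsigned int)(rand() % 1024) > remote_node_defrag_ratio)
		return false;		/* refill from the local node only */
	return true;			/* scan remote partial lists */
}

int main(void)
{
	unsigned int remote = 0, trials = 100000;

	for (unsigned int i = 0; i < trials; i++)
		remote += try_remote_defrag(20);
	printf("remote defrag attempted on %u of %u refills\n",
	       remote, trials);
	return 0;
}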

I haven't measured the fastpath implications of non-thrashing caches (the
increment in the alloc fastpath and the conditional in the alloc slowpath
for partial list sorting) yet, but your suggested experiments should show
that quite well.