Message-ID: <1317290032.4188.1223.camel@debian>
Date: Thu, 29 Sep 2011 17:53:52 +0800
From: "Alex,Shi" <alex.shi@...el.com>
To: Christoph Lameter <cl@...two.org>,
Pekka Enberg <penberg@...helsinki.fi>
Cc: "Chen, Tim C" <tim.c.chen@...el.com>,
"Huang, Ying" <ying.huang@...el.com>,
Andi Kleen <ak@...ux.intel.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH] slub Discard slab page only when node partials >
minimum setting
On Sat, 2011-09-24 at 04:02 +0800, Christoph Lameter wrote:
> On Fri, 23 Sep 2011, Alex,Shi wrote:
>
> > Just did a little bit of work on this. I tested hackbench with different
> > cpu_partial values. The value is set in kmem_cache_open(); I tried 1/2, 2
> > times, 8 times, 32 times and 128 times the original value. The 8 times
> > value seems to have slightly better performance on almost all of my
> > machines, nhm-ex/nhm-ep/wsm-ep.
>
> Is it really worth it? The higher the value the higher the potential
> memory that is stuck in the per cpu partial pages?
It is hard to find the best balance. :)
>
> > This tuning needs far more testing. I am going to look for more
> > benchmarks next week, and try tuning different cpu_partial sizes and
> > code paths.
>
> Thanks for all your efforts.
I tested aim9 and netperf, both of which are said to be related to memory
allocation, but found no performance change with or without PCP (per-cpu
partial pages). Only hackbench seems sensitive to this. For aim9, neither
our own configuration nor Mel Gorman's aim9 configuration from his mmtest
shows a clear performance change for PCP slub.
Checking the kernel call graph via perf record/perf report, the slab
functions are heavily used only in the hackbench benchmark.
I also tried different code paths:
1, remove the s->cpu_partial limit: performance drops 30% on
"hackbench 100 process 2000".
2, don't dump cpu partial into node partial; instead, don't fill
cpu partial if it is larger than s->cpu_partial. No positive
performance change from this, and it seems a little slower on 4-socket
machines.
3, don't dump cpu partial into node partial, and only fill cpu partial
in allocation when cpu partial is less than s->cpu_partial. Insert the free
slab into node partial in __slab_free() directly. No clear performance
change from this. BTW, this approach actually won't reduce the number of
times the node partial lock is taken.
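To illustrate why skipping the dump does not by itself reduce node
partial locking, here is a toy user-space model in C. It is only a
sketch, not kernel code and not the attached patch: every name in it
(toy_cache, toy_run, TOY_DUMP_TO_NODE, TOY_SKIP_FILL) is hypothetical,
it counts whole slabs against the limit for simplicity, it assumes a
dump moves the whole per-cpu list under one lock acquisition, and it
assumes that a slab which is not put on the per-cpu list goes straight
to the node partial list.

```c
#include <assert.h>

/* Toy user-space model. NOT kernel code and NOT the attached patch. */

enum toy_policy {
	TOY_DUMP_TO_NODE,	/* over the limit: move the per-cpu list to the node list */
	TOY_SKIP_FILL		/* over the limit: put the slab on the node list directly */
};

struct toy_cache {
	int limit;		/* stands in for s->cpu_partial (assumption: counted in slabs) */
	int cpu_partial;	/* slabs on the modelled per-cpu partial list */
	int node_partial;	/* slabs on the modelled node partial list */
	int node_lock_takes;	/* times the modelled node list lock is taken */
};

/* Model one slab turning partial because an object was freed into it. */
static void toy_slab_becomes_partial(struct toy_cache *c, enum toy_policy p)
{
	if (c->cpu_partial < c->limit) {
		c->cpu_partial++;	/* room left: no node lock needed */
		return;
	}

	c->node_lock_takes++;		/* both policies touch the node list here */
	if (p == TOY_DUMP_TO_NODE) {
		/* Assumption: the whole per-cpu list moves under one lock. */
		c->node_partial += c->cpu_partial;
		c->cpu_partial = 1;	/* the new slab starts a fresh per-cpu list */
	} else {
		c->node_partial++;	/* one slab per lock acquisition */
	}
}

/* Run `frees` such events with no allocations in between. */
static struct toy_cache toy_run(int limit, int frees, enum toy_policy p)
{
	struct toy_cache c = { limit, 0, 0, 0 };
	int i;

	assert(limit > 0 && frees >= 0);
	for (i = 0; i < frees; i++)
		toy_slab_becomes_partial(&c, p);
	return c;
}
```

In this model, toy_run(4, 20, TOY_DUMP_TO_NODE) takes the node lock 4
times while toy_run(4, 20, TOY_SKIP_FILL) takes it 16 times. These are
properties of the toy model only, not measurements from any machine.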
My experimental patch for the new code paths 2 and 3 needs VM_BUG_ON
disabled, since the frozen state is incoherent for a short time, and it
may leave empty slabs behind after a slab free. So it is just an
experimental patch. The attachment is for code path 2.
Above is what I did this week for PCP.
BTW, I will be on a one-week holiday starting tomorrow, so my e-mail
replies will be slow.
View attachment "patch-pcpnodump" of type "text/x-patch" (3613 bytes)