Message-ID: <1317290032.4188.1223.camel@debian>
Date:	Thu, 29 Sep 2011 17:53:52 +0800
From:	"Alex,Shi" <alex.shi@...el.com>
To:	Christoph Lameter <cl@...two.org>,
	Pekka Enberg <penberg@...helsinki.fi>
Cc:	"Chen, Tim C" <tim.c.chen@...el.com>,
	"Huang, Ying" <ying.huang@...el.com>,
	"Huang, Ying" <ying.huang@...el.com>,
	Andi Kleen <ak@...ux.intel.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH] slub Discard slab page only when node partials >
 minimum setting

On Sat, 2011-09-24 at 04:02 +0800, Christoph Lameter wrote:
> On Fri, 23 Sep 2011, Alex,Shi wrote:
> 
> > Just did a little bit of work on this. I tested hackbench with
> > different cpu_partial values. The value is set in kmem_cache_open();
> > I tried 1/2x, 2x, 8x, 32x and 128x of the original value. The 8x
> > value seems to give slightly better performance on almost all of my
> > machines: nhm-ex/nhm-ep/wsm-ep.
> 
> Is it really worth it? The higher the value, the more memory can
> potentially be stuck in the per-cpu partial pages.

It is hard to find the best balance. :)
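
For reference, here is a small stand-alone model of the tuning space:
the base per-size cpu_partial values are assumed to match the per-cpu
partial patch set, and the multipliers are the ones tried above. It is
only an illustration, not kernel code.

#include <stdio.h>

/* Assumed base heuristic: smaller objects get a larger cpu_partial. */
static unsigned int base_cpu_partial(unsigned int size)
{
	if (size >= 4096)	/* PAGE_SIZE on x86 */
		return 2;
	if (size >= 1024)
		return 6;
	if (size >= 256)
		return 13;
	return 30;
}

int main(void)
{
	unsigned int sizes[] = { 64, 256, 1024, 4096 };
	double mults[] = { 0.5, 2, 8, 32, 128 };	/* 1/2x .. 128x */
	unsigned int i, j;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		for (j = 0; j < sizeof(mults) / sizeof(mults[0]); j++)
			printf("size %4u  mult %5.1fx  cpu_partial %u\n",
			       sizes[i], mults[j],
			       (unsigned int)(base_cpu_partial(sizes[i]) * mults[j]));
	return 0;
}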
> 
> > This tuning needs far more testing. I am going to look for more
> > benchmarks next week and try tuning different cpu_partial sizes and
> > code paths.
> 
> Thanks for all your efforts.

I tested aim9/netperf; both of them are said to be related to memory
allocation, but I didn't find any performance change with/without PCP
(per-cpu partials). Only hackbench seems sensitive to this. As for
aim9, whether with our own configuration or with Mel Gorman's aim9
configuration from his mmtests, there is no clear performance change
for PCP slub.

Checking the kernel call graph via perf record/perf report, the slab
functions are only heavily used in the hackbench benchmark.
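
The profiling was along these lines (the exact invocation and options
are illustrative):

	perf record -g ./hackbench 100 process 2000
	perf report --sort symbol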

I also tried different code paths (a toy model of them follows the
list):

1, remove the s->cpu_partial limitation: performance drops 30% on
"hackbench 100 process 2000".

2, don't dump the cpu partial list into the node partial list;
instead, stop filling the cpu partial list once it is larger than
s->cpu_partial. No positive performance change from this, and it
seems a little bit slower on 4-socket machines.

3, don't dump the cpu partial list into the node partial list, and
only fill the cpu partial list during allocation when it holds fewer
than s->cpu_partial objects; insert freed slabs into the node partial
list directly in __slab_free(). No clear performance change from
this. BTW, this approach actually won't reduce the number of node
partial lock acquisitions.
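To make the difference concrete, below is a toy user-space model of
where a freed partial slab page gets queued. All names are made up and
the model is greatly simplified; it only illustrates why the no-dump
paths (2, 3) don't reduce node partial lock takes: the baseline takes
list_lock once per batch dump, while the no-dump paths take it once
per overflowing free.

#include <stdio.h>

#define CPU_PARTIAL_MAX 8	/* stand-in for s->cpu_partial */

static int cpu_partial;		/* pages on the per-cpu partial list */
static int node_partial;	/* pages on the node partial list */
static int node_lock_takes;	/* times we would take list_lock */

/* Baseline: queue per cpu; when over the cap, dump the whole per-cpu
 * list to the node list (one lock take for the whole batch). */
static void free_baseline(void)
{
	if (++cpu_partial > CPU_PARTIAL_MAX) {
		node_partial += cpu_partial;	/* unfreeze_partials() */
		cpu_partial = 0;
		node_lock_takes++;
	}
}

/* Paths 2/3: never dump the per-cpu list; once it is full, every
 * newly freed partial page goes straight to the node list. */
static void free_nodump(void)
{
	if (cpu_partial < CPU_PARTIAL_MAX)
		cpu_partial++;
	else {
		node_partial++;		/* __slab_free() takes list_lock */
		node_lock_takes++;
	}
}

int main(void)
{
	int i;

	for (i = 0; i < 1000; i++)
		free_baseline();
	printf("baseline: node lock takes %d\n", node_lock_takes);

	cpu_partial = node_partial = node_lock_takes = 0;
	for (i = 0; i < 1000; i++)
		free_nodump();
	printf("no-dump:  node lock takes %d\n", node_lock_takes);
	return 0;
}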

My experimental patches for new code paths 2 and 3 need VM_BUG_ON
disabled, since the frozen flag is briefly incoherent, and they may
leave empty slabs behind after a slab free; so they are just
experimental patches. The attachment is for code path 2.

The above is what I did this week for PCP.

BTW, I will be on holiday for one week starting tomorrow; e-mail
access will be slow.


Attachment: "patch-pcpnodump" (text/x-patch, 3613 bytes)
