lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.1108020925110.1114@chino.kir.corp.google.com>
Date:	Tue, 2 Aug 2011 09:37:16 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Christoph Lameter <cl@...ux.com>
cc:	Pekka Enberg <penberg@...helsinki.fi>,
	Andi Kleen <andi@...stfloor.org>, tj@...nel.org,
	Metathronius Galabant <m.galabant@...glemail.com>,
	Matt Mackall <mpm@...enic.com>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Adrian Drzewiecki <z@...e.net>, linux-kernel@...r.kernel.org
Subject: Re: [slub p3 0/7] SLUB: [RFC] Per cpu partial lists V3

On Tue, 2 Aug 2011, Christoph Lameter wrote:

> > This applied nicely to Linus' tree so I've moved to testing atop that
> > rather than slub/lockless on the same netperf testing environment as the
> > slab vs. slub comparison.  The benchmarking completed without error and
> > here are the results:
> >
> > 	threads		before		after
> > 	 16		75509		75443  (-0.1%)
> > 	 32		118121		117558 (-0.5%)
> > 	 48		149997		149514 (-0.3%)
> > 	 64		185216		186772 (+0.8%)
> > 	 80		221195		222612 (+0.6%)
> > 	 96		239732		241089 (+0.6%)
> > 	112		261967		266643 (+1.8%)
> > 	128		272946		281794 (+3.2%)
> > 	144		279202		289421 (+3.7%)
> > 	160		285745		297216 (+4.0%)
> >
> > So the patchset certainly looks helpful, especially if it improves other
> > benchmarks as well.
> 
> The problem is that the partial approach has not been fine tuned yet for
> these larger loads. And the proper knobs are not implemented yet.
> 

Aside from per-cpu partial lists, I think this particular benchmark would 
benefit from two other changes on my testing environment:

 - remote cpu freeing so that objects allocated on a different cpu get 
   moved to a separate list that will eventually get flushed back to the 
   origin cpu to be reallocated later with sane heuristics to determine
   when to take the necessary lock and cacheline bounce, and

 - a preference to only pull a slab from the partial lists if there are 
   a sane number of free objects risking perhaps a costly page allocation 
   that will nevertheless allow the fastpaths to be exercised a little 
   more either way (this benchmark suffers horribly when only one or two 
   objects can be allocated from a partial slab).

> > I'll review the patches individually, starting with the cleanup patches
> > that can hopefully be pushed quickly while we discuss per-cpu partial
> > lists further.
> 
> I am currently reworking the patches to operate on a linked list instead
> of a very small array of pointers to page structs. That will allow much
> larger per cpu partial lists and a dynamic configuration of the sizes.
> 

Ok, so is the per-cpu partial list patch in this series worth the review 
or are you going to go under the hood and rework it?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ