Date:	Thu, 07 Jul 2011 22:05:03 +0300
From:	Pekka Enberg <penberg@...helsinki.fi>
To:	Christoph Lameter <cl@...ux.com>
Cc:	David Rientjes <rientjes@...gle.com>,
	Andi Kleen <andi@...stfloor.org>, tj@...nel.org,
	Metathronius Galabant <m.galabant@...glemail.com>,
	Matt Mackall <mpm@...enic.com>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Adrian Drzewiecki <z@...e.net>, linux-kernel@...r.kernel.org
Subject: Re: [slub p2 0/4] SLUB: [RFC] Per cpu partial lists V2

On Mon, 2011-06-20 at 10:32 -0500, Christoph Lameter wrote:
> The following patchset applies on top of the lockless patchset V7. It
> introduces per cpu partial lists, which allow a performance increase of
> around 15% during contention for the node lock (can be tested using
> hackbench).
> 
> These lists help to avoid per-node locking overhead. Allocator latency
> could be further reduced by making these operations work without
> disabling interrupts (like the fastpath and the free slowpath) as well as
> implementing better ways of handling the per cpu array of partial pages.
> 
> I am still not satisfied with the cleanliness of the code after these
> changes. Some review with suggestions as to how to restructure the
> code given these changes in operations would be appreciated.
> 
> It is interesting to note that BSD has gone to a scheme with partial
> pages only per cpu (source: Adrian). Transfer of cpu ownership is
> done using IPIs. Probably too much overhead for our taste. The use
> of a few per cpu partial pages looks to be beneficial though.
> 
> Note that there is no performance gain when there is no contention.
> 
> Performance:
> 
> 				Before		After
> ./hackbench 100 process 200000
> 				Time: 2299.072	1742.454
> ./hackbench 100 process 20000
> 				Time: 224.654	182.393
> ./hackbench 100 process 20000
> 				Time: 227.126	182.780
> ./hackbench 100 process 20000
> 				Time: 219.608	182.899
> ./hackbench 10 process 20000
> 				Time: 21.769	18.756
> ./hackbench 10 process 20000
> 				Time: 21.657	18.938
> ./hackbench 10 process 20000
> 				Time: 23.193	19.537
> ./hackbench 1 process 20000
> 				Time: 2.337	2.263
> ./hackbench 1 process 20000
> 				Time: 2.223	2.271
> ./hackbench 1 process 20000
> 				Time: 2.269	2.301

Impressive numbers! David, comments on the series?

			Pekka

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
