linux-kernel - Re: [GIT PULL] Lockless SLUB slowpaths for v3.1-rc1

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Mon, 1 Aug 2011 10:55:40 -0500 (CDT)
From:	Christoph Lameter <cl@...ux.com>
To:	Pekka Enberg <penberg@...nel.org>
cc:	David Rientjes <rientjes@...gle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>, hughd@...gle.com,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [GIT PULL] Lockless SLUB slowpaths for v3.1-rc1

The future plans that I have for performance improvements are:

1. The percpu partial lists.

The min_partial settings are halved by this approach so that there wont be
any excessive memory usage. Pages on per cpu partial lists are frozen and
this means that the __slab_free path can avoid taking node locks for a
page that is cached by another processor. This causes another significant
performance gain in hackbench of up to 20%. The problem here is to fine
tune the approach and clean up the patchset.

2. per cpu full lists.

These will not be specific to a particular slab cache but shared amoung
all of them. This will reduce the need to keep empty slab pages on the
per node partial lists and therefore also reduce memory consumption.

The per cpu full lists will be essentially a caching layer for the
page allocator and will make slab acquisition and release as fast
as the slub fastpath for alloc and free (it uses the same
this_cpu_cmpxchg_double based approach). I basically gave up on
fixing up the page allocator fastpath after trying various approaches
over the last weeks. Maybe the caching layer can be made available
for other kernel subsystems that need fast page access too.

The scaling issues that are left over are then those caused by

1. The per node lock taken for the partial lists per node.
   This can be controlled by enlarging the per cpu partial lists.

2. The necessity to go to the page allocator.
   This will be tunable by configuring the caching layer.

3. Bouncing cachelines for __remote_free if multiple processors
   enter __slab_free for the same page.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/