Message-ID: <alpine.DEB.2.00.1108011046230.8420@router.home>
Date:	Mon, 1 Aug 2011 10:55:40 -0500 (CDT)
From:	Christoph Lameter <cl@...ux.com>
To:	Pekka Enberg <penberg@...nel.org>
cc:	David Rientjes <rientjes@...gle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>, hughd@...gle.com,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [GIT PULL] Lockless SLUB slowpaths for v3.1-rc1


My future plans for performance improvements are:

1. The percpu partial lists.

This approach halves the min_partial settings so that there won't be
any excessive memory usage. Pages on per cpu partial lists are frozen,
which means that the __slab_free path can avoid taking node locks for a
page that is cached by another processor. This yields another significant
performance gain in hackbench of up to 20%. What remains is to fine-tune
the approach and clean up the patchset.
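
To make the frozen-page idea concrete, here is a minimal userspace
sketch (the names and structures are mine, for illustration only, not
the actual kernel code): freeing into a frozen page is a lockless
cmpxchg on that page's freelist, while an unfrozen page still takes
the per-node lock.

#include <pthread.h>
#include <stdatomic.h>

struct object { struct object *next; };

struct page {
	_Atomic(struct object *) freelist; /* free objects in this page */
	atomic_bool frozen;		/* claimed by some cpu's partial list */
	struct page *next;		/* linkage on a partial list */
};

struct node {
	pthread_mutex_t lock;		/* the contended per-node lock */
	struct page *partial;		/* shared partial list */
};

/*
 * Freeing into a frozen page: a lockless push onto the page's own
 * freelist. node->lock is never taken, which is the win claimed above.
 */
static void free_to_frozen(struct page *page, struct object *obj)
{
	struct object *old = atomic_load(&page->freelist);

	do {
		obj->next = old;
	} while (!atomic_compare_exchange_weak(&page->freelist, &old, obj));
}

/*
 * Freeing into an unfrozen page: the per-node lock must be held so the
 * page can be moved on and off the shared partial list.
 */
static void free_slow(struct node *node, struct page *page, struct object *obj)
{
	pthread_mutex_lock(&node->lock);
	obj->next = atomic_load(&page->freelist);
	atomic_store(&page->freelist, obj);
	/* ...possibly move the page onto node->partial here... */
	pthread_mutex_unlock(&node->lock);
}

void slab_free(struct node *node, struct page *page, struct object *obj)
{
	if (atomic_load(&page->frozen))
		free_to_frozen(page, obj);	/* fast, no node lock */
	else
		free_slow(node, page, obj);	/* slow, node lock taken */
}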

2. Per cpu full lists.

These will not be specific to a particular slab cache but shared among
all of them. This will reduce the need to keep empty slab pages on the
per node partial lists and therefore also reduce memory consumption.

The per cpu full lists will essentially be a caching layer for the
page allocator and will make slab acquisition and release as fast
as the slub fastpath for alloc and free (it uses the same
this_cpu_cmpxchg_double based approach). I basically gave up on
fixing up the page allocator fastpath after trying various approaches
over the last few weeks. Maybe the caching layer can be made available
to other kernel subsystems that need fast page access too.
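
As a rough illustration of the cmpxchg-double technique, here is a
userspace sketch (again my own illustrative code, not the kernel
version; the kernel operates on per cpu data via
this_cpu_cmpxchg_double, where only interrupts can race, and GCC may
need -latomic for the 16-byte compare and exchange). A head pointer
and a transaction id are swapped as a single unit, so pages can be
pushed and popped without any lock and without ABA problems:

#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct page { struct page *next; };

/* Head pointer plus transaction id, updated as one 16-byte unit. */
struct pcp_cache {
	struct page *head;
	uintptr_t tid;		/* bumped on every update to defeat ABA */
};

static _Atomic struct pcp_cache cache;	/* one instance per cpu in reality */

static void cache_push(struct page *page)
{
	struct pcp_cache old = atomic_load(&cache), new;

	do {
		page->next = old.head;
		new.head = page;
		new.tid = old.tid + 1;
	} while (!atomic_compare_exchange_weak(&cache, &old, new));
}

static struct page *cache_pop(void)
{
	struct pcp_cache old = atomic_load(&cache), new;

	do {
		if (!old.head)
			return NULL;	/* empty: fall back to page allocator */
		/*
		 * Dereferencing old.head is only safe because, per cpu,
		 * nothing can free the page underneath us. A general
		 * multi-threaded stack would need more care here.
		 */
		new.head = old.head->next;
		new.tid = old.tid + 1;
	} while (!atomic_compare_exchange_weak(&cache, &old, new));
	return old.head;
}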

The remaining scaling issues are then those caused by:

1. The per node lock taken for the partial lists.
   This can be controlled by enlarging the per cpu partial lists.

2. The necessity to go to the page allocator.
   This will be tunable by configuring the caching layer.

3. Bouncing cachelines for __remote_free if multiple processors
   enter __slab_free for the same page. A toy illustration of this
   cost follows below.
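
Point 3 is easy to underestimate, so here is a toy benchmark (purely
illustrative, nothing slub-specific) that shows what cacheline
bouncing costs: the same number of atomic increments is much slower
when all threads hit one cacheline than when each thread has its own.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 4
#define ITERS (1 << 22)

struct slot { _Atomic long v; char pad[64 - sizeof(_Atomic long)]; };

static _Atomic long shared;		/* all threads bounce this line */
static struct slot local[NTHREADS];	/* one cacheline per thread */

static void *hammer_shared(void *arg)
{
	for (long i = 0; i < ITERS; i++)
		atomic_fetch_add(&shared, 1);
	return arg;
}

static void *hammer_local(void *arg)
{
	struct slot *s = arg;

	for (long i = 0; i < ITERS; i++)
		atomic_fetch_add(&s->v, 1);
	return NULL;
}

static double run(void *(*fn)(void *), int use_local)
{
	pthread_t t[NTHREADS];
	struct timespec a, b;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, fn,
			       use_local ? (void *)&local[i] : NULL);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &b);
	return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
	printf("one shared cacheline:  %.2fs\n", run(hammer_shared, 0));
	printf("per-thread cachelines: %.2fs\n", run(hammer_local, 1));
	return 0;
}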




