Date:	Mon, 16 Jul 2007 11:49:36 -0500
From:	Matt Mackall <mpm@...enic.com>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	linux-kernel <linux-kernel@...r.kernel.org>,
	akpm@...ux-foundation.org, Pekka Enberg <penberg@...helsinki.fi>,
	Christoph Lameter <clameter@....com>
Subject: Re: [PATCH] slob: reduce list scanning

On Mon, Jul 16, 2007 at 04:01:15PM +1000, Nick Piggin wrote:
> Matt Mackall wrote:
> >The version of SLOB in -mm always scans its free list from the
> >beginning, which results in small allocations and free segments
> >clustering at the beginning of the list over time. This causes the
> >average search to scan over a large stretch at the beginning on each
> >allocation.
> >
> >By starting each page search where the last one left off, we evenly
> >distribute the allocations and greatly shorten the average search.
> >
> >Without this patch, kernel compiles on a 1.5G machine take a large
> >amount of system time for list scanning. With this patch, compiles are
> >within a few seconds of performance of a SLAB kernel with no notable
> >change in system time.
> 
> This looks pretty nice, and performance results sound good too.
> IMO this should probably be merged along with the previous
> SLOB patches, because they removed the cyclic scanning to begin
> with (so it is possible that they introduce a performance
> regression in some situations).
> 
> I wonder what it would take to close the performance gap further.
> I still want to look at per-cpu freelists after Andrew merges
> this set of patches. That may improve both cache hotness and
> CPU scalability.
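
For anyone following along, the "start where the last search left off" scan described in the quoted patch text can be sketched in userspace C roughly as below. This is not the patch itself: struct page_sketch, pages[], cursor and alloc_units() are made-up names, and a fixed array stands in for the kernel's page list.

```c
#include <stddef.h>

#define NPAGES 8

/* Hypothetical stand-in for a SLOB page: it only tracks free units. */
struct page_sketch {
	int free_units;
};

static struct page_sketch pages[NPAGES];
static size_t cursor;	/* index where the previous search stopped */

/*
 * Next-fit search: start at the cursor instead of index 0, wrap
 * around at most once, and remember where the search ended.
 * Returns the index of the page used, or -1 if no page has room.
 * *scanned reports how many pages were examined.
 */
static int alloc_units(int units, int *scanned)
{
	size_t i;

	*scanned = 0;
	for (i = 0; i < NPAGES; i++) {
		size_t idx = (cursor + i) % NPAGES;

		(*scanned)++;
		if (pages[idx].free_units >= units) {
			pages[idx].free_units -= units;
			cursor = idx;	/* resume here next time */
			return (int)idx;
		}
	}
	return -1;
}
```

With the cursor pinned at 0 instead, every search would walk the same nearly full pages at the front before finding space, which is the clustering the patch description complains about.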

The idea I'm currently kicking around is having an array of spinlocks
and list heads, one entry per CPU, and adding an array index to the
SLOB page struct.

To allocate, we loop over the array starting at the current CPU
looking for space. On failure, we add a page to the current CPU's
list. We can imagine several variants here: attempting to trylock
while scanning the list or doing no fallback at all. The first is
liable to be unhelpful if there's actually contention, the second will
consume more total memory but reduce the average scan time.

To free, we locate the list from the page struct so we can grab the
relevant lock.
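
A rough userspace sketch of the scheme, assuming pthread mutexes in place of spinlocks and pages that only count free units. All names here (sketch_page, cpu_list, sketch_alloc, sketch_free, NR_CPUS_SKETCH, PAGE_UNITS) are hypothetical and nothing below is existing SLOB code. It implements the plain "fall back to other CPUs' lists" variant, with no trylock.

```c
#include <pthread.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH	4
#define PAGE_UNITS	64

struct sketch_page {
	struct sketch_page *next;
	int list_idx;		/* which per-CPU list owns this page */
	int free_units;
};

struct cpu_list {
	pthread_mutex_t lock;
	struct sketch_page *head;
};

static struct cpu_list lists[NR_CPUS_SKETCH];

static void lists_init(void)
{
	int i;

	for (i = 0; i < NR_CPUS_SKETCH; i++) {
		pthread_mutex_init(&lists[i].lock, NULL);
		lists[i].head = NULL;
	}
}

/* Scan one list under its lock; returns the page used or NULL. */
static struct sketch_page *try_list(int idx, int units)
{
	struct sketch_page *p;

	pthread_mutex_lock(&lists[idx].lock);
	for (p = lists[idx].head; p; p = p->next) {
		if (p->free_units >= units) {
			p->free_units -= units;
			break;
		}
	}
	pthread_mutex_unlock(&lists[idx].lock);
	return p;
}

/* Loop over the array starting at the current CPU; on failure,
 * add a fresh page to the current CPU's list. */
static struct sketch_page *sketch_alloc(int cpu, int units)
{
	struct sketch_page *p;
	int i;

	if (units > PAGE_UNITS)
		return NULL;

	for (i = 0; i < NR_CPUS_SKETCH; i++) {
		p = try_list((cpu + i) % NR_CPUS_SKETCH, units);
		if (p)
			return p;
	}

	p = malloc(sizeof(*p));
	if (!p)
		return NULL;
	p->list_idx = cpu;
	p->free_units = PAGE_UNITS - units;

	pthread_mutex_lock(&lists[cpu].lock);
	p->next = lists[cpu].head;
	lists[cpu].head = p;
	pthread_mutex_unlock(&lists[cpu].lock);
	return p;
}

/* The page records its list, so free can find the right lock. */
static void sketch_free(struct sketch_page *p, int units)
{
	struct cpu_list *l = &lists[p->list_idx];

	pthread_mutex_lock(&l->lock);
	p->free_units += units;
	pthread_mutex_unlock(&l->lock);
}
```

The "no fallback" variant would drop the loop in sketch_alloc() and only call try_list(cpu, units) before adding a page.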

This probably also ends up being very friendly to NUMA. But it's not
clear that it's worth doing for the common case of 2 cores, where
contention may be too low to be worth the extra trouble.

> Actually SLOB potentially has some fundamental CPU cache hotness
> advantages over the other allocators, for the same reasons as
> its space advantages. It may be possible to make some workloads
> faster with SLOB than with SLUB! Maybe we could remove SLAB and
> SLUB then :)

It's all handwaving until there are actually benchmarks.

-- 
Mathematics is the supreme nostalgia of our time.