Date:	Mon, 23 Feb 2009 01:58:04 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Pekka Enberg <penberg@...helsinki.fi>
cc:	Christoph Lameter <cl@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [patch 2/2] slub: add min_partial sysfs tunable

On Mon, 23 Feb 2009, Pekka Enberg wrote:

> The patches look good but the description is a bit lacking. Does this
> actually fix up something? Why don't we fix the limit calculations
> instead?
> 
> I'm a sucker for numbers so I'm easily fooled into merging patches with
> statements of the form "this shaves off N bytes/kb/mb on XYZ systems".
> 

The memory savings from simply moving min_partial from struct 
kmem_cache_node to struct kmem_cache are obviously not significant (unless 
maybe you're from SGI or something); at most it's

	# allocated caches * (MAX_NUMNODES - 1) * sizeof(unsigned long)
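
To put a rough number on that upper bound (these figures are just 
assumptions for illustration, not measurements): with on the order of 150 
allocated caches, MAX_NUMNODES = 1024 (CONFIG_NODES_SHIFT=10), and 8-byte 
unsigned longs, that works out to

	150 * 1023 * 8 bytes  ~=  1.2 MB

at the very most.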

The true savings come when userspace reduces the number of partial slabs 
that would otherwise be wasted, especially on machines with a large 
number of nodes (ia64 with CONFIG_NODES_SHIFT at 10 by default?).  Although 
the kernel estimates an ideal value for n->min_partial and keeps it within 
a sane range, userspace currently has no input other than writing to 
/sys/kernel/slab/cache/shrink.
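
For reference, a rough sketch of what the sysfs side of the tunable could 
look like, following slub's existing SLAB_ATTR pattern (set_min_partial() 
below is just a placeholder for whatever helper ends up doing the store 
and clamping, not necessarily what the patch does):

	static ssize_t min_partial_show(struct kmem_cache *s, char *buf)
	{
		return sprintf(buf, "%lu\n", s->min_partial);
	}

	static ssize_t min_partial_store(struct kmem_cache *s, const char *buf,
					 size_t length)
	{
		unsigned long min;
		int err;

		err = strict_strtoul(buf, 10, &min);
		if (err)
			return err;

		/* assumed helper: stores the value after clamping it to the
		 * same sane range the kernel's own estimate uses */
		set_min_partial(s, min);
		return length;
	}
	SLAB_ATTR(min_partial);

Reading /sys/kernel/slab/<cache>/min_partial would then report the current 
value, and writing to it would override the kernel's estimate.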

There simply isn't a better heuristic for calculating min_partial that 
gives a good estimate for all possible caches.  And since it's currently a 
static value, the user has no way of reclaiming that wasted space short of 
shrinking the cache entirely, which can be significant when constrained by 
a cgroup (either cpusets or, later, memory controller slab limits).

This also allows the user to specify, for specific caches, that increased 
fragmentation and more partial slabs are actually desired to avoid the 
cost of allocating new slabs at runtime.
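
For example (hypothetical numbers), a cache whose workload frees and then 
reallocates objects in bursts could be given a min_partial of 10 so that 
each burst is served from retained partial slabs rather than from new page 
allocations.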

There's also no reason why this should be a per-struct kmem_cache_node 
value in the first place.  You could argue that a machine might have node 
size asymmetries severe enough that it should be specified on a per-node 
basis, but we know nobody is doing that today since the value is purely 
static and there's no convenient way to tune it via slub's sysfs 
interface.
