linux-kernel - Re: [patch 2/3] slub: scan partial list for free slabs when thrashing


Message-ID: <alpine.DEB.1.10.0903310920220.24040@qirst.com>
Date:	Tue, 31 Mar 2009 09:23:47 -0400 (EDT)
From:	Christoph Lameter <cl@...ux.com>
To:	Pekka Enberg <penberg@...helsinki.fi>
cc:	David Rientjes <rientjes@...gle.com>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Martin Bligh <mbligh@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [patch 2/3] slub: scan partial list for free slabs when
 thrashing

On Tue, 31 Mar 2009, Pekka Enberg wrote:

> On Sun, 29 Mar 2009, David Rientjes wrote:
> > > Whenever a cpu cache satisfies a fastpath allocation, a fastpath counter
> > > is incremented.  This counter is cleared whenever the slowpath is
> > > invoked.  This tracks how many fastpath allocations the cpu slab has
> > > fulfilled before it must be refilled.
>
> On Mon, 2009-03-30 at 10:37 -0400, Christoph Lameter wrote:
> > That adds fastpath overhead and it shows for small objects in your tests.
>
> Yup, and looking at this:
>
> +       u16 fastpath_allocs;    /* Consecutive fast allocs before slowpath */
> +       u16 slowpath_allocs;    /* Consecutive slow allocs before watermark */
>
> How much do operations on u16 hurt on, say, x86-64? It's nice that
> sizeof(struct kmem_cache_cpu) is capped at 32 bytes but on CPUs that
> have bigger cache lines, the types could be wider.
>
> Christoph, why is struct kmem_cache_cpu not __cacheline_aligned_in_smp
> btw?

Because it is either allocated using kmalloc and aligned to a cacheline
boundary there, or the kmem_cache_cpu entries come from the percpu
definition for kmem_cache_cpu. There we don't need cacheline alignment
since they are tightly packed. If the cacheline size is 64 bytes then
two neighboring kmem_cache_cpus fit into one cacheline, which reduces
cache footprint and increases cache hotness.

