Date:	Tue, 31 Mar 2009 01:23:48 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Pekka Enberg <penberg@...helsinki.fi>
cc:	Christoph Lameter <cl@...ux.com>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Martin Bligh <mbligh@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [patch 2/3] slub: scan partial list for free slabs when
 thrashing

On Tue, 31 Mar 2009, Pekka Enberg wrote:

> On Mon, 2009-03-30 at 10:37 -0400, Christoph Lameter wrote:
> > That adds fastpath overhead and it shows for small objects in your tests.
> 
> Yup, and looking at this:
> 
> +       u16 fastpath_allocs;    /* Consecutive fast allocs before slowpath */
> +       u16 slowpath_allocs;    /* Consecutive slow allocs before watermark */
> 
> How much do operations on u16 hurt on, say, x86-64?

As opposed to unsigned int?  These simply use the word variations of the 
mov, test, cmp, and inc instructions instead of the long variations.  It's 
the same tradeoff as with the u16 slub fields within struct page, except 
that here the narrow type isn't strictly required by size limitations; 
it's there for cacheline optimization.
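
To illustrate the codegen point (a throwaway sketch, nothing from the 
patch itself; the struct and field names below are made up):

#include <stdint.h>

struct counters {
	uint16_t fast16;	/* u16 in kernel terms */
	unsigned int fast32;	/* the wider alternative */
};

void bump(struct counters *c)
{
	c->fast16++;	/* at -O2 gcc typically emits addw $1, (%rdi) */
	c->fast32++;	/* ... and addl $1, 4(%rdi) for this one */
}

Both compile to the same read-modify-write; the 16-bit form just carries 
an operand-size prefix.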

> It's nice that
> sizeof(struct kmem_cache_cpu) is capped at 32 bytes but on CPUs that
> have bigger cache lines, the types could be wider.
> 

Right, keeping them u16 does not change the unpacked size of the struct, 
whereas using unsigned int would grow it.
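
To put numbers on it, something like this (only a sketch: the surrounding 
fields are from memory of slub.c and may not match exactly, and the kernel 
uses u16 and struct page * rather than the stand-ins here):

#include <stdint.h>

struct kmem_cache_cpu_sketch {
	void **freelist;		/* 8 bytes on x86-64 */
	void *page;			/* 8 bytes (struct page * in the kernel) */
	int node;			/* 4 bytes */
	unsigned int offset;		/* 4 bytes */
	unsigned int objsize;		/* 4 bytes */
	uint16_t fastpath_allocs;	/* 2 bytes */
	uint16_t slowpath_allocs;	/* 2 bytes -> 32 total, no new padding */
};

_Static_assert(sizeof(struct kmem_cache_cpu_sketch) <= 32,
	       "u16 counters keep the sketch within 32 bytes on x86-64");

With unsigned int counters the same layout goes to 36 bytes and gets 
padded out to 40, so it no longer packs two per 64-byte cacheline.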

Since MAX_OBJS_PER_PAGE (which should really be renamed MAX_OBJS_PER_SLAB) 
ensures a u16 cannot overflow from allocations alone, the only time 
fastpath_allocs would need to be wider is when the object size is 
sufficiently small and there have been enough frees back to the cpu slab 
that the counter wraps.  In that circumstance, slowpath_allocs would 
simply be incremented once too often, and it would be corrected the next 
time a cpu slab does allocate beyond the threshold (which is why 
SLAB_THRASHING_THRESHOLD should never be 1).  Falsely reaching the 
threshold would require successive fastpath counter overflows, and the 
chance of that shrinks exponentially as the threshold grows.

And slowpath_allocs will never overflow because it's capped at 
SLAB_THRASHING_THRESHOLD + 1: the cpu slab will be refilled with a slab 
that ensures slowpath_allocs is decremented the next time the slowpath is 
invoked.  So overflow isn't an immediate problem with either counter.
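
Roughly, the accounting I'm describing looks like the following (a 
simplified sketch of the intent, not the actual diff; the watermark 
argument stands in for however the per-cache value gets computed, and 
fastpath_allocs is incremented in the allocation fastpath, which isn't 
shown):

#include <stdint.h>
#include <stdbool.h>

#define SLAB_THRASHING_THRESHOLD 3	/* illustrative value; must never be 1 */

struct cpu_slab_counters {
	uint16_t fastpath_allocs;	/* consecutive fastpath allocs */
	uint16_t slowpath_allocs;	/* consecutive early refills */
};

/*
 * Called from the slowpath when the cpu slab is exhausted.  Returns true
 * if the cache looks like it's thrashing and the partial list should be
 * scanned for a slab with enough free objects.
 */
bool slab_update_thrashing(struct cpu_slab_counters *c, unsigned int watermark)
{
	if (c->fastpath_allocs < watermark) {
		/* Refilled too early; cap so the counter can never wrap. */
		if (c->slowpath_allocs <= SLAB_THRASHING_THRESHOLD)
			c->slowpath_allocs++;
	} else if (c->slowpath_allocs) {
		/* A good run through the fastpath corrects earlier misses. */
		c->slowpath_allocs--;
	}

	c->fastpath_allocs = 0;		/* start counting for the new cpu slab */
	return c->slowpath_allocs > SLAB_THRASHING_THRESHOLD;
}

If fastpath_allocs ever wraps because of frees back to the cpu slab, it 
only looks artificially small here and costs one spurious slowpath_allocs 
increment, which the next good run through the fastpath takes back.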

> Christoph, why is struct kmem_cache_cpu not __cacheline_aligned_in_smp
> btw?
> 

This was removed in 4c93c355d5d563f300df7e61ef753d7a064411e9.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
