linux-kernel - Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)


Message-ID: <alpine.DEB.2.11.1410270850160.14245@gentwo.org>
Date:	Mon, 27 Oct 2014 08:53:48 -0500 (CDT)
From:	Christoph Lameter <cl@...ux.com>
To:	Joonsoo Kim <iamjoonsoo.kim@....com>
cc:	akpm@...uxfoundation.org, rostedt@...dmis.org,
	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	linux-mm@...ck.org, penberg@...nel.org, iamjoonsoo@....com
Subject: Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for
 RT)

On Mon, 27 Oct 2014, Joonsoo Kim wrote:

> > One other aspect of this patchset is that it reduces the cache footprint
> > of the alloc and free functions. This typically results in a performance
> > increase for the allocator. If we can avoid the page_address() and
> > virt_to_head_page() stuff that is required because we drop the ->page
> > field in a sufficient number of places then this may be a benefit that
> > goes beyond the RT and CONFIG_PREEMPT case.
>
> Yeah... if we can avoid those function calls, it would be good.

One trick that may be possible is to have an address mask for the
page_address. If a pointer satisfies the mask requirements then it is on
the right page and we do not need to do virt_to_head_page().
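
A minimal userspace sketch of that mask check, for illustration only. It
assumes the slab is a power-of-two size and naturally aligned to that
size; struct slab_window, slab_window_init() and ptr_in_slab() are
hypothetical names, not existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cpu state replacing a ->page pointer. */
struct slab_window {
	uintptr_t base;	/* start address of the current slab */
	uintptr_t mask;	/* ~(slab_size - 1) */
};

/* Assumption: slab_size is a power of two and slab_start is aligned to it. */
static struct slab_window slab_window_init(void *slab_start, size_t slab_size)
{
	struct slab_window w;

	assert(slab_size != 0 && (slab_size & (slab_size - 1)) == 0);
	assert(((uintptr_t)slab_start & (slab_size - 1)) == 0);

	w.base = (uintptr_t)slab_start;
	w.mask = ~((uintptr_t)slab_size - 1);
	return w;
}

/* True if p lies inside the slab: one AND and one compare, no page lookup. */
static bool ptr_in_slab(const struct slab_window *w, const void *p)
{
	return ((uintptr_t)p & w->mask) == w->base;
}
```

An object being freed would go through ptr_in_slab() first, and only on
a miss would the code fall back to the virt_to_head_page() lookup.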

> But, current struct kmem_cache_cpu occupies just 32 bytes on 64 bits
> machine, and, that means just 1 cacheline. Reducing size of struct may have
> no remarkable performance benefit in this case.

Hmmm... If we also drop the partial field then a 64 byte cacheline would
fit kmem_cache_cpu structs from 4 caches. If we place them correctly then
the frequently used caches could avoid fetching up to 3 cachelines.
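
The arithmetic behind that, as a rough sketch. The two structs below are
stand-ins that model pointers and the tid as 64-bit words; they are not
the kernel definition of kmem_cache_cpu, and the 64 byte line size is
the assumption stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64	/* assumed cacheline size */

/* Stand-in for the current layout on a 64 bit machine: 4 words. */
struct cpu_slab_full {
	uint64_t freelist;
	uint64_t tid;
	uint64_t page;
	uint64_t partial;
};

/* Stand-in with ->page and ->partial dropped: 2 words. */
struct cpu_slab_slim {
	uint64_t freelist;
	uint64_t tid;
};

/* How many per-cpu structs of the given size share one cacheline. */
static size_t structs_per_cacheline(size_t struct_size)
{
	assert(struct_size > 0 && struct_size <= CACHELINE_BYTES);
	return CACHELINE_BYTES / struct_size;
}
```

With these stand-ins the full layout is 32 bytes (2 per line) and the
slim one is 16 bytes (4 per line), so a hit on one hot cache's struct
can bring in the structs of up to 3 neighbouring caches as well.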

You are right, just dropping ->page won't do anything since the
kmem_cache_cpu struct is aligned to a double word boundary.
