linux-kernel - Re: [RFC PATCH 01/15] static kmem_cache instances for core caches

Message-ID: <19e0c58f-114c-4bbd-9bc0-25382d7d5cbb@suse.cz>
Date: Thu, 15 Jan 2026 17:59:12 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Harry Yoo <harry.yoo@...cle.com>, Al Viro <viro@...iv.linux.org.uk>,
 Mateusz Guzik <mjguzik@...il.com>
Cc: linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
 Linus Torvalds <torvalds@...ux-foundation.org>,
 Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
 linux-kernel@...r.kernel.org, "Christoph Lameter (Ampere)" <cl@...two.org>
Subject: Re: [RFC PATCH 01/15] static kmem_cache instances for core caches

On 1/14/26 08:30, Harry Yoo wrote:
> On Sat, Jan 10, 2026 at 04:02:03AM +0000, Al Viro wrote:
>>         kmem_cache_create() and friends create new instances of
>> struct kmem_cache and return pointers to those.  Quite a few things in
>> core kernel are allocated from such caches; each allocation involves
>> dereferencing an assign-once pointer and for sufficiently hot ones that
>> dereferencing does show in profiles.
>> 
>>         There had been patches floating around switching some of those
>> to runtime_const infrastructure.  Unfortunately, it's arch-specific
>> and most of the architectures lack it.
>> 
>>         There's an alternative approach applicable at least to the caches
>> that are never destroyed, which covers a lot of them.  No matter what,
>> runtime_const for pointers is not going to be faster than a plain &
>> (taking the address of a statically allocated object), so if we had
>> struct kmem_cache instances with static storage duration, we would be at
>> least no worse off than we are with the runtime_const variants.
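A minimal userspace sketch of the two access patterns contrasted above. It uses simplified stand-ins: struct cache, foo_cachep, foo_cache and init_caches() are hypothetical names, not kernel API. The sketch only shows where the pointer load goes away and does not measure anything.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kmem_cache. */
struct cache {
	size_t object_size;
};

/* Today's pattern: an assign-once pointer filled in at boot. */
static struct cache *foo_cachep;

/* Proposed pattern: the instance itself has static storage duration. */
static struct cache foo_cache;

static void init_caches(void)
{
	foo_cachep = malloc(sizeof(*foo_cachep));   /* like kmem_cache_create() */
	assert(foo_cachep != NULL);
	foo_cachep->object_size = 64;
	foo_cache.object_size = 64;                 /* constructor fills it in place */
}

/* Every use must first load foo_cachep from memory. */
static struct cache *dynamic_cache(void)
{
	return foo_cachep;
}

/* &foo_cache is fixed at link time, so no pointer load is needed. */
static struct cache *static_cache(void)
{
	return &foo_cache;
}
```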
>> 
>>         There are obstacles to doing that, but they turn out to be easy
>> to deal with.
>> 
>> 1) As it is, struct kmem_cache is opaque for anything outside of a few
>> files in mm/*; that avoids serious headaches with header dependencies,
>> etc., and it's not something we want to lose.  Solution: struct
>> kmem_cache_opaque, with size and alignment identical to those of struct
>> kmem_cache.  Size and alignment can be calculated via the same
>> mechanism we use for asm-offsets.h and rq-offsets.h, with a build-time
>> check for mismatches.  With that done, we get an opaque type defined in
>> linux/slab-static.h that can be used for declaring those caches.
>> In linux/slab.h we add a forward declaration of kmem_cache_opaque plus a
>> helper (to_kmem_cache()) converting a pointer to kmem_cache_opaque
>> into a pointer to kmem_cache.
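The opaque-type approach in item 1 might look roughly like the following userspace sketch. KMEM_CACHE_SIZE and KMEM_CACHE_ALIGN are hypothetical stand-ins for the build-generated constants. The struct kmem_cache layout is invented for illustration, and dentry_cache_static is a made-up example cache. Only the names kmem_cache_opaque and to_kmem_cache() come from the proposal. The cast in to_kmem_cache() assumes -fno-strict-aliasing, which the kernel builds with.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/*
 * Hypothetical generated constants: in the proposal these would come from
 * a build step like the one producing asm-offsets.h, not be hand-written.
 */
#define KMEM_CACHE_SIZE  16
#define KMEM_CACHE_ALIGN 4

/* Public side (linux/slab-static.h in the proposal): no fields exposed. */
struct kmem_cache_opaque {
	alignas(KMEM_CACHE_ALIGN) unsigned char opaque[KMEM_CACHE_SIZE];
};

/* Private side (mm/ in the proposal): a simplified, invented layout. */
struct kmem_cache {
	uint32_t object_size;
	uint32_t align;
	uint32_t flags;
	int32_t refcount;
};

/* Build-time mismatch check between the constants and the real layout. */
static_assert(sizeof(struct kmem_cache) == sizeof(struct kmem_cache_opaque),
	      "kmem_cache_opaque size mismatch");
static_assert(alignof(struct kmem_cache) == alignof(struct kmem_cache_opaque),
	      "kmem_cache_opaque alignment mismatch");

/* Relies on -fno-strict-aliasing, as kernel builds use. */
static inline struct kmem_cache *to_kmem_cache(struct kmem_cache_opaque *p)
{
	return (struct kmem_cache *)p;
}

/* Hypothetical core cache declared with static storage duration. */
static struct kmem_cache_opaque dentry_cache_static;
```

If the real layout changes and the generated constants do not follow, the static_assert lines fail the build. That is the mismatch check item 1 calls for.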
>> 
>> 2) The real constructor of kmem_cache needs to be taught to deal with
>> preallocated instances.  That turns out to be easy: we already pass an
>> obscene amount of optional arguments via struct kmem_cache_args, so we
>> can stash the pointer to the preallocated instance in there.  Changes in
>> mm/slab_common.c are very minor: we should treat preallocated caches
>> as unmergeable, use the instance passed to us instead of allocating a
>> new one, and not free them.  That's it.
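A sketch of the constructor change from item 2, under stated assumptions. The structs are cut-down stand-ins, and cache_create(), cache_destroy(), is_static and SLAB_NO_MERGE_SKETCH are hypothetical names. The preallocated field mirrors the args->preallocated mentioned later in the thread. The sketch covers the three changes listed: no merging, reuse of the caller's instance, and no free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SLAB_NO_MERGE_SKETCH 0x1u   /* hypothetical stand-in for SLAB_NO_MERGE */

struct kmem_cache {
	const char *name;
	unsigned int size;
	unsigned int flags;
	bool is_static;      /* hypothetical: remembers not to free */
};

/* Cut-down stand-in; only the preallocated field matters here. */
struct kmem_cache_args {
	struct kmem_cache *preallocated;
};

static struct kmem_cache *cache_create(const char *name, unsigned int size,
				       unsigned int flags,
				       const struct kmem_cache_args *args)
{
	struct kmem_cache *s;

	assert(name != NULL);
	if (args && args->preallocated) {
		s = args->preallocated;          /* use the caller's instance */
		flags |= SLAB_NO_MERGE_SKETCH;   /* treat it as unmergeable */
		s->is_static = true;
	} else {
		s = calloc(1, sizeof(*s));       /* the usual dynamic instance */
		if (!s)
			return NULL;
		s->is_static = false;
	}
	s->name = name;
	s->size = size;
	s->flags = flags;
	return s;
}

static void cache_destroy(struct kmem_cache *s)
{
	if (!s->is_static)
		free(s);     /* static instances are never freed */
}
```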
> 
> SLAB_NO_MERGE prevents both sides of merging: 1) when creating the cache,
> and 2) when another cache tries to create an alias of it.
> 
> Avoiding 1) makes sense, but is there a reason to prevent 2)?
> 
> If it's fine for other caches to merge into a cache with static
> duration, then it's sufficient to update find_mergeable() to not attempt
> creating an alias during cache creation if args->preallocated is
> specified (instead of using SLAB_NO_MERGE).
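Harry's alternative could be sketched as follows. find_mergeable_sketch(), NO_MERGE and the struct layouts are hypothetical. The compatibility test is reduced to an equal size, whereas the real find_mergeable() checks considerably more. The preallocated check blocks only case 1). Case 2) remains controlled by the target cache's own flags.

```c
#include <assert.h>
#include <stddef.h>

#define NO_MERGE 0x1u   /* hypothetical stand-in for SLAB_NO_MERGE */

struct cache {
	unsigned int size;
	unsigned int flags;
};

struct cache_args {
	struct cache *preallocated;   /* as in args->preallocated */
};

/*
 * Return an existing cache that the new cache could alias, or NULL.
 * A preallocated (static) cache never becomes an alias itself, but it
 * can still be the target that others alias.
 */
static struct cache *find_mergeable_sketch(struct cache **caches, size_t n,
					   unsigned int size, unsigned int flags,
					   const struct cache_args *args)
{
	if (args && args->preallocated)
		return NULL;            /* case 1): don't alias the new static cache */
	if (flags & NO_MERGE)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		struct cache *s = caches[i];

		if (s->flags & NO_MERGE)
			continue;       /* case 2): only blocked by an explicit flag */
		if (s->size == size)    /* simplified compatibility check */
			return s;
	}
	return NULL;
}
```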

The merging prevention is my biggest concern with the approach. We could
potentially solve it by moving the sharing to a different layer: instead of
today's sharing of a single refcounted kmem_cache object, have separate
instances that point to the same underlying storage (mainly the per-node and
per-cpu slabs/sheaves). This might also simplify today's suboptimal sysfs
handling, as the aliases could know their own cache name and own their
symlinks.

However, slabs and sheaves do have a parent kmem_cache pointer. That is how
e.g. kfree() works: virt_to_slab(obj) gives the slab, the slab's kmem_cache
pointer gives the cache, and from there it proceeds like kmem_cache_free().
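To make the kfree() path concrete, here is a userspace model in which slabs are 4096-byte aligned blocks with a header, so the owning slab can be found by masking the object address. The masking is a simplification: the kernel's virt_to_slab() goes through the page/folio metadata instead. virt_to_slab_sketch(), kfree_sketch(), new_slab(), slab_alloc(), cache_free() and SLAB_BYTES are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SLAB_BYTES 4096   /* hypothetical: one aligned page per slab */

struct kmem_cache;

/* Simplified slab: a header at the start of an aligned block. */
struct slab {
	struct kmem_cache *slab_cache;   /* parent cache pointer */
	unsigned int inuse;
};

struct kmem_cache {
	unsigned int object_size;
	unsigned long frees;             /* counts frees, for illustration */
};

static struct slab *new_slab(struct kmem_cache *s)
{
	struct slab *slab = aligned_alloc(SLAB_BYTES, SLAB_BYTES);

	if (slab) {
		slab->slab_cache = s;
		slab->inuse = 0;
	}
	return slab;
}

/* Object address -> owning slab, by masking down to the slab boundary. */
static struct slab *virt_to_slab_sketch(const void *obj)
{
	return (struct slab *)((uintptr_t)obj & ~(uintptr_t)(SLAB_BYTES - 1));
}

/* Bump allocation after the header; objects are never reused here. */
static void *slab_alloc(struct slab *slab)
{
	size_t size = slab->slab_cache->object_size;
	size_t off = sizeof(*slab) + (size_t)slab->inuse * size;

	assert(off + size <= SLAB_BYTES);
	slab->inuse++;
	return (char *)slab + off;
}

static void cache_free(struct kmem_cache *s, void *obj)
{
	(void)obj;
	s->frees++;
}

/* kfree() takes no cache argument: it recovers the cache from the slab. */
static void kfree_sketch(void *obj)
{
	struct kmem_cache *s = virt_to_slab_sketch(obj)->slab_cache;

	cache_free(s, obj);
}
```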

So we could have a kmem_cache->primary_cache field, where the primary would
just point to itself and aliasing caches to the primary, and newly created
slabs and sheaves would read ->primary_cache to assign their kmem_cache
pointer. This is not a fastpath operation, so it shouldn't matter, and with
that all slabs and sheaves would point to the primary, never to an alias, so
the aliases could be destroyed easily. And the primary cache wouldn't be able
to go away as long as there are aliases, as is the case today.
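The ->primary_cache idea might be sketched like this. The struct layouts, alias_count, and all functions are hypothetical, and sheaves are omitted. In the sketch, an alias is its own instance with its own name, every new slab records the primary, and the primary cannot be destroyed while aliases remain.

```c
#include <assert.h>
#include <stdlib.h>

struct kmem_cache {
	const char *name;                  /* each alias keeps its own name */
	struct kmem_cache *primary_cache;  /* points to self for a primary */
	int alias_count;                   /* aliases pinning this primary */
};

struct slab {
	struct kmem_cache *slab_cache;
};

static void init_primary(struct kmem_cache *s, const char *name)
{
	s->name = name;
	s->primary_cache = s;
	s->alias_count = 0;
}

static void init_alias(struct kmem_cache *s, const char *name,
		       struct kmem_cache *primary)
{
	assert(primary->primary_cache == primary);
	s->name = name;
	s->primary_cache = primary;
	s->alias_count = 0;
	primary->alias_count++;
}

/* Slow path: a new slab always records the primary, never the alias. */
static struct slab *new_slab(struct kmem_cache *s)
{
	struct slab *slab = malloc(sizeof(*slab));

	if (slab)
		slab->slab_cache = s->primary_cache;
	return slab;
}

/* No slab points at an alias, so dropping one is just bookkeeping. */
static void destroy_alias(struct kmem_cache *s)
{
	assert(s->primary_cache != s);
	s->primary_cache->alias_count--;
}

static int can_destroy_primary(const struct kmem_cache *s)
{
	return s->primary_cache == s && s->alias_count == 0;
}
```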

Thus only a dynamic cache or a non-module static cache could become a
primary, because a module's static cache goes away when the module is
unloaded.

For merging to work in every order in which static and dynamic aliases can
be created, there would however have to be a weird quirk for static module
caches: when such a cache is created and there's no compatible primary to
become an alias of, a dynamic, otherwise unused primary would need to be
created just to become the owner of the slabs and sheaves. Otherwise, if a
mergeable dynamic cache appeared later, it could not become a primary for the
static module cache to alias, because the static module cache would already
have existing slabs and sheaves pointing to it.
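The primary-eligibility rule and the ordering quirk for static module caches can be modelled with a small registry. Everything here is hypothetical: the cache kinds, the hidden flag, the fixed-size registry, and a merge check reduced to equal size. When a static module cache has nothing to alias, the sketch creates a hidden dynamic primary, so a dynamic cache created later can join the same primary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum cache_kind { CACHE_DYNAMIC, CACHE_STATIC_CORE, CACHE_STATIC_MODULE };

struct cache {
	enum cache_kind kind;
	unsigned int size;
	struct cache *primary;   /* points to self for a primary */
	int aliases;             /* aliases pinning this primary */
	bool hidden;             /* internal primary nobody asked for */
};

#define MAX_CACHES 16
static struct cache *registry[MAX_CACHES];
static size_t nr_caches;

static void register_cache(struct cache *s)
{
	assert(nr_caches < MAX_CACHES);
	registry[nr_caches++] = s;
}

/* A module's static storage vanishes on unload, so it cannot own slabs. */
static bool can_be_primary(const struct cache *s)
{
	return s->kind != CACHE_STATIC_MODULE;
}

/* Compatibility reduced to equal size in this sketch. */
static struct cache *find_primary(unsigned int size)
{
	for (size_t i = 0; i < nr_caches; i++) {
		struct cache *s = registry[i];

		if (s->primary == s && s->size == size)
			return s;
	}
	return NULL;
}

static void create_cache(struct cache *s)
{
	struct cache *p = find_primary(s->size);

	s->aliases = 0;
	s->hidden = false;
	if (!p && !can_be_primary(s)) {
		/* Nothing to alias: create a dynamic, otherwise unused primary. */
		p = calloc(1, sizeof(*p));
		assert(p != NULL);
		p->kind = CACHE_DYNAMIC;
		p->size = s->size;
		p->primary = p;
		p->hidden = true;
		register_cache(p);
	}
	if (p) {
		s->primary = p;      /* alias: slabs will record p, not s */
		p->aliases++;
	} else {
		s->primary = s;      /* s owns its own slabs and sheaves */
	}
	register_cache(s);
}
```

If the dynamic cache is created first, it becomes the primary and the module cache aliases it instead. In either order, the slabs end up owned by a primary that does not live in module storage.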

And there might be other issues with this scheme I don't immediately see.
But maybe it's feasible.