Message-ID: <1724b59a-0c3a-482c-b141-b5611665d1f4@suse.cz>
Date: Tue, 13 May 2025 18:08:30 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Harry Yoo <harry.yoo@...cle.com>
Cc: Suren Baghdasaryan <surenb@...gle.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Christoph Lameter
<cl@...ux.com>, David Rientjes <rientjes@...gle.com>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Uladzislau Rezki <urezki@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, rcu@...r.kernel.org,
maple-tree@...ts.infradead.org
Subject: Re: [PATCH v4 1/9] slab: add opt-in caching layer of percpu sheaves
On 4/29/25 03:08, Harry Yoo wrote:
> On Fri, Apr 25, 2025 at 10:27:21AM +0200, Vlastimil Babka wrote:
>> Specifying a non-zero value for a new struct kmem_cache_args field
>> sheaf_capacity will set up a caching layer of percpu arrays called
>> sheaves of the given capacity for the created cache.
>>
>> Allocations from the cache will allocate via the percpu sheaves (main or
>> spare) as long as they have no NUMA node preference. Frees will also
>> put the object back into one of the sheaves.
>>
>> When both percpu sheaves are found empty during an allocation, an empty
>> sheaf may be replaced with a full one from the per-node barn. If none
>> are available and the allocation is allowed to block, an empty sheaf is
>> refilled from slab(s) by an internal bulk alloc operation. When both
>> percpu sheaves are full during freeing, the barn can replace a full one
>> with an empty one, unless the barn is over its limit of full sheaves. In
>> that case a sheaf is flushed to slab(s) by an internal bulk free
>> operation. Flushing
>> sheaves and barns is also wired to the existing cpu flushing and cache
>> shrinking operations.
>>
>> The sheaves do not distinguish NUMA locality of the cached objects. If
>> an allocation is requested with kmem_cache_alloc_node() (or a mempolicy
>> with strict_numa mode enabled) with a specific node (not NUMA_NO_NODE),
>> the sheaves are bypassed.
>>
>> The bulk operations exposed to slab users also try to utilize the
>> sheaves as long as the necessary (full or empty) sheaves are available
>> on the cpu or in the barn. Once depleted, they will fall back to bulk
>> alloc/free to slabs directly to avoid double copying.
>>
>> The sheaf_capacity value is exported in sysfs for observability.
>>
>> Sysfs CONFIG_SLUB_STATS counters alloc_cpu_sheaf and free_cpu_sheaf
>> count objects allocated or freed using the sheaves (and thus not
>> counted towards the other alloc/free path counters). Counters
>> sheaf_refill and sheaf_flush count objects filled or flushed from or to
>> slab pages, and can be used to assess how effective the caching is. The
>> refill and flush operations will also count towards the usual
>> alloc_fastpath/slowpath, free_fastpath/slowpath and other counters for
>> the backing slabs. For barn operations, barn_get and barn_put count how
>> many full sheaves were taken from or put to the barn; the _fail variants
>> count how many such requests could not be satisfied, mainly because the
>> barn was either empty or full.
>
>> While the barn also holds empty sheaves
>> to make some operations easier, these are not critical enough to mandate
>> their own counters. Finally, there are sheaf_alloc/sheaf_free counters.
>
> I initially thought we need counters for empty sheaves to see how many times
> it grabs empty sheaves from the barn, but looks like barn_put
> ("put full sheaves to the barn") is effectively a proxy for that, right?
Mostly yes. The empty sheaves in the barn are mainly there to make the
"replace full with empty" operation easy, but if that fails because there are
no empty sheaves, the fallback of allocating a fresh empty sheaf should
succeed often enough that tracking it in detail doesn't seem that useful.
>> Access to the percpu sheaves is protected by local_trylock() when
>> potential callers include irq context, and local_lock() otherwise (such
>> as when we already know the gfp flags allow blocking). The trylock
>> failures should be rare and we can easily fall back. Each per-NUMA-node
>> barn has a spin_lock.
>>
>> When slub_debug is enabled for a cache with sheaf_capacity also
>> specified, the latter is ignored so that allocations and frees reach the
>> slow path where debugging hooks are processed.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
>> ---
>
> Reviewed-by: Harry Yoo <harry.yoo@...cle.com>
Thanks!
> LGTM, with a few nits:
I've applied them, thanks. Responding only to one that needs it:
>> +static __fastpath_inline
>> +bool free_to_pcs(struct kmem_cache *s, void *object)
>> +{
>> + struct slub_percpu_sheaves *pcs;
>> +
>> +restart:
>> + if (!local_trylock(&s->cpu_sheaves->lock))
>> + return false;
>> +
>> + pcs = this_cpu_ptr(s->cpu_sheaves);
>> +
>> + if (unlikely(pcs->main->size == s->sheaf_capacity)) {
>> +
>> + struct slab_sheaf *empty;
>> +
>> + if (!pcs->spare) {
>> + empty = barn_get_empty_sheaf(pcs->barn);
>> + if (empty) {
>> + pcs->spare = pcs->main;
>> + pcs->main = empty;
>> + goto do_free;
>> + }
>> + goto alloc_empty;
>> + }
>> +
>> + if (pcs->spare->size < s->sheaf_capacity) {
>> + swap(pcs->main, pcs->spare);
>> + goto do_free;
>> + }
>> +
>> + empty = barn_replace_full_sheaf(pcs->barn, pcs->main);
>> +
>> + if (!IS_ERR(empty)) {
>> + stat(s, BARN_PUT);
>> + pcs->main = empty;
>> + goto do_free;
>> + }
>
> nit: stat(s, BARN_PUT_FAIL); should probably be here instead?
Hm, the intention was that no, because when PTR_ERR(empty) == -ENOMEM, we
try alloc_empty_sheaf(), and that will likely succeed, and then
__pcs_install_empty_sheaf() will just force-put the full sheaf (and record a
BARN_PUT), because we already saw that we're not over capacity. But now I
see I didn't describe it as a scenario for the function's comment, so I will
add that.
But technically we should also record stat(s, BARN_PUT_FAIL) when that
alloc_empty_sheaf() fails, but not when we "goto alloc_empty" from the "no
spare" above. Bit icky but I'll add that too.
>> +
>> + if (PTR_ERR(empty) == -E2BIG) {
>> + /* Since we got here, spare exists and is full */
>> + struct slab_sheaf *to_flush = pcs->spare;
>> +
>> + stat(s, BARN_PUT_FAIL);
>> +
>> + pcs->spare = NULL;
>> + local_unlock(&s->cpu_sheaves->lock);
>> +
>> + sheaf_flush_unused(s, to_flush);
>> + empty = to_flush;
>> + goto got_empty;
>> + }
>