Message-ID: <Z7woDjICqD0fkghA@harry>
Date: Mon, 24 Feb 2025 17:04:30 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Suren Baghdasaryan <surenb@...gle.com>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Christoph Lameter <cl@...ux.com>, David Rientjes <rientjes@...gle.com>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Hyeonggon Yoo <42.hyeyoo@...il.com>,
        Uladzislau Rezki <urezki@...il.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, rcu@...r.kernel.org,
        maple-tree@...ts.infradead.org
Subject: Re: [PATCH RFC v2 01/10] slab: add opt-in caching layer of percpu
 sheaves

On Fri, Feb 14, 2025 at 05:27:37PM +0100, Vlastimil Babka wrote:
> Specifying a non-zero value for a new struct kmem_cache_args field
> sheaf_capacity will set up a caching layer of percpu arrays called
> sheaves of the given capacity for the created cache.
> 
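For anyone following along: as I understand the new field, opting in looks
roughly like the below (the cache name, object type and capacity value are
made up for illustration):

	/* illustrative object type */
	struct my_obj {
		int x;
	};

	struct kmem_cache_args args = {
		.sheaf_capacity = 32,	/* enable percpu sheaves of 32 objects */
	};
	struct kmem_cache *cache;

	cache = kmem_cache_create("my_cache", sizeof(struct my_obj),
				  &args, SLAB_HWCACHE_ALIGN);
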
> Allocations from the cache will allocate via the percpu sheaves (main or
> spare) as long as they have no NUMA node preference. Frees will also
> refill one of the sheaves.
> 
> When both percpu sheaves are found empty during an allocation, an empty
> sheaf may be replaced with a full one from the per-node barn. If none
> are available and the allocation is allowed to block, an empty sheaf is
> refilled from slab(s) by an internal bulk alloc operation. When both
> percpu sheaves are full during freeing, the barn can replace a full one
> with an empty one, unless the limit on full sheaves is exceeded. In that case a
> sheaf is flushed to slab(s) by an internal bulk free operation. Flushing
> sheaves and barns is also wired to the existing cpu flushing and cache
> shrinking operations.
> 
> The sheaves do not distinguish NUMA locality of the cached objects. If
> an allocation is requested with kmem_cache_alloc_node() with a specific
> node (not NUMA_NO_NODE), sheaves are bypassed.
> 
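(So, with the illustrative cache above, roughly:

	void *a = kmem_cache_alloc(cache, GFP_KERNEL);		/* may be served from a sheaf */
	void *b = kmem_cache_alloc_node(cache, GFP_KERNEL, 1);	/* specific node: sheaves bypassed */

assuming node 1 exists; just to spell out the rule.)
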
> The bulk operations exposed to slab users also try to utilize the
> sheaves as long as the necessary (full or empty) sheaves are available
> on the cpu or in the barn. Once depleted, they will fall back to bulk
> alloc/free to slabs directly to avoid double copying.
> 
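IOW, as I read it, callers keep using the existing bulk API unchanged and it
transparently tries the sheaves first, e.g. (sketch, same illustrative cache
as above):

	void *objs[16];
	int n;

	n = kmem_cache_alloc_bulk(cache, GFP_KERNEL, ARRAY_SIZE(objs), objs);
	if (n)
		kmem_cache_free_bulk(cache, n, objs);
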
> Sysfs stat counters alloc_cpu_sheaf and free_cpu_sheaf count objects
> allocated or freed using the sheaves. Counters sheaf_refill,
> sheaf_flush_main and sheaf_flush_other count objects filled or flushed
> from or to slab pages, and can be used to assess how effective the
> caching is. The refill and flush operations will also count towards the
> usual alloc_fastpath/slowpath, free_fastpath/slowpath and other
> counters.
> 
> Access to the percpu sheaves is protected by local_lock_irqsave()
> operations; each per-NUMA-node barn has a spin_lock.
> 
> A current limitation is that when slub_debug is enabled for a cache with
> percpu sheaves, the objects in the array are considered allocated from
> the slub_debug perspective, and the alloc/free debugging hooks occur
> when moving the objects between the array and slab pages. This means
> that e.g. a use-after-free that occurs for an object cached in the
> array is undetected. Collected alloc/free stacktraces might also be less
> useful. This limitation could be changed in the future.
> 
> On the other hand, KASAN, kmemcg and other hooks are executed on actual
> allocations and frees by kmem_cache users even if those use the array,
> so their debugging or accounting accuracy should be unaffected.
> 
> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
> ---
>  include/linux/slab.h |  34 ++
>  mm/slab.h            |   2 +
>  mm/slab_common.c     |   5 +-
>  mm/slub.c            | 982 ++++++++++++++++++++++++++++++++++++++++++++++++---
>  4 files changed, 973 insertions(+), 50 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index e8273f28656936c05d015c53923f8fe69cd161b2..c06734912972b799f537359f7fe6a750918ffe9e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
>  
>  /********************************************************************
>   * 			Core slab cache functions
> +static void __pcs_flush_all_cpu(struct kmem_cache *s, unsigned int cpu)
> +{
> +	struct slub_percpu_sheaves *pcs;
> +
> +	pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
> +
> +	if (pcs->spare) {
> +		sheaf_flush(s, pcs->spare);
> +		free_empty_sheaf(s, pcs->spare);
> +		pcs->spare = NULL;
> +	}
> +
> +	// TODO: handle rcu_free
> +	BUG_ON(pcs->rcu_free);
> +
> +	sheaf_flush_main(s);
> +}

+1 on what Suren mentioned.

> +static void barn_shrink(struct kmem_cache *s, struct node_barn *barn)
> +{
> +	struct list_head empty_list;
> +	struct list_head full_list;
> +	struct slab_sheaf *sheaf, *sheaf2;
> +	unsigned long flags;
> +
> +	INIT_LIST_HEAD(&empty_list);
> +	INIT_LIST_HEAD(&full_list);
> +
> +	spin_lock_irqsave(&barn->lock, flags);
> +
> +	list_splice_init(&barn->sheaves_full, &full_list);
> +	barn->nr_full = 0;
> +	list_splice_init(&barn->sheaves_empty, &empty_list);
> +	barn->nr_empty = 0;
> +
> +	spin_unlock_irqrestore(&barn->lock, flags);
> +
> +	list_for_each_entry_safe(sheaf, sheaf2, &full_list, barn_list) {
> +		sheaf_flush(s, sheaf);
> +		list_move(&sheaf->barn_list, &empty_list);
> +	}

nit: is this list_move() necessary?
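e.g. (untested) the first loop could flush and free in place:

	list_for_each_entry_safe(sheaf, sheaf2, &full_list, barn_list) {
		sheaf_flush(s, sheaf);
		free_empty_sheaf(s, sheaf);
	}

leaving the second loop for the sheaves that were already empty.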

> +
> +	list_for_each_entry_safe(sheaf, sheaf2, &empty_list, barn_list)
> +		free_empty_sheaf(s, sheaf);
> +}

Otherwise looks good to me.

-- 
Cheers,
Harry