Message-ID: <Z_UwPmyxyu8YNLG_@harry>
Date: Tue, 8 Apr 2025 23:18:38 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Oscar Salvador <osalvador@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>,
        Jonathan Cameron <Jonathan.Cameron@...wei.com>,
        linux-cxl@...r.kernel.org
Subject: Re: [PATCH v2 1/3] mm,slub: Do not special case N_NORMAL nodes for
 slab_nodes

On Tue, Apr 08, 2025 at 12:17:52PM +0200, David Hildenbrand wrote:
> On 08.04.25 10:41, Oscar Salvador wrote:
> > Currently, slab_mem_going_online_callback() checks whether the node has
> > N_NORMAL memory in order to be set in slab_nodes.
> > While it is true that getting rid of that enforcement would mean
> > ending up with movable nodes in slab_nodes, the memory waste that comes
> > with that is negligible.
> > 
> > So stop checking for status_change_nid_normal and just use status_change_nid
> > instead, which works for both types of memory.
> > 
> > Also, once we allocate the kmem_cache_node cache for the node in
> > slab_mem_going_online_callback(), we never deallocate it in
> > slab_mem_offline_callback() when the node goes memoryless, so we can just
> > get rid of it.
> > 
> > The only side effect is that we will stop clearing the node from slab_nodes.
> > 
> 
> Feel free to add a Suggested-by: if you think it applies.
> 
> 
> Do we have to take care of the N_NORMAL_MEMORY check in kmem_cache_init()? Likely it
> would have to be an N_MEMORY check.
> 
> 
> But, I was wondering if we could get rid of the "slab_nodes" thingy as a first step?

The following commit says that SLUB has the slab_nodes thingy for a reason...
the kmem_cache_node for a node might not be allocated yet even when the
N_NORMAL_MEMORY check says that node now has normal memory.

@Vlastimil, maybe a dumb question, but why not check s->node[nid]
instead of having the slab_nodes bitmask?
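
E.g. the check in ___slab_alloc() would become something like this
(completely untested sketch, just to illustrate the question;
get_node(s, node) is simply s->node[node]):

	if (!slab) {
		/*
		 * If the node is not online, or if we have not allocated
		 * a kmem_cache_node for it (yet), just ignore the node
		 * constraint.
		 */
		if (unlikely(node != NUMA_NO_NODE && !get_node(s, node)))
			node = NUMA_NO_NODE;
		goto new_slab;
	}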

commit 7e1fa93deff44677a94dfc323ff629bbf5cf9360
Author: Vlastimil Babka <vbabka@...e.cz>
Date:   Wed Feb 24 12:01:12 2021 -0800

    mm, slab, slub: stop taking memory hotplug lock
    
    Since commit 03afc0e25f7f ("slab: get_online_mems for
    kmem_cache_{create,destroy,shrink}") we are taking memory hotplug lock for
    SLAB and SLUB when creating, destroying or shrinking a cache.  It is quite
    a heavy lock and it's best to avoid it if possible, as we had several
    issues with lockdep complaining about ordering in the past, see e.g.
    e4f8e513c3d3 ("mm/slub: fix a deadlock in show_slab_objects()").
    
    The problem scenario in 03afc0e25f7f (solved by the memory hotplug lock)
    can be summarized as follows: while there's slab_mutex synchronizing new
    kmem cache creation and SLUB's MEM_GOING_ONLINE callback
    slab_mem_going_online_callback(), we may miss creation of kmem_cache_node
    for the hotplugged node in the new kmem cache, because the hotplug
    callback doesn't yet see the new cache, and cache creation in
    init_kmem_cache_nodes() only inits kmem_cache_node for nodes in the
    N_NORMAL_MEMORY nodemask, which however may not yet include the new node,
    as that happens only later after the MEM_GOING_ONLINE callback.
    
    Instead of using get/put_online_mems(), the problem can be solved by SLUB
    maintaining its own nodemask of nodes for which it has allocated the
    per-node kmem_cache_node structures.  This nodemask would generally mirror
    the N_NORMAL_MEMORY nodemask, but would be updated only under SLUB's
    control in its memory hotplug callbacks under the slab_mutex.  This patch
    adds such nodemask and its handling.
    
    Commit 03afc0e25f7f mentions "issues like [the one above]", but there
    don't appear to be further issues.  All the paths (shared for SLAB and
    SLUB) taking the memory hotplug locks are also taking the slab_mutex,
    except kmem_cache_shrink() where 03afc0e25f7f replaced slab_mutex with
    get/put_online_mems().
    
    We however cannot simply restore slab_mutex in kmem_cache_shrink(), as
    SLUB can enter the function from a write to sysfs 'shrink' file, thus
    holding kernfs lock, and in kmem_cache_create() the kernfs lock is nested
    within slab_mutex.  But on closer inspection we don't actually need to
    protect kmem_cache_shrink() from hotplug callbacks: While SLUB's
    __kmem_cache_shrink() does for_each_kmem_cache_node(), missing a new node
    added in parallel hotplug is not fatal, and parallel hotremove does not
    free kmem_cache_node's anymore after the previous patch, so use-after-free
    cannot happen.  The per-node shrinking itself is protected by
    n->list_lock.  Same is true for SLAB, and SLOB is no-op.
    
    SLAB also doesn't need the memory hotplug locking, which it only gained by
    03afc0e25f7f through the shared paths in slab_common.c.  Its memory
    hotplug callbacks are also protected by slab_mutex against races with
    these paths.  The problem of SLUB relying on N_NORMAL_MEMORY doesn't apply
    to SLAB, as its setup_kmem_cache_nodes relies on N_ONLINE, and the new
    node is already set there during the MEM_GOING_ONLINE callback, so no
    special care is needed for SLAB.
    
    As such, this patch removes all get/put_online_mems() usage by the slab
    subsystem.
    
    Link: https://lkml.kernel.org/r/20210113131634.3671-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
    Cc: Christoph Lameter <cl@...ux.com>
    Cc: David Hildenbrand <david@...hat.com>
    Cc: David Rientjes <rientjes@...gle.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@....com>
    Cc: Michal Hocko <mhocko@...nel.org>
    Cc: Pekka Enberg <penberg@...nel.org>
    Cc: Qian Cai <cai@...hat.com>
    Cc: Vladimir Davydov <vdavydov.dev@...il.com>
    Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>

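For reference, my reading of the window described above (with only slab_mutex
for synchronization, before slab_nodes / the hotplug lock existed), roughly:

  memory hotplug (online_pages)          kmem_cache_create()
  -----------------------------          -------------------
  MEM_GOING_ONLINE notifier
    slab_mem_going_online_callback()
      takes slab_mutex,
      walks slab_caches
      -> new cache not on the list yet
                                         takes slab_mutex
                                         init_kmem_cache_nodes()
                                           for_each_node_state(N_NORMAL_MEMORY)
                                           -> new node not set yet, skipped
  node_states_set_node()
    -> N_NORMAL_MEMORY now includes the new node,
       but the new cache has no kmem_cache_node for it
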
> 
> From 518a2b83a9c5bd85d74ddabbc36ce5d181a88ed6 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@...hat.com>
> Date: Tue, 8 Apr 2025 12:16:13 +0200
> Subject: [PATCH] tmp
> 
> Signed-off-by: David Hildenbrand <david@...hat.com>
> ---
>  mm/slub.c | 56 ++++---------------------------------------------------
>  1 file changed, 4 insertions(+), 52 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index b46f87662e71d..afe31149e7f4e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -445,14 +445,6 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
>  	for (__node = 0; __node < nr_node_ids; __node++) \
>  		 if ((__n = get_node(__s, __node)))
> -/*
> - * Tracks for which NUMA nodes we have kmem_cache_nodes allocated.
> - * Corresponds to node_state[N_NORMAL_MEMORY], but can temporarily
> - * differ during memory hotplug/hotremove operations.
> - * Protected by slab_mutex.
> - */
> -static nodemask_t slab_nodes;
> -
>  #ifndef CONFIG_SLUB_TINY
>  /*
>   * Workqueue used for flush_cpu_slab().
> @@ -3706,10 +3698,9 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  	if (!slab) {
>  		/*
>  		 * if the node is not online or has no normal memory, just
> -		 * ignore the node constraint
> +		 * ignore the node constraint.
>  		 */
> -		if (unlikely(node != NUMA_NO_NODE &&
> -			     !node_isset(node, slab_nodes)))
> +		if (unlikely(node != NUMA_NO_NODE && !node_state(node, N_NORMAL_MEMORY)))
>  			node = NUMA_NO_NODE;
>  		goto new_slab;
>  	}
> @@ -3719,7 +3710,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  		 * same as above but node_match() being false already
>  		 * implies node != NUMA_NO_NODE
>  		 */
> -		if (!node_isset(node, slab_nodes)) {
> +		if (!node_state(node, N_NORMAL_MEMORY)) {
>  			node = NUMA_NO_NODE;
>  		} else {
>  			stat(s, ALLOC_NODE_MISMATCH);
> @@ -5623,7 +5614,7 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
>  {
>  	int node;
> -	for_each_node_mask(node, slab_nodes) {
> +	for_each_node_state(node, N_NORMAL_MEMORY) {
>  		struct kmem_cache_node *n;
>  		if (slab_state == DOWN) {
> @@ -6164,30 +6155,6 @@ static int slab_mem_going_offline_callback(void *arg)
>  	return 0;
>  }
> -static void slab_mem_offline_callback(void *arg)
> -{
> -	struct memory_notify *marg = arg;
> -	int offline_node;
> -
> -	offline_node = marg->status_change_nid_normal;
> -
> -	/*
> -	 * If the node still has available memory. we need kmem_cache_node
> -	 * for it yet.
> -	 */
> -	if (offline_node < 0)
> -		return;
> -
> -	mutex_lock(&slab_mutex);
> -	node_clear(offline_node, slab_nodes);
> -	/*
> -	 * We no longer free kmem_cache_node structures here, as it would be
> -	 * racy with all get_node() users, and infeasible to protect them with
> -	 * slab_mutex.
> -	 */
> -	mutex_unlock(&slab_mutex);
> -}
> -
>  static int slab_mem_going_online_callback(void *arg)
>  {
>  	struct kmem_cache_node *n;
> @@ -6229,11 +6196,6 @@ static int slab_mem_going_online_callback(void *arg)
>  		init_kmem_cache_node(n);
>  		s->node[nid] = n;
>  	}
> -	/*
> -	 * Any cache created after this point will also have kmem_cache_node
> -	 * initialized for the new node.
> -	 */
> -	node_set(nid, slab_nodes);
>  out:
>  	mutex_unlock(&slab_mutex);
>  	return ret;
> @@ -6253,8 +6215,6 @@ static int slab_memory_callback(struct notifier_block *self,
>  		break;
>  	case MEM_OFFLINE:
>  	case MEM_CANCEL_ONLINE:
> -		slab_mem_offline_callback(arg);
> -		break;
>  	case MEM_ONLINE:
>  	case MEM_CANCEL_OFFLINE:
>  		break;
> @@ -6309,7 +6269,6 @@ void __init kmem_cache_init(void)
>  {
>  	static __initdata struct kmem_cache boot_kmem_cache,
>  		boot_kmem_cache_node;
> -	int node;
>  	if (debug_guardpage_minorder())
>  		slub_max_order = 0;
> @@ -6321,13 +6280,6 @@ void __init kmem_cache_init(void)
>  	kmem_cache_node = &boot_kmem_cache_node;
>  	kmem_cache = &boot_kmem_cache;
> -	/*
> -	 * Initialize the nodemask for which we will allocate per node
> -	 * structures. Here we don't need taking slab_mutex yet.
> -	 */
> -	for_each_node_state(node, N_NORMAL_MEMORY)
> -		node_set(node, slab_nodes);
> -
>  	create_boot_cache(kmem_cache_node, "kmem_cache_node",
>  			sizeof(struct kmem_cache_node),
>  			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
> -- 
> 2.48.1
> 
> 
> Not sure if there are any races to consider ... just an idea.
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 

-- 
Cheers,
Harry (formerly known as Hyeonggon)
