Message-ID: <b26b32c9-6b3a-4ab4-9ef4-c20b415d5483@redhat.com>
Date: Tue, 8 Apr 2025 16:25:33 +0200
From: David Hildenbrand <david@...hat.com>
To: Harry Yoo <harry.yoo@...cle.com>
Cc: Oscar Salvador <osalvador@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>, linux-cxl@...r.kernel.org
Subject: Re: [PATCH v2 1/3] mm,slub: Do not special case N_NORMAL nodes for
slab_nodes
On 08.04.25 16:18, Harry Yoo wrote:
> On Tue, Apr 08, 2025 at 12:17:52PM +0200, David Hildenbrand wrote:
>> On 08.04.25 10:41, Oscar Salvador wrote:
>>> Currently, slab_mem_going_online_callback() checks whether the node has
>>> N_NORMAL memory in order to decide whether to set it in slab_nodes.
>>> While it is true that getting rid of that restriction would mean
>>> ending up with movable-only nodes in slab_nodes, the memory waste that
>>> comes with that is negligible.
>>>
>>> So stop checking for status_change_nid_normal and just use status_change_nid
>>> instead which works for both types of memory.
>>>
>>> Also, once we allocate the kmem_cache_node cache for the node in
>>> slab_mem_going_online_callback(), we never deallocate it in
>>> slab_mem_offline_callback() when the node goes memoryless, so we can
>>> just get rid of that callback.
>>>
>>> The only side effect is that we will stop clearing the node from slab_nodes.
>>>
>>
>> Feel free to add a Suggested-by: if you think it applies.
>>
>>
>> Do we have to take care of the N_NORMAL_MEMORY check in kmem_cache_init()?
>> Likely it would have to be an N_MEMORY check.
>>
>>
>> But I was wondering if we could get rid of the "slab_nodes" thingy as a first step?
>
> The following commit says that SLUB has the slab_nodes thingy for a reason...
> kmem_cache_node might not be ready yet even when the N_NORMAL_MEMORY check
> says the node now has normal memory.
node_states_set_node() is called from memory hotplug code after
MEM_GOING_ONLINE and after online_pages_range().
Pages might be isolated at that point, but node_states_set_node() is
called only after the memory notifier (MEM_GOING_ONLINE) was triggered.
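
To illustrate the ordering, here is a heavily simplified sketch of
mm/memory_hotplug.c:online_pages() from memory, so take the details with
a grain of salt:

	int online_pages(...)
	{
		...
		/* SLUB allocates kmem_cache_node for the node here */
		ret = memory_notify(MEM_GOING_ONLINE, &arg);
		...
		/* pages actually become usable */
		online_pages_range(pfn, nr_pages);
		...
		/* only now the node gains N_NORMAL_MEMORY / N_MEMORY */
		node_states_set_node(nid, &arg);
		...
		memory_notify(MEM_ONLINE, &arg);
	}
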
So I don't immediately see the problem assuming that we never free the
structures.
But yeah, this is what I raised below: "Not sure if there are any races
to consider" :)
>
> @Vlastimil maybe a dumb question, but why not check s->node[nid]
> instead of having the slab_nodes bitmask?
>
> commit 7e1fa93deff44677a94dfc323ff629bbf5cf9360
> Author: Vlastimil Babka <vbabka@...e.cz>
> Date: Wed Feb 24 12:01:12 2021 -0800
>
> mm, slab, slub: stop taking memory hotplug lock
>
> Since commit 03afc0e25f7f ("slab: get_online_mems for
> kmem_cache_{create,destroy,shrink}") we are taking memory hotplug lock for
> SLAB and SLUB when creating, destroying or shrinking a cache. It is quite
> a heavy lock and it's best to avoid it if possible, as we had several
> issues with lockdep complaining about ordering in the past, see e.g.
> e4f8e513c3d3 ("mm/slub: fix a deadlock in show_slab_objects()").
>
> The problem scenario in 03afc0e25f7f (solved by the memory hotplug lock)
> can be summarized as follows: while there's slab_mutex synchronizing new
> kmem cache creation and SLUB's MEM_GOING_ONLINE callback
> slab_mem_going_online_callback(), we may miss creation of kmem_cache_node
> for the hotplugged node in the new kmem cache, because the hotplug
> callback doesn't yet see the new cache, and cache creation in
> init_kmem_cache_nodes() only inits kmem_cache_node for nodes in the
> N_NORMAL_MEMORY nodemask, which however may not yet include the new node,
> as that happens only later after the MEM_GOING_ONLINE callback.
>
> Instead of using get/put_online_mems(), the problem can be solved by SLUB
> maintaining its own nodemask of nodes for which it has allocated the
> per-node kmem_cache_node structures. This nodemask would generally mirror
> the N_NORMAL_MEMORY nodemask, but would be updated only under SLUB's
> control in its memory hotplug callbacks under the slab_mutex. This patch
> adds such nodemask and its handling.
>
> Commit 03afc0e25f7f mentions "issues like [the one above]", but there
> don't appear to be further issues. All the paths (shared for SLAB and
> SLUB) taking the memory hotplug locks are also taking the slab_mutex,
> except kmem_cache_shrink() where 03afc0e25f7f replaced slab_mutex with
> get/put_online_mems().
>
> We however cannot simply restore slab_mutex in kmem_cache_shrink(), as
> SLUB can enter the function from a write to the sysfs 'shrink' file, thus
> holding kernfs lock, and in kmem_cache_create() the kernfs lock is nested
> within slab_mutex. But on closer inspection we don't actually need to
> protect kmem_cache_shrink() from hotplug callbacks: While SLUB's
> __kmem_cache_shrink() does for_each_kmem_cache_node(), missing a new node
> added in parallel hotplug is not fatal, and parallel hotremove does not
> free kmem_cache_node's anymore after the previous patch, so use-after-free
> cannot happen. The per-node shrinking itself is protected by
> n->list_lock. Same is true for SLAB, and SLOB is no-op.
>
> SLAB also doesn't need the memory hotplug locking, which it only gained by
> 03afc0e25f7f through the shared paths in slab_common.c. Its memory
> hotplug callbacks are also protected by slab_mutex against races with
> these paths. The problem of SLUB relying on N_NORMAL_MEMORY doesn't apply
> to SLAB, as its setup_kmem_cache_nodes relies on N_ONLINE, and the new
> node is already set there during the MEM_GOING_ONLINE callback, so no
> special care is needed for SLAB.
>
> As such, this patch removes all get/put_online_mems() usage by the slab
> subsystem.
>
> Link: https://lkml.kernel.org/r/20210113131634.3671-3-vbabka@suse.cz
> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
> Cc: Christoph Lameter <cl@...ux.com>
> Cc: David Hildenbrand <david@...hat.com>
> Cc: David Rientjes <rientjes@...gle.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@....com>
> Cc: Michal Hocko <mhocko@...nel.org>
> Cc: Pekka Enberg <penberg@...nel.org>
> Cc: Qian Cai <cai@...hat.com>
> Cc: Vladimir Davydov <vdavydov.dev@...il.com>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
>
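
If we went the s->node[nid] route, I guess ___slab_alloc() would end up
with something like the following (untested sketch, just to illustrate
the idea -- get_node() is simply s->node[node]):

	if (!slab) {
		/*
		 * If we never allocated a kmem_cache_node for this node,
		 * get_node() returns NULL and we just ignore the node
		 * constraint, as we do today for nodes not in slab_nodes.
		 */
		if (unlikely(node != NUMA_NO_NODE && !get_node(s, node)))
			node = NUMA_NO_NODE;
		goto new_slab;
	}

We'd likely have to be careful that s->node[nid] is published with proper
memory ordering once the hotplug callback allocates it, though.
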
>>
>> From 518a2b83a9c5bd85d74ddabbc36ce5d181a88ed6 Mon Sep 17 00:00:00 2001
>> From: David Hildenbrand <david@...hat.com>
>> Date: Tue, 8 Apr 2025 12:16:13 +0200
>> Subject: [PATCH] tmp
>>
>> Signed-off-by: David Hildenbrand <david@...hat.com>
>> ---
>> mm/slub.c | 56 ++++---------------------------------------------------
>> 1 file changed, 4 insertions(+), 52 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index b46f87662e71d..afe31149e7f4e 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -445,14 +445,6 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
>> for (__node = 0; __node < nr_node_ids; __node++) \
>> if ((__n = get_node(__s, __node)))
>> -/*
>> - * Tracks for which NUMA nodes we have kmem_cache_nodes allocated.
>> - * Corresponds to node_state[N_NORMAL_MEMORY], but can temporarily
>> - * differ during memory hotplug/hotremove operations.
>> - * Protected by slab_mutex.
>> - */
>> -static nodemask_t slab_nodes;
>> -
>> #ifndef CONFIG_SLUB_TINY
>> /*
>> * Workqueue used for flush_cpu_slab().
>> @@ -3706,10 +3698,9 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> if (!slab) {
>> /*
>> * if the node is not online or has no normal memory, just
>> - * ignore the node constraint
>> + * ignore the node constraint.
>> */
>> - if (unlikely(node != NUMA_NO_NODE &&
>> - !node_isset(node, slab_nodes)))
>> + if (unlikely(node != NUMA_NO_NODE && !node_state(node, N_NORMAL_MEMORY)))
>> node = NUMA_NO_NODE;
>> goto new_slab;
>> }
>> @@ -3719,7 +3710,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> * same as above but node_match() being false already
>> * implies node != NUMA_NO_NODE
>> */
>> - if (!node_isset(node, slab_nodes)) {
>> + if (!node_state(node, N_NORMAL_MEMORY)) {
>> node = NUMA_NO_NODE;
>> } else {
>> stat(s, ALLOC_NODE_MISMATCH);
>> @@ -5623,7 +5614,7 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
>> {
>> int node;
>> - for_each_node_mask(node, slab_nodes) {
>> + for_each_node_state(node, N_NORMAL_MEMORY) {
>> struct kmem_cache_node *n;
>> if (slab_state == DOWN) {
>> @@ -6164,30 +6155,6 @@ static int slab_mem_going_offline_callback(void *arg)
>> return 0;
>> }
>> -static void slab_mem_offline_callback(void *arg)
>> -{
>> - struct memory_notify *marg = arg;
>> - int offline_node;
>> -
>> - offline_node = marg->status_change_nid_normal;
>> -
>> - /*
>> - * If the node still has available memory. we need kmem_cache_node
>> - * for it yet.
>> - */
>> - if (offline_node < 0)
>> - return;
>> -
>> - mutex_lock(&slab_mutex);
>> - node_clear(offline_node, slab_nodes);
>> - /*
>> - * We no longer free kmem_cache_node structures here, as it would be
>> - * racy with all get_node() users, and infeasible to protect them with
>> - * slab_mutex.
>> - */
>> - mutex_unlock(&slab_mutex);
>> -}
>> -
>> static int slab_mem_going_online_callback(void *arg)
>> {
>> struct kmem_cache_node *n;
>> @@ -6229,11 +6196,6 @@ static int slab_mem_going_online_callback(void *arg)
>> init_kmem_cache_node(n);
>> s->node[nid] = n;
>> }
>> - /*
>> - * Any cache created after this point will also have kmem_cache_node
>> - * initialized for the new node.
>> - */
>> - node_set(nid, slab_nodes);
>> out:
>> mutex_unlock(&slab_mutex);
>> return ret;
>> @@ -6253,8 +6215,6 @@ static int slab_memory_callback(struct notifier_block *self,
>> break;
>> case MEM_OFFLINE:
>> case MEM_CANCEL_ONLINE:
>> - slab_mem_offline_callback(arg);
>> - break;
>> case MEM_ONLINE:
>> case MEM_CANCEL_OFFLINE:
>> break;
>> @@ -6309,7 +6269,6 @@ void __init kmem_cache_init(void)
>> {
>> static __initdata struct kmem_cache boot_kmem_cache,
>> boot_kmem_cache_node;
>> - int node;
>> if (debug_guardpage_minorder())
>> slub_max_order = 0;
>> @@ -6321,13 +6280,6 @@ void __init kmem_cache_init(void)
>> kmem_cache_node = &boot_kmem_cache_node;
>> kmem_cache = &boot_kmem_cache;
>> - /*
>> - * Initialize the nodemask for which we will allocate per node
>> - * structures. Here we don't need taking slab_mutex yet.
>> - */
>> - for_each_node_state(node, N_NORMAL_MEMORY)
>> - node_set(node, slab_nodes);
>> -
>> create_boot_cache(kmem_cache_node, "kmem_cache_node",
>> sizeof(struct kmem_cache_node),
>> SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
>> --
>> 2.48.1
>>
>>
>> Not sure if there are any races to consider ... just an idea.
>>
>> --
>> Cheers,
>>
>> David / dhildenb
>>
>
--
Cheers,
David / dhildenb