Message-ID: <d581417f-7756-4ce7-8a5a-49149db33b8c@suse.cz>
Date: Tue, 8 Apr 2025 19:55:40 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: David Hildenbrand <david@...hat.com>, Harry Yoo <harry.yoo@...cle.com>
Cc: Oscar Salvador <osalvador@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Jonathan Cameron
<Jonathan.Cameron@...wei.com>, linux-cxl@...r.kernel.org
Subject: Re: [PATCH v2 1/3] mm,slub: Do not special case N_NORMAL nodes for
slab_nodes
On 4/8/25 16:25, David Hildenbrand wrote:
> On 08.04.25 16:18, Harry Yoo wrote:
>> On Tue, Apr 08, 2025 at 12:17:52PM +0200, David Hildenbrand wrote:
>>> On 08.04.25 10:41, Oscar Salvador wrote:
>>>> Currently, slab_mem_going_online_callback() checks whether the node has
>>>> N_NORMAL memory in order to be set in slab_nodes.
>>>> While it is true that getting rid of that enforcement would mean
>>>> ending up with movable nodes in slab_nodes, the memory waste that comes
>>>> with that is negligible.
>>>>
>>>> So stop checking for status_change_nid_normal and just use status_change_nid
>>>> instead which works for both types of memory.
>>>>
>>>> Also, once we allocate the kmem_cache_node cache for the node in
>>>> slab_mem_going_online_callback(), we never deallocate it in
>>>> slab_mem_offline_callback() when the node goes memoryless, so we can just
>>>> get rid of it.
>>>>
>>>> The only side effect is that we will stop clearing the node from slab_nodes.
>>>>
>>>
>>> Feel free to add a Suggested-by: if you think it applies.
>>>
>>>
>>> Do we have to take care of the N_NORMAL_MEMORY check in kmem_cache_init()?
>>> Likely it would have to be an N_MEMORY check.
>>>
>>>
>>> But, I was wondering if we could get rid of the "slab_nodes" thingy as a first step?
>>
>> The following commit says that SLUB has the slab_nodes thingy for a reason...
>> kmem_cache_node might not be ready yet even when the N_NORMAL_MEMORY check
>> says the node now has normal memory.
>
> node_states_set_node() is called from memory hotplug code after
> MEM_GOING_ONLINE and after online_pages_range().
>
> Pages might be isolated at that point, but node_states_set_node() is called
> only after the memory notifier (MEM_GOING_ONLINE) was triggered.
>
> So I don't immediately see the problem assuming that we never free the
> structures.
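(For reference, the ordering in online_pages() is roughly the following, from
memory, so details may be off:

	mem_hotplug_begin();
	/* slab_mem_going_online_callback() runs from here */
	ret = memory_notify(MEM_GOING_ONLINE, &arg);
	...
	online_pages_range(pfn, nr_pages);
	/* only now N_MEMORY/N_NORMAL_MEMORY become visible */
	node_states_set_node(nid, &arg);
	...
	memory_notify(MEM_ONLINE, &arg);
	mem_hotplug_done();
)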
Hm, I think it still exists. Consider:
- slab_mem_going_online_callback() creates a kmem_cache_node for the new node
  for all existing caches, under slab_mutex
Then a concurrent cache creation races with "node_states_set_node() is called
only after the memory notifier (MEM_GOING_ONLINE) was triggered" in a way
that it doesn't see the new node set yet - the kmem_cache_node for the new
node will be missing on the new cache.
The root issue is that node_states_set_node() doesn't happen under
slab_mutex. The slab_nodes update does happen under slab_mutex, to avoid
exactly this race.
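For reference, that callback currently looks roughly like this (quoting
upstream from memory, so details may be off):

	static int slab_mem_going_online_callback(void *arg)
	{
		struct memory_notify *marg = arg;
		int nid = marg->status_change_nid_normal;
		struct kmem_cache *s;
		int ret = 0;

		/* Only ZONE_MOVABLE memory appears on the node. */
		if (nid < 0)
			return 0;

		mutex_lock(&slab_mutex);
		list_for_each_entry(s, &slab_caches, list) {
			struct kmem_cache_node *n;

			/* May exist from a previous online/offline cycle. */
			if (get_node(s, nid))
				continue;
			/* Falls back to other nodes; nid has no memory yet. */
			n = kmem_cache_alloc(kmem_cache_node, GFP_KERNEL);
			if (!n) {
				ret = -ENOMEM;
				goto out;
			}
			init_kmem_cache_node(n);
			s->node[nid] = n;
		}
		/* Caches created after this point see the bit themselves. */
		node_set(nid, slab_nodes);
	out:
		mutex_unlock(&slab_mutex);
		return ret;
	}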
And once we have slab_nodes for this cache creation vs hotplug
synchronization, it's also natural to use it for the ___slab_alloc() checks,
although those could indeed now use node_state(node, N_NORMAL_MEMORY) once we
create the node structures and set bits in slab_nodes for N_MEMORY.
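(The ___slab_alloc() check I mean is roughly this one, again from memory:

	if (unlikely(node != NUMA_NO_NODE &&
		     !node_isset(node, slab_nodes)))
		node = NUMA_NO_NODE;

i.e. a request for a node we have no structures for falls back to any node.)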
Maybe it could be solved at the memory hotplug level if we had a new state
e.g. N_GOING_ONLINE that was set before calling the MEM_GOING_ONLINE
callbacks and was never reset. Then init_kmem_cache_nodes() could iterate on
that. But I'm not sure if slab_mutex would provide the necessary
synchronization guarantees here so that a node addition at MEM_GOING_ONLINE
time can't be missed, hm.
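Something like the following, completely hypothetical and untested, with
N_GOING_ONLINE being the new state:

	/* mm/memory_hotplug.c: in online_pages(), before the notifier */
	node_set_state(nid, N_GOING_ONLINE);	/* never cleared again */
	ret = memory_notify(MEM_GOING_ONLINE, &arg);

	/* mm/slub.c: init_kmem_cache_nodes() iterating the new state */
	for_each_node_state(node, N_GOING_ONLINE) {
		struct kmem_cache_node *n;

		n = kmem_cache_alloc(kmem_cache_node, GFP_KERNEL);
		if (!n)
			return 0;
		init_kmem_cache_node(n);
		s->node[node] = n;
	}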
> But yeah, this is what I raised below: "Not sure if there are any races
> to consider" :)
>
>>
>> @Vlastimil maybe a dumb question but why not check s->node[nid]
>> instead of having slab_nodes bitmask?
You mean check it in e.g. ___slab_alloc()? We'd still be prone to missing a
kmem_cache_node creation and degrading the particular cache that way; we'd
just avoid a NULL pointer dereference.
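I.e. hypothetically:

	/* in ___slab_alloc(), instead of the slab_nodes check */
	if (unlikely(node != NUMA_NO_NODE && !get_node(s, node)))
		node = NUMA_NO_NODE;

That avoids the crash, but if the cache raced with the hotplug notifier and
never got its kmem_cache_node, every node-specific allocation on that cache
would silently fall back forever.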