[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e29293fd-b81a-4133-800a-875dcd252d67@redhat.com>
Date: Fri, 14 Mar 2025 10:17:26 +0100
From: David Hildenbrand <david@...hat.com>
To: Rakie Kim <rakie.kim@...com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-cxl@...r.kernel.org,
joshua.hahnjy@...il.com, dan.j.williams@...el.com,
ying.huang@...ux.alibaba.com, kernel_team@...ynix.com, honggyu.kim@...com,
yunjeong.mun@...com, Gregory Price <gourry@...rry.net>
Subject: Re: [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted
interleave
> Hi David,
Hi :)
>
> I am currently working on adding memory hotplug-related functionality
> to the weighted interleave feature. While discussing this with Gregory,
> a question came up. If you happen to know the answer to the following,
> I would greatly appreciate your input.
>
> I have added the following logic to call add_weight_node when a node
> transitions to the N_MEMORY state to create a sysfs entry. Conversely,
> when all memory blocks of a node go offline (!N_MEMORY),
> I call sysfs_wi_node_release to remove the corresponding sysfs entry.
>
As a spoiler: I don't like how we squeezed the status_change_nid /
status_change_nid_normal stuff into the memory notifier that works on a
single memory block -> single zone. But decoupling it is not as easy,
because we have this status_change_nid vs. status_change_nid_normal thing.
For the basic "node going offline / node going online", a separate
notifier (or separate callbacks) would make at least your use case much
clearer.
The whole "status_change_nid_normal" is only used by slab. I wonder if
we could get rid of it, and simply let slab check the relevant
zone->present pages when notified about onlining/offlining of a singe
memory block.
Then, we would only have status_change_nid and could just convert that
to a NODE_MEM_ONLINE / NODE_MEM_OFFLINE notifier or sth like that.
Hmmm, if I wouldn't be on PTO, I would prototype that real quick :)
> +static int wi_node_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + int err;
> + struct memory_notify *arg = data;
> + int nid = arg->status_change_nid;
> +
> + if (nid < 0)
> + goto notifier_end;
> +
> + switch(action) {
> + case MEM_ONLINE:
> + err = add_weight_node(nid, wi_kobj);
> + if (err) {
> + pr_err("failed to add sysfs [node%d]\n", nid);
> + kobject_put(wi_kobj);
> + return NOTIFY_BAD;
> + }
> + break;
> + case MEM_OFFLINE:
> + sysfs_wi_node_release(node_attrs[nid], wi_kobj);
> + break;
> + }
> +
> +notifier_end:
> + return NOTIFY_OK;
> +}
>
> One question I have is whether the MEM_OFFLINE action in the code
> below will be triggered when a node that consists of multiple memory
> blocks has only one of its memory blocks transitioning to the offline state.
>
node_states_check_changes_offline() should be making sure that that is
the case.
Only if offlining the current block will make the node (all zones) have
no present pages any more, then we'll set status_change_nid to >= 0.
> + int nid = arg->status_change_nid;
> +
> + if (nid < 0)
> + goto notifier_end;
>
> Based on my analysis, wi_node_notifier should function as expected
> because arg->status_change_nid only holds a non-negative value
> under the following conditions:
>
> 1) !N_MEMORY -> N_MEMORY
> When the first memory block of a node transitions to the online state,
> it holds a non-negative value.
> In all other cases, it remains -1 (NUMA_NO_NODE).
>
> 2) N_MEMORY -> !N_MEMORY
> When all memory blocks of a node transition to the offline state,
> it holds a non-negative value.
> In all other cases, it remains -1 (NUMA_NO_NODE).
>
> I would truly appreciate it if you could confirm whether this analysis is correct.
> Below is a more detailed explanation of my findings.
>
Yes, that's at least how it is supposed to work (-bugs, but I am not
aware of any) :)
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists