lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2bec8b53-f788-493e-a76e-1f804ed3aa0c@redhat.com>
Date: Tue, 10 Jun 2025 10:10:21 +0200
From: David Hildenbrand <david@...hat.com>
To: Oscar Salvador <osalvador@...e.de>,
 Andrew Morton <akpm@...ux-foundation.org>
Cc: Vlastimil Babka <vbabka@...e.cz>,
 Jonathan Cameron <Jonathan.Cameron@...wei.com>,
 Harry Yoo <harry.yoo@...cle.com>, Rakie Kim <rakie.kim@...com>,
 Hyeonggon Yoo <42.hyeyoo@...il.com>, Joshua Hahn <joshua.hahnjy@...il.com>,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier

On 09.06.25 11:21, Oscar Salvador wrote:
> There are at least six consumers of hotplug_memory_notifier that what they
> really are interested in is whether any numa node changed its state, e.g: going
> from having memory to not having memory and vice versa.
> 
> Implement a specific notifier for numa nodes when their state gets changed,
> which will later be used by those consumers that are only interested
> in numa node state changes.
> 
> Add documentation as well.
> 
> Signed-off-by: Oscar Salvador <osalvador@...e.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
> Reviewed-by: Harry Yoo <harry.yoo@...cle.com>
> Reviewed-by: Vlastimil Babka <vbabka@...e.cz>
> ---
>   Documentation/core-api/memory-hotplug.rst |  66 +++++++++
>   drivers/base/node.c                       |  21 +++
>   include/linux/node.h                      |  40 ++++++
>   mm/memory_hotplug.c                       | 155 ++++++++++------------
>   4 files changed, 200 insertions(+), 82 deletions(-)
> 
> diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
> index d1b8eb9add8a..b19c3be7437d 100644
> --- a/Documentation/core-api/memory-hotplug.rst
> +++ b/Documentation/core-api/memory-hotplug.rst
> @@ -9,6 +9,9 @@ Memory hotplug event notifier
>   
>   Hotplugging events are sent to a notification queue.
>   
> +Memory notifier
> +----------------
> +
>   There are six types of notification defined in ``include/linux/memory.h``:
>   
>   MEM_GOING_ONLINE
> @@ -80,6 +83,69 @@ further processing of the notification queue.
>   
>   NOTIFY_STOP stops further processing of the notification queue.
>   
> +Numa node notifier
> +------------------
> +
> +There are six types of notification defined in ``include/linux/node.h``:
> +
> +NODE_ADDING_FIRST_MEMORY
> + Generated before memory becomes available to this node for the first time.
> +
> +NODE_CANCEL_ADDING_FIRST_MEMORY
> + Generated if NODE_ADDING_FIRST_MEMORY fails.
> +
> +NODE_ADDED_FIRST_MEMORY
> + Generated when memory has become available fo this node for the first time.
> +
> +NODE_REMOVING_LAST_MEMORY
> + Generated when the last memory available to this node is about to be offlined.
> +
> +NODE_CANCEL_REMOVING_LAST_MEMORY
> + Generated when NODE_CANCEL_REMOVING_LAST_MEMORY fails.
> +
> +NODE_REMOVED_LAST_MEMORY
> + Generated when the last memory available to this node has been offlined.
> +
> +A callback routine can be registered by calling::
> +
> +  hotplug_node_notifier(callback_func, priority)
> +
> +Callback functions with higher values of priority are called before callback
> +functions with lower values.
> +
> +A callback function must have the following prototype::
> +
> +  int callback_func(
> +
> +    struct notifier_block *self, unsigned long action, void *arg);
> +
> +The first argument of the callback function (self) is a pointer to the block
> +of the notifier chain that points to the callback function itself.
> +The second argument (action) is one of the event types described above.
> +The third argument (arg) passes a pointer of struct node_notify::
> +
> +        struct node_notify {
> +                int nid;
> +        }
> +
> +- nid is the node we are adding or removing memory to.
> +
> +  If nid >= 0, callback should create/discard structures for the
> +  node if necessary.

Likely that should be removed?

It' probably worth mentioning that one might get notified about 
NODE_CANCEL_ADDING_FIRST_MEMORY even though never notified for 
NODE_ADDING_FIRST_MEMORY. (same for removing)

I recall this can happen if one of the NODE_ADDING_FIRST_MEMORY 
notifiers fails.

(same applies to MEM_CANCEL_*)

Consequently, we might simplify the cancel_mem_notifier_on_err etc 
stuff, simply unconditionally calling the cancel counterparts.

> +
> +The callback routine shall return one of the values
> +NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
> +defined in ``include/linux/notifier.h``
> +
> +NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
> +
> +NOTIFY_BAD is used as response to the NODE_ADDING_FIRST_MEMORY,
> +NODE_REMOVING_LAST_MEMORY, NODE_ADDED_FIRST_MEMORY or
> +NODE_REMOVED_LAST_MEMORY action to cancel hotplugging.
> +It stops further processing of the notification queue.
> +
> +NOTIFY_STOP stops further processing of the notification queue.

Should we docunment that failing NODE_ADDED_FIRST_MEMORY / 
NODE_REMOVED_FIRST_MEMORY is very bad?

> +
>   Locking Internals
>   =================
>   
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 25ab9ec14eb8..c5b0859d846d 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -111,6 +111,27 @@ static const struct attribute_group *node_access_node_groups[] = {
>   	NULL,
>   };
>   
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +static BLOCKING_NOTIFIER_HEAD(node_chain);
> +
> +int register_node_notifier(struct notifier_block *nb)
> +{
> +	return blocking_notifier_chain_register(&node_chain, nb);
> +}
> +EXPORT_SYMBOL(register_node_notifier);
> +
> +void unregister_node_notifier(struct notifier_block *nb)
> +{
> +	blocking_notifier_chain_unregister(&node_chain, nb);
> +}
> +EXPORT_SYMBOL(unregister_node_notifier);
> +
> +int node_notify(unsigned long val, void *v)
> +{
> +	return blocking_notifier_call_chain(&node_chain, val, v);
> +}
> +#endif
> +
>   static void node_remove_accesses(struct node *node)
>   {
>   	struct node_access_nodes *c, *cnext;
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 2b7517892230..d7aa2636d948 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -123,6 +123,46 @@ static inline void register_memory_blocks_under_node(int nid, unsigned long star
>   #endif
>   
>   extern void unregister_node(struct node *node);
> +
> +struct node_notify {
> +	int nid;
> +};
> +
> +#define NODE_ADDING_FIRST_MEMORY                (1<<0)
> +#define NODE_ADDED_FIRST_MEMORY                 (1<<1)
> +#define NODE_CANCEL_ADDING_FIRST_MEMORY         (1<<2)
> +#define NODE_REMOVING_LAST_MEMORY               (1<<3)
> +#define NODE_REMOVED_LAST_MEMORY                (1<<4)
> +#define NODE_CANCEL_REMOVING_LAST_MEMORY        (1<<5)
> +
> +#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
> +extern int register_node_notifier(struct notifier_block *nb);
> +extern void unregister_node_notifier(struct notifier_block *nb);
> +extern int node_notify(unsigned long val, void *v);
> +
> +#define hotplug_node_notifier(fn, pri) ({		\
> +	static __meminitdata struct notifier_block fn##_node_nb =\
> +		{ .notifier_call = fn, .priority = pri };\
> +	register_node_notifier(&fn##_node_nb);			\
> +})
> +#else
> +static inline int register_node_notifier(struct notifier_block *nb)
> +{
> +	return 0;
> +}
> +static inline void unregister_node_notifier(struct notifier_block *nb)
> +{
> +}
> +static inline int node_notify(unsigned long val, void *v)
> +{
> +	return 0;
> +}
> +static inline int hotplug_node_notifier(notifier_fn_t fn, int pri)
> +{
> +	return 0;
> +}
> +#endif
> +
>   #ifdef CONFIG_NUMA
>   extern void node_dev_init(void);
>   /* Core of the node registration - only memory hotplug should use this */
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 94ae0ca37021..0550f3061fc4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -35,6 +35,7 @@
>   #include <linux/compaction.h>
>   #include <linux/rmap.h>
>   #include <linux/module.h>
> +#include <linux/node.h>
>   
>   #include <asm/tlbflush.h>
>   
> @@ -699,24 +700,6 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
>   	online_mem_sections(start_pfn, end_pfn);
>   }
>   
> -/* check which state of node_states will be changed when online memory */
> -static void node_states_check_changes_online(unsigned long nr_pages,
> -	struct zone *zone, struct memory_notify *arg)
> -{
> -	int nid = zone_to_nid(zone);
> -
> -	arg->status_change_nid = NUMA_NO_NODE;
> -
> -	if (!node_state(nid, N_MEMORY))
> -		arg->status_change_nid = nid;
> -}
> -
> -static void node_states_set_node(int node, struct memory_notify *arg)
> -{
> -	if (arg->status_change_nid >= 0)
> -		node_set_state(node, N_MEMORY);
> -}
> -
>   static void __meminit resize_zone_range(struct zone *zone, unsigned long start_pfn,
>   		unsigned long nr_pages)
>   {
> @@ -1171,7 +1154,9 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
>   	int need_zonelists_rebuild = 0;
>   	const int nid = zone_to_nid(zone);
>   	int ret;
> -	struct memory_notify arg;
> +	struct memory_notify mem_arg;
> +	struct node_notify node_arg;
> +	bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;

Prefer reverse xmas tree ... :)

[...]

> -
>   static int count_system_ram_pages_cb(unsigned long start_pfn,
>   				     unsigned long nr_pages, void *data)
>   {
> @@ -1937,13 +1899,17 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
>   int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>   			struct zone *zone, struct memory_group *group)
>   {
> -	const unsigned long end_pfn = start_pfn + nr_pages;
> -	unsigned long pfn, managed_pages, system_ram_pages = 0;
> -	const int node = zone_to_nid(zone);
> -	unsigned long flags;
> -	struct memory_notify arg;
> -	char *reason;
>   	int ret;
> +	char *reason;
> +	enum zone_type zt;
> +	unsigned long flags;
> +	struct memory_notify mem_arg;
> +	struct node_notify node_arg;
> +	const int node = zone_to_nid(zone);
> +	struct pglist_data *pgdat = zone->zone_pgdat;
> +	const unsigned long end_pfn = start_pfn + nr_pages;
> +	unsigned long pfn, managed_pages, system_ram_pages = 0, present_pages = 0;
> +	bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;


You'r now reversing the reversed christmas tree :)

>   
>   	/*
>   	 * {on,off}lining is constrained to full memory sections (or more
> @@ -2000,11 +1966,30 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>   		goto failed_removal_pcplists_disabled;
>   	}
>   
> -	arg.start_pfn = start_pfn;
> -	arg.nr_pages = nr_pages;
> -	node_states_check_changes_offline(nr_pages, zone, &arg);
> +	/*
> +	 * Here we count the possible pages within the range [0..ZONE_MOVABLE].
> +	 * If after having accounted all the pages, we see that the nr_pages to
> +	 * be offlined is greater or equal to the accounted pages, we know that the
> +	 * node will become empty, and so, we will clear N_MEMORY for it.
> +	 */
> +	node_arg.nid = NUMA_NO_NODE;
> +	for (zt = 0; zt <= ZONE_MOVABLE; zt++)
> +		present_pages += pgdat->node_zones[zt].present_pages;

Why can't we look at node_present_pages?

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ