Message-ID: <YisTwzumn3tgL9H4@localhost.localdomain>
Date:   Fri, 11 Mar 2022 10:17:55 +0100
From:   Oscar Salvador <osalvador@...e.de>
To:     "Huang, Ying" <ying.huang@...el.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Abhishek Goel <huntbag@...ux.vnet.ibm.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm: Only re-generate demotion targets when a numa
 node changes its N_CPU state

On Fri, Mar 11, 2022 at 01:06:26PM +0800, Huang, Ying wrote:
> Oscar Salvador <osalvador@...e.de> writes:
> > -static int __init migrate_on_reclaim_init(void)
> > -{
> > -	int ret;
> > -
> >  	node_demotion = kmalloc_array(nr_node_ids,
> >  				      sizeof(struct demotion_nodes),
> >  				      GFP_KERNEL);
> >  	WARN_ON(!node_demotion);
> >  
> > -	ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline",
> > -					NULL, migration_offline_cpu);
> >  	/*
> > -	 * In the unlikely case that this fails, the automatic
> > -	 * migration targets may become suboptimal for nodes
> > -	 * where N_CPU changes.  With such a small impact in a
> > -	 * rare case, do not bother trying to do anything special.
> >  	 * At this point, all numa nodes with memory/CPUs have their state
> > +	 * properly set, so we can build the demotion order now.
> >  	 */
> > -	WARN_ON(ret < 0);
> > -	ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online",
> > -				migration_online_cpu, NULL);
> > -	WARN_ON(ret < 0);
> > -
> > +	set_migration_target_nodes();
> 
> If my understanding is correct, we should enclose
> set_migration_target_nodes() here with cpus_read_lock(), and add a
> comment before set_migration_target_nodes() about this.  I don't know
> whether the locking order is right.

Oh, I see that cpuhp_setup_state() holds the cpu-hotplug lock while
calling in, so yeah, we might want to hold it here as well.

The thing is, not long ago we found out that we can get ACPI events,
like memory-hotplug operations, at boot stage [1], so I guess it is
safe to assume we could also have cpu-hotplug operations at that stage,
and so we want to hold cpus_read_lock() just to be on the safe side.

But, unless I am missing something, that does not apply to
set_migration_target_nodes() being called from a callback, as the
callback (somewhere up the chain) already holds that lock:
_cpu_up() takes cpus_write_lock(), and the same goes for the down
operation.
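
If we want to make that explicit, we could document the rule with a
lockdep_assert_cpus_held() at the top of set_migration_target_nodes().
Just a thought/sketch, and it assumes all callers (including the
memory-hotplug notifier path) really do hold the lock:

<--
static void set_migration_target_nodes(void)
{
	/*
	 * The init path takes cpus_read_lock() itself, and the cpuhp
	 * callbacks run with cpus_write_lock() held up the chain by
	 * _cpu_up()/_cpu_down().
	 */
	lockdep_assert_cpus_held();
	...
}
-->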

So, to sum it up, we only need the cpus_read_lock() in
migrate_on_reclaim_init().
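
IOW, something along these lines (untested sketch; the kmalloc and
notifier bits kept as in the patch):

<--
static int __init migrate_on_reclaim_init(void)
{
	node_demotion = kmalloc_array(nr_node_ids,
				      sizeof(struct demotion_nodes),
				      GFP_KERNEL);
	WARN_ON(!node_demotion);

	/*
	 * At this point, all numa nodes with memory/CPUs have their
	 * state properly set, so we can build the demotion order now.
	 * Hold the cpu-hotplug lock in case a cpu-hotplug operation
	 * sneaks in at boot stage.
	 */
	cpus_read_lock();
	set_migration_target_nodes();
	cpus_read_unlock();

	hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
	return 0;
}
-->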

> >  	hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
> 
> And we should register the notifier before calling set_migration_target_nodes()?

I cannot make up my mind here.
The primary reason I placed the call before registering the notifier is
that the original code called set_migration_target_nodes() before doing
so:

<--
ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online",
			migration_online_cpu, NULL);
WARN_ON(ret < 0);

hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
-->

I thought about keeping the same order. Why do you think it should be
called afterwards?

I am not really sure whether it has a different impact depending on the
order.
Note that memory-hotplug ACPI events can happen at boot time, so by the
time we register the memory-hotplug notifier, we can already have some
hotplug memory coming in, and so we call set_migration_target_nodes().

But that is fine, and I cannot see a difference in shuffling their
order (see the sketch below).
Do you see a problem there?
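
For reference, the two orderings under discussion (just a sketch, with
the cpus_read_lock() from above folded in):

<--
/* What I have now: build the demotion order, then register. */
cpus_read_lock();
set_migration_target_nodes();
cpus_read_unlock();
hotplug_memory_notifier(migrate_on_reclaim_callback, 100);

/* What you suggest (IIUC): register, then build. */
hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
cpus_read_lock();
set_migration_target_nodes();
cpus_read_unlock();
-->

Either way, any memory-hotplug event that arrives after the notifier is
registered goes through migrate_on_reclaim_callback() and rebuilds the
demotion order, so the end state should be the same.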

[1] https://patchwork.kernel.org/project/linux-mm/patch/20200915094143.79181-3-ldufour@linux.ibm.com/


-- 
Oscar Salvador
SUSE Labs
