linux-kernel - Re: [PATCH -V11 2/9] mm/migrate: update node demotion order on hotplug events


Message-ID: <87pmnb3ccr.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date:   Fri, 25 Feb 2022 10:32:20 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Abhishek Goel <huntbag@...ux.vnet.ibm.com>
Cc:     Dave Hansen <dave.hansen@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Zi Yan <ziy@...dia.com>,
        David Hildenbrand <david@...hat.com>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH -V11 2/9] mm/migrate: update node demotion order on
 hotplug events

Hi, Abhishek,

Abhishek Goel <huntbag@...ux.vnet.ibm.com> writes:

> On 24/02/22 05:35, Dave Hansen wrote:
>> On 2/23/22 15:02, Abhishek Goel wrote:
>>> If needed, I will provide experiment results and traces that were used
>>> to conclude this.
>> It would be great if you can provide some more info.  Even just a CPU
>> time profile would be helpful.
>
> Average total time taken for SMT=8 to SMT=1 in v5.14 : 20s
>
> Average total time taken for SMT=8 to SMT=1 in v5.15 : 36s
>
> (Observed in system with 150+ CPUs )

We have run into a memory hotplug regression before.  Let's check
whether the problem is similar.  Can you try the below debug patch?

Best Regards,
Huang, Ying

----------------------------8<------------------------------------------
From 500c0b53436b7a697ed5d77241abbc0d5d3cfc07 Mon Sep 17 00:00:00 2001
From: Huang Ying <ying.huang@...el.com>
Date: Wed, 29 Sep 2021 10:57:19 +0800
Subject: [PATCH] mm/migrate: Debug CPU hotplug regression

Signed-off-by: "Huang, Ying" <ying.huang@...el.com>
---
 mm/migrate.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index c7da064b4781..c4805f15e616 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -3261,15 +3261,17 @@ static int __meminit migrate_on_reclaim_callback(struct notifier_block *self,
  * The ordering is also currently dependent on which nodes have
  * CPUs.  That means we need CPU on/offline notification too.
  */
-static int migration_online_cpu(unsigned int cpu)
+static int migration_cpu_hotplug(unsigned int cpu)
 {
-	set_migration_target_nodes();
-	return 0;
-}
+	static int nr_cpu_node_saved;
+	int nr_cpu_node;
+
+	nr_cpu_node = num_node_state(N_CPU);
+	if (nr_cpu_node != nr_cpu_node_saved) {
+		set_migration_target_nodes();
+		nr_cpu_node_saved = nr_cpu_node;
+	}
 
-static int migration_offline_cpu(unsigned int cpu)
-{
-	set_migration_target_nodes();
 	return 0;
 }
 
@@ -3283,7 +3285,7 @@ static int __init migrate_on_reclaim_init(void)
 	WARN_ON(!node_demotion);
 
 	ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline",
-					NULL, migration_offline_cpu);
+					NULL, migration_cpu_hotplug);
 	/*
 	 * In the unlikely case that this fails, the automatic
 	 * migration targets may become suboptimal for nodes
@@ -3292,7 +3294,7 @@ static int __init migrate_on_reclaim_init(void)
 	 */
 	WARN_ON(ret < 0);
 	ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online",
-				migration_online_cpu, NULL);
+				migration_cpu_hotplug, NULL);
 	WARN_ON(ret < 0);
 
 	hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
-- 
2.30.2