linux-kernel - [PATCH] mm: migrate: set demotion targets differently

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20220329115222.8923-1-jvgediya@linux.ibm.com>
Date:   Tue, 29 Mar 2022 17:22:22 +0530
From:   Jagdish Gediya <jvgediya@...ux.ibm.com>
To:     linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc:     akpm@...ux-foundation.org, aneesh.kumar@...ux.ibm.com,
        baolin.wang@...ux.alibaba.com, dave.hansen@...ux.intel.com,
        ying.huang@...el.com, Jagdish Gediya <jvgediya@...ux.ibm.com>
Subject: [PATCH] mm: migrate: set demotion targets differently

The current implementation to identify the demotion
targets limits some of the opportunities to share
the demotion targets between multiple source nodes.

Implement a logic to identify the loop in the demotion
targets such that all the possibilities of demotion can
be utilized. Don't share the used targets between all
the nodes, instead create the used targets from scratch
for each individual node based on for what all node this
node is a demotion target. This helps to share the demotion
targets without missing any possible way of demotion.

e.g. with below NUMA topology, where node 0 & 1 are
cpu + dram nodes, node 2 & 3 are equally slower memory
only nodes, and node 4 is slowest memory only node,

available: 5 nodes (0-4)
node 0 cpus: 0 1
node 0 size: n MB
node 0 free: n MB
node 1 cpus: 2 3
node 1 size: n MB
node 1 free: n MB
node 2 cpus:
node 2 size: n MB
node 2 free: n MB
node 3 cpus:
node 3 size: n MB
node 3 free: n MB
node 4 cpus:
node 4 size: n MB
node 4 free: n MB
node distances:
node   0   1   2   3   4
  0:  10  20  40  40  80
  1:  20  10  40  40  80
  2:  40  40  10  40  80
  3:  40  40  40  10  80
  4:  80  80  80  80  10

The existing implementation gives below demotion targets,

node    demotion_target
 0              3, 2
 1              4
 2              X
 3              X
 4		X

With this patch applied, below are the demotion targets,

node    demotion_target
 0              3, 2
 1              3, 2
 2              3
 3              4
 4		X

e.g. with below NUMA topology, where node 0, 1 & 2 are
cpu + dram nodes and node 3 is slow memory node,

available: 4 nodes (0-3)
node 0 cpus: 0 1
node 0 size: n MB
node 0 free: n MB
node 1 cpus: 2 3
node 1 size: n MB
node 1 free: n MB
node 2 cpus: 4 5
node 2 size: n MB
node 2 free: n MB
node 3 cpus:
node 3 size: n MB
node 3 free: n MB
node distances:
node   0   1   2   3
  0:  10  20  20  40
  1:  20  10  20  40
  2:  20  20  10  40
  3:  40  40  40  10

The existing implementation gives below demotion targets,

node    demotion_target
 0              3
 1              X
 2              X
 3              X

With this patch applied, below are the demotion targets,

node    demotion_target
 0              3
 1              3
 2              3
 3              X

with below NUMA topology, where node 0 & 2 are cpu + dram
nodes and node 1 & 3 are slow memory nodes,

available: 4 nodes (0-3)
node 0 cpus: 0 1
node 0 size: n MB
node 0 free: n MB
node 1 cpus:
node 1 size: n MB
node 1 free: n MB
node 2 cpus: 2 3
node 2 size: n MB
node 2 free: n MB
node 3 cpus:
node 3 size: n MB
node 3 free: n MB
node distances:
node   0   1   2   3
  0:  10  40  20  80
  1:  40  10  80  80
  2:  20  80  10  40
  3:  80  80  40  10

The existing implementation gives below demotion targets,

node    demotion_target
 0              3
 1              X
 2              3
 3              X

With this patch applied, below are the demotion targets,

node    demotion_target
 0              1
 1              3
 2              3
 3              X

As it can be seen above, node 3 can be demotion target for node
1 but existing implementation doesn't configure it that way. It
is better to move pages from node 1 to node 3 instead of moving
it from node 1 to swap.

Signed-off-by: Jagdish Gediya <jvgediya@...ux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@...ux.ibm.com>
---
 mm/migrate.c | 75 ++++++++++++++++++++++++++++------------------------
 1 file changed, 41 insertions(+), 34 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 3d60823afd2d..7ec8d934e706 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2381,10 +2381,13 @@ static int establish_migrate_target(int node, nodemask_t *used,
  */
 static void __set_migration_target_nodes(void)
 {
-	nodemask_t next_pass	= NODE_MASK_NONE;
-	nodemask_t this_pass	= NODE_MASK_NONE;
 	nodemask_t used_targets = NODE_MASK_NONE;
 	int node, best_distance;
+	nodemask_t *src_nodes;
+
+	src_nodes = kcalloc(nr_node_ids, sizeof(nodemask_t), GFP_KERNEL);
+	if (!src_nodes)
+		return;
 
 	/*
 	 * Avoid any oddities like cycles that could occur
@@ -2393,29 +2396,39 @@ static void __set_migration_target_nodes(void)
 	 */
 	disable_all_migrate_targets();
 
-	/*
-	 * Allocations go close to CPUs, first.  Assume that
-	 * the migration path starts at the nodes with CPUs.
-	 */
-	next_pass = node_states[N_CPU];
-again:
-	this_pass = next_pass;
-	next_pass = NODE_MASK_NONE;
-	/*
-	 * To avoid cycles in the migration "graph", ensure
-	 * that migration sources are not future targets by
-	 * setting them in 'used_targets'.  Do this only
-	 * once per pass so that multiple source nodes can
-	 * share a target node.
-	 *
-	 * 'used_targets' will become unavailable in future
-	 * passes.  This limits some opportunities for
-	 * multiple source nodes to share a destination.
-	 */
-	nodes_or(used_targets, used_targets, this_pass);
+	for_each_online_node(node) {
+		int tmp_node;
 
-	for_each_node_mask(node, this_pass) {
 		best_distance = -1;
+		used_targets = NODE_MASK_NONE;
+
+		/*
+		 * Avoid adding same node as the demotion target.
+		 */
+		node_set(node, used_targets);
+
+		/*
+		 * Add CPU NUMA nodes to the used target list so that it
+		 * won't be considered a demotion target.
+		 */
+		nodes_or(used_targets, used_targets, node_states[N_CPU]);
+
+		/*
+		 * Add all nodes that has appeared as source node of demotion
+		 * for this target node.
+		 *
+		 * To avoid cycles in the migration "graph", ensure
+		 * that migration sources are not future targets by
+		 * setting them in 'used_targets'.
+		 */
+		for_each_node_mask(tmp_node, src_nodes[node])
+			nodes_or(used_targets, used_targets, src_nodes[tmp_node]);
+
+		/*
+		 * Now update the demotion src nodes with other nodes in graph
+		 * which got computed above.
+		 */
+		nodes_or(src_nodes[node], src_nodes[node], used_targets);
 
 		/*
 		 * Try to set up the migration path for the node, and the target
@@ -2434,20 +2447,14 @@ static void __set_migration_target_nodes(void)
 				best_distance = node_distance(node, target_node);
 
 			/*
-			 * Visit targets from this pass in the next pass.
-			 * Eventually, every node will have been part of
-			 * a pass, and will become set in 'used_targets'.
+			 * Add this node in the src_nodes list so that we can
+			 * detect the looping.
 			 */
-			node_set(target_node, next_pass);
+			node_set(node, src_nodes[target_node]);
 		} while (1);
 	}
-	/*
-	 * 'next_pass' contains nodes which became migration
-	 * targets in this pass.  Make additional passes until
-	 * no more migrations targets are available.
-	 */
-	if (!nodes_empty(next_pass))
-		goto again;
+
+	kfree(src_nodes);
 }
 
 /*
-- 
2.35.1