linux-kernel - Re: [tip:sched/core] sched/numa: Prefer NUMA hotness over cache hotness

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 7 Jul 2015 05:49:31 +0530
From:	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
To:	torvalds@...ux-foundation.org, riel@...hat.com,
	linux-kernel@...r.kernel.org, tglx@...utronix.de, hpa@...or.com,
	mgorman@...e.de, peterz@...radead.org, efault@....de,
	mingo@...nel.org
Subject: Re: [tip:sched/core] sched/numa:  Prefer NUMA hotness over cache
 hotness

* tip-bot for Srikar Dronamraju <tipbot@...or.com> [2015-07-06 08:50:28]:

> Commit-ID:  8a9e62a238a3033158e0084d8df42ea116d69ce1
> Gitweb:     http://git.kernel.org/tip/8a9e62a238a3033158e0084d8df42ea116d69ce1
> Author:     Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
> AuthorDate: Tue, 16 Jun 2015 17:25:59 +0530
> Committer:  Ingo Molnar <mingo@...nel.org>
> CommitDate: Mon, 6 Jul 2015 15:29:55 +0200
>
> sched/numa: Prefer NUMA hotness over cache hotness

In the above commit, I missed a fact that sched feature NUMA was used to
enable/disable NUMA_BALANCING. The below version of the same patch takes
care of this fact. While I am posting the fixed version, it would need a
revert of the above commit. Please let me know if you just want the
differential patch that can apply on top of the above commit.

---------->8--------------------------------------------------------------8<---------------

>From 6dd3a7253c42665393f900fda4e6b4193e8114a3 Mon Sep 17 00:00:00 2001
From: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Date: Wed, 3 Jun 2015 02:57:26 +0530
Subject: [PATCH] sched/tip:Prefer numa hotness over cache hotness

The current load balancer may not try to prevent a task from moving out
of a preferred node to a less preferred node. The reason for this being:

- Since sched features NUMA and NUMA_RESIST_LOWER are disabled by
  default, migrate_degrades_locality() always returns false.

- Even if NUMA_RESIST_LOWER were to be enabled, if its cache hot,
  migrate_degrades_locality() never gets called.

The above behaviour can mean that tasks can move out of their preferred
node but they may be eventually be brought back to their preferred node
by numa balancer (due to higher numa faults).

To avoid the above, this commit merges migrate_degrades_locality() and
migrate_improves_locality(). It combines 2 sched features NUMA_FAVOUR_HIGHER
and NUMA_RESIST_LOWER into a single sched feature NUMA_FAVOUR_HIGHER.

Acked-by: Rik van Riel <riel@...hat.com>
Signed-off-by: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
---
Changes from previous version:
- Rebased to tip.

Added Ack from Rik based on the
http://lkml.kernel.org/r/557845D5.6060800@redhat.com.
Please let me know otherwise.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7210ae8..8a8ce95 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5662,72 +5662,40 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 
 #ifdef CONFIG_NUMA_BALANCING
 /*
- * Returns true if the destination node is the preferred node.
- * Needs to match fbq_classify_rq(): if there is a runnable task
- * that is not on its preferred node, we should identify it.
+ * Returns 1, if task migration degrades locality
+ * Returns 0, if task migration improves locality i.e migration preferred.
+ * Returns -1, if task migration is not affected by locality.
  */
-static bool migrate_improves_locality(struct task_struct *p, struct lb_env *env)
-{
-	struct numa_group *numa_group = rcu_dereference(p->numa_group);
-	unsigned long src_faults, dst_faults;
-	int src_nid, dst_nid;
-
-	if (!sched_feat(NUMA_FAVOUR_HIGHER) || !p->numa_faults ||
-	    !(env->sd->flags & SD_NUMA)) {
-		return false;
-	}
-
-	src_nid = cpu_to_node(env->src_cpu);
-	dst_nid = cpu_to_node(env->dst_cpu);
-
-	if (src_nid == dst_nid)
-		return false;
-
-	/* Encourage migration to the preferred node. */
-	if (dst_nid == p->numa_preferred_nid)
-		return true;
-
-	/* Migrating away from the preferred node is bad. */
-	if (src_nid == p->numa_preferred_nid)
-		return false;
-
-	if (numa_group) {
-		src_faults = group_faults(p, src_nid);
-		dst_faults = group_faults(p, dst_nid);
-	} else {
-		src_faults = task_faults(p, src_nid);
-		dst_faults = task_faults(p, dst_nid);
-	}
-
-	return dst_faults > src_faults;
-}
-
 
-static bool migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
+static int migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
 {
 	struct numa_group *numa_group = rcu_dereference(p->numa_group);
 	unsigned long src_faults, dst_faults;
 	int src_nid, dst_nid;
 
-	if (!sched_feat(NUMA) || !sched_feat(NUMA_RESIST_LOWER))
-		return false;
+	if (!sched_feat(NUMA) || !sched_feat(NUMA_FAVOUR_HIGHER))
+		return -1;
 
 	if (!p->numa_faults || !(env->sd->flags & SD_NUMA))
-		return false;
+		return -1;
 
 	src_nid = cpu_to_node(env->src_cpu);
 	dst_nid = cpu_to_node(env->dst_cpu);
 
 	if (src_nid == dst_nid)
-		return false;
+		return -1;
 
-	/* Migrating away from the preferred node is bad. */
-	if (src_nid == p->numa_preferred_nid)
-		return true;
+	/* Migrating away from the preferred node is always bad. */
+	if (src_nid == p->numa_preferred_nid) {
+		if (env->src_rq->nr_running > env->src_rq->nr_preferred_running)
+			return 1;
+		else
+			return -1;
+	}
 
 	/* Encourage migration to the preferred node. */
 	if (dst_nid == p->numa_preferred_nid)
-		return false;
+		return 0;
 
 	if (numa_group) {
 		src_faults = group_faults(p, src_nid);
@@ -5741,16 +5709,10 @@ static bool migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
 }
 
 #else
-static inline bool migrate_improves_locality(struct task_struct *p,
+static inline int migrate_degrades_locality(struct task_struct *p,
 					     struct lb_env *env)
 {
-	return false;
-}
-
-static inline bool migrate_degrades_locality(struct task_struct *p,
-					     struct lb_env *env)
-{
-	return false;
+	return -1;
 }
 #endif
 
@@ -5760,7 +5722,7 @@ static inline bool migrate_degrades_locality(struct task_struct *p,
 static
 int can_migrate_task(struct task_struct *p, struct lb_env *env)
 {
-	int tsk_cache_hot = 0;
+	int tsk_cache_hot;
 
 	lockdep_assert_held(&env->src_rq->lock);
 
@@ -5818,13 +5780,13 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	 * 2) task is cache cold, or
 	 * 3) too many balance attempts have failed.
 	 */
-	tsk_cache_hot = task_hot(p, env);
-	if (!tsk_cache_hot)
-		tsk_cache_hot = migrate_degrades_locality(p, env);
+	tsk_cache_hot = migrate_degrades_locality(p, env);
+	if (tsk_cache_hot == -1)
+		tsk_cache_hot = task_hot(p, env);
 
-	if (migrate_improves_locality(p, env) || !tsk_cache_hot ||
+	if (tsk_cache_hot <= 0 ||
 	    env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
-		if (tsk_cache_hot) {
+		if (tsk_cache_hot == 1) {
 			schedstat_inc(env->sd, lb_hot_gained[env->idle]);
 			schedstat_inc(p, se.statistics.nr_forced_migrations);
 		}
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 91e33cd..d4d4726 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -84,15 +84,8 @@ SCHED_FEAT(NUMA,	false)
 /*
  * NUMA_FAVOUR_HIGHER will favor moving tasks towards nodes where a
  * higher number of hinting faults are recorded during active load
- * balancing.
+ * balancing. It will resist moving tasks towards nodes where a lower
+ * number of hinting faults have been recorded.
  */
 SCHED_FEAT(NUMA_FAVOUR_HIGHER, true)
-
-/*
- * NUMA_RESIST_LOWER will resist moving tasks towards nodes where a
- * lower number of hinting faults have been recorded. As this has
- * the potential to prevent a task ever migrating to a new node
- * due to CPU overload it is disabled by default.
- */
-SCHED_FEAT(NUMA_RESIST_LOWER, false)
 #endif

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/