Message-Id: <20200311202625.13629-1-daniel.lezcano@linaro.org>
Date:   Wed, 11 Mar 2020 21:26:25 +0100
From:   Daniel Lezcano <daniel.lezcano@...aro.org>
To:     peterz@...radead.org, mingo@...hat.com
Cc:     juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        linux-kernel@...r.kernel.org, qais.yousef@....com,
        valentin.schneider@....com
Subject: [PATCH V2] sched: fair: Use the earliest break even

In the idle CPU selection process occurring in the slow path via the
find_idlest_group_cpu() function, we preferably pick an idle CPU with
the shallowest idle state; otherwise we fall back to the least loaded
CPU.

In order to be more energy efficient without impacting performance,
let's use an additional criterion: the break even deadline.

At idle time, when we store the idle state the CPU is entering, we
also compute the deadline after which the CPU can be woken up without
the sleep having cost more energy than it saved, i.e. its break even
point.
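
For illustration only, here is a hypothetical helper that mirrors what
the idle.c hunk below stores: the deadline is simply the idle entry
time plus the idle state's exit latency.

  /*
   * Sketch, assuming <linux/ktime.h> and <linux/cpuidle.h>: the break
   * even deadline is reached once the idle state's exit latency has
   * elapsed since the CPU entered it.
   */
  static inline s64 break_even_deadline(struct cpuidle_state *idle_state)
  {
  	return ktime_get_ns() + idle_state->exit_latency_ns;
  }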

During the selection process, we still pick the CPU with the shallowest
idle state but, in addition, we choose the one with the earliest break
even deadline instead of relying on the idle_timestamp. When the CPU is
idle, the timestamp has little meaning because the CPU could have woken
up and gone back to sleep several times without exiting the idle loop.
In this case the break even deadline is more relevant as it increases
the probability of choosing a CPU which has already reached its break
even point.
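
Concretely, the ordering among idle CPUs becomes: shallowest exit
latency first, then, for CPUs sharing that exit latency, the earliest
break even deadline (see the fair.c hunk below). A sketch of that
tie-break only, using a hypothetical helper not present in the patch:

  /*
   * Hypothetical helper, for illustration: returns true if a CPU with
   * (exit_latency_a, break_even_a) should be preferred over the current
   * best candidate (exit_latency_b, break_even_b).
   */
  static bool prefer_idle_cpu(unsigned int exit_latency_a, s64 break_even_a,
  			    unsigned int exit_latency_b, s64 break_even_b)
  {
  	if (exit_latency_a != exit_latency_b)
  		return exit_latency_a < exit_latency_b; /* shallower state wins */

  	return break_even_a < break_even_b; /* earlier break even wins */
  }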

Tested on:
 - a Synquacer, 24 cores, 6 sched domains
 - a Hikey960 (HMP), 8 cores, 2 sched domains, with EAS and the energy probe

The perf sched messaging benchmark does not show a performance
regression. schbench, adrestia and forkbench were each run 50 times.

The tools are described at https://lwn.net/Articles/724935/

 --------------------------------------------------------------
| Synquacer            | With break even | Without break even |
 --------------------------------------------------------------
| schbench *99.0th     |         14844.8 |            15017.6 |
| adrestia / periodic  |           57.95 |                 57 |
| adrestia / single    |            49.3 |               55.4 |
 --------------------------------------------------------------
| Hikey960             | With break even | Without break even |
 --------------------------------------------------------------
| schbench *99.0th     |         56140.8 |              56256 |
| schbench energy      |         153.575 |            152.676 |
| adrestia / periodic  |            4.98 |                5.2 |
| adrestia / single    |            9.02 |               9.12 |
| adrestia energy      |            1.18 |              1.233 |
| forkbench            |           7.971 |               8.05 |
| forkbench energy     |            9.37 |               9.42 |
 --------------------------------------------------------------

Signed-off-by: Daniel Lezcano <daniel.lezcano@...aro.org>
---
 kernel/sched/fair.c  | 18 ++++++++++++++++--
 kernel/sched/idle.c  |  8 +++++++-
 kernel/sched/sched.h | 20 ++++++++++++++++++++
 3 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4b5d5e5e701e..8bd6ea148db7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5793,6 +5793,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 {
 	unsigned long load, min_load = ULONG_MAX;
 	unsigned int min_exit_latency = UINT_MAX;
+	s64 min_break_even = S64_MAX;
 	u64 latest_idle_timestamp = 0;
 	int least_loaded_cpu = this_cpu;
 	int shallowest_idle_cpu = -1;
@@ -5810,6 +5811,8 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 		if (available_idle_cpu(i)) {
 			struct rq *rq = cpu_rq(i);
 			struct cpuidle_state *idle = idle_get_state(rq);
+			s64 break_even = idle_get_break_even(rq);
+
 			if (idle && idle->exit_latency < min_exit_latency) {
 				/*
 				 * We give priority to a CPU whose idle state
@@ -5817,10 +5820,21 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 				 * of any idle timestamp.
 				 */
 				min_exit_latency = idle->exit_latency;
+				min_break_even = break_even;
 				latest_idle_timestamp = rq->idle_stamp;
 				shallowest_idle_cpu = i;
-			} else if ((!idle || idle->exit_latency == min_exit_latency) &&
-				   rq->idle_stamp > latest_idle_timestamp) {
+			} else if ((idle && idle->exit_latency == min_exit_latency) &&
+				   break_even < min_break_even) {
+				/*
+				 * We give priority to the shallowest
+				 * idle states with the minimal break
+				 * even deadline to decrease the
+				 * probability to choose a CPU which
+				 * did not reach its break even yet
+				 */
+				min_break_even = break_even;
+				shallowest_idle_cpu = i;
+			} else if (!idle && rq->idle_stamp > latest_idle_timestamp) {
 				/*
 				 * If equal or no active idle state, then
 				 * the most recently idled CPU might have
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index b743bf38f08f..3342e7bae072 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -19,7 +19,13 @@ extern char __cpuidle_text_start[], __cpuidle_text_end[];
  */
 void sched_idle_set_state(struct cpuidle_state *idle_state)
 {
-	idle_set_state(this_rq(), idle_state);
+	struct rq *rq = this_rq();
+
+	idle_set_state(rq, idle_state);
+
+	if (idle_state)
+		idle_set_break_even(rq, ktime_get_ns() +
+				    idle_state->exit_latency_ns);
 }
 
 static int __read_mostly cpu_idle_force_poll;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2a0caf394dd4..eef1e535e2c2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1015,6 +1015,7 @@ struct rq {
 #ifdef CONFIG_CPU_IDLE
 	/* Must be inspected within a rcu lock section */
 	struct cpuidle_state	*idle_state;
+	s64			break_even;
 #endif
 };
 
@@ -1850,6 +1851,16 @@ static inline struct cpuidle_state *idle_get_state(struct rq *rq)
 
 	return rq->idle_state;
 }
+
+static inline void idle_set_break_even(struct rq *rq, s64 break_even)
+{
+	WRITE_ONCE(rq->break_even, break_even);
+}
+
+static inline s64 idle_get_break_even(struct rq *rq)
+{
+	return READ_ONCE(rq->break_even);
+}
 #else
 static inline void idle_set_state(struct rq *rq,
 				  struct cpuidle_state *idle_state)
@@ -1860,6 +1871,15 @@ static inline struct cpuidle_state *idle_get_state(struct rq *rq)
 {
 	return NULL;
 }
+
+static inline void idle_set_break_even(struct rq *rq, s64 break_even)
+{
+}
+
+static inline s64 idle_get_break_even(struct rq *rq)
+{
+	return 0;
+}
 #endif
 
 extern void schedule_idle(void);
-- 
2.17.1
