Message-ID: <12630185.O9o76ZdvQC@rjwysocki.net>
Date: Tue, 04 Feb 2025 21:58:18 +0100
From: "Rafael J. Wysocki" <rjw@...ysocki.net>
To: Linux PM <linux-pm@...r.kernel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
 Daniel Lezcano <daniel.lezcano@...aro.org>,
 Christian Loehle <christian.loehle@....com>,
 Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>,
 Aboorva Devarajan <aboorvad@...ux.ibm.com>
Subject: [RFT][PATCH v1] cpuidle: teo: Avoid selecting deepest idle state over-eagerly

From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>

It has been observed that the recent teo governor update which concluded
with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
for low latency constraints") caused the max-jOPS score of the SPECjbb
2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
While it may be argued that this is not a significant decrease, the
previous score can be restored by tweaking the inequality used by teo
to decide whether or not to preselect the deepest enabled idle state.
That change also causes the critical-jOPS score of SPECjbb to increase
by around 2%.

Namely, the likelihood of selecting the deepest enabled idle state in
teo on the platform in question has increased after commit 13ed5c4a6d9c
("cpuidle: teo: Skip getting the sleep length if wakeups are very
frequent") because some timer wakeups were previously counted as non-
timer ones and they were effectively added to the left-hand side of the
inequality deciding whether or not to preselect the deepest idle state.

Many of them are now (accurately) counted as timer wakeups, so the left-
hand side of that inequality is now effectively smaller in some cases,
especially when timer wakeups often occur in the range below the target
residency of the deepest enabled idle state and idle states with target
residencies below RESIDENCY_THRESHOLD_NS are often selected, but the
majority of recent idle intervals are still above that value most of
the time.  As a result, the deepest enabled idle state may be selected
more often than it used to be selected in some cases.

To counter that effect, add the sum of the hits metric for all of the
idle states below the candidate one (which is the deepest enabled idle
state at that point) to the left-hand side of the inequality mentioned
above.  This will cause it to be more balanced because, in principle,
putting both timer and non-timer wakeups on both sides of it is more
consistent than only taking into account the timer wakeups in the range
above the target residency of the deepest enabled idle state.

Link: https://www.spec.org/jbb2015/ [1]
Tested-by: Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
---
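
For illustration only, the effect described in the changelog can be
sketched in plain userspace C.  The struct and the helper names below are
hypothetical stand-ins (they are not the kernel's data structures), and
the counts used in the note that follows are made up rather than measured.
Only the two inequalities are taken from the patch itself.

```c
#include <stdbool.h>

/*
 * Hypothetical container for the three quantities used by the check.
 * In the governor they come from per-CPU data and local sums; here they
 * are plain fields so the two inequalities can be compared in isolation.
 */
struct teo_sums {
	unsigned int idx_intercept_sum;	/* intercepts for states shallower than idx */
	unsigned int idx_hit_sum;	/* hits for states shallower than idx */
	unsigned int total;		/* all intercepts and hits together */
};

/* Inequality before the patch: only intercepts on the left-hand side. */
static bool old_check_prefers_shallower(const struct teo_sums *s)
{
	return 2 * s->idx_intercept_sum > s->total - s->idx_hit_sum;
}

/* Inequality after the patch: hits below idx are added to the left-hand side. */
static bool new_check_prefers_shallower(const struct teo_sums *s)
{
	return 2 * (s->idx_intercept_sum + s->idx_hit_sum) > s->total;
}
```

Assume total = 100 and 60 short wakeups below the deepest state.  If all
60 are counted as intercepts (idx_intercept_sum = 60, idx_hit_sum = 0),
both checks prefer a shallower state.  If 40 of them are counted as timer
wakeups instead (idx_intercept_sum = 20, idx_hit_sum = 40), the old check
no longer fires (40 > 60 is false), so the deepest state is kept, while
the new check still fires (120 > 100).
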
 drivers/cpuidle/governors/teo.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -349,13 +349,13 @@
 	}
 
 	/*
-	 * If the sum of the intercepts metric for all of the idle states
-	 * shallower than the current candidate one (idx) is greater than the
+	 * If the sum of the intercepts and hits metric for all of the idle
+	 * states below the current candidate one (idx) is greater than the
 	 * sum of the intercepts and hits metrics for the candidate state and
 	 * all of the deeper states, a shallower idle state is likely to be a
 	 * better choice.
 	 */
-	if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
+	if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
 		int first_suitable_idx = idx;
 
 		/*



