lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20260120211725.124349-2-sunlightlinux@gmail.com>
Date: Tue, 20 Jan 2026 23:17:25 +0200
From: "Ionut Nechita (Sunlight Linux)" <sunlightlinux@...il.com>
To: rafael@...nel.org
Cc: ionut_n2001@...oo.com,
	daniel.lezcano@...aro.org,
	christian.loehle@....com,
	linux-pm@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	stable@...r.kernel.org
Subject: [PATCH 1/1] cpuidle: menu: Add 25% safety margin to short predictions when tick is stopped

From: Ionut Nechita <ionut_n2001@...oo.com>

When the tick is already stopped and the predicted idle duration is short
(< TICK_NSEC), the original code uses next_timer_ns directly. This can be
too conservative on platforms with high C-state exit latencies.

On Intel server platforms (2022+), this causes excessive wakeup latencies
(~150us) when the actual idle duration is much shorter than next_timer_ns,
because the governor selects package C-states (PC6) when shallower states
would be more appropriate.

Add a 25% safety margin to the prediction instead of using next_timer_ns
directly, while still clamping to next_timer_ns to avoid selecting
unnecessarily deep states.

Testing shows this reduces qperf latency from 151us to ~30us on affected
platforms while maintaining good power efficiency. Platforms with fast
C-state transitions (Ice Lake: 12us, Skylake: 21us) see minimal impact.

Cc: stable@...r.kernel.org
Signed-off-by: Ionut Nechita <ionut_n2001@...oo.com>
---
 drivers/cpuidle/governors/menu.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 64d6f7a1c776..de1dd46fea7a 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -287,12 +287,20 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	/*
 	 * If the tick is already stopped, the cost of possible short idle
 	 * duration misprediction is much higher, because the CPU may be stuck
-	 * in a shallow idle state for a long time as a result of it.  In that
-	 * case, say we might mispredict and use the known time till the closest
-	 * timer event for the idle state selection.
+	 * in a shallow idle state for a long time as a result of it.
+	 *
+	 * Add a 25% safety margin to the prediction to reduce the risk of
+	 * selecting too shallow state, but clamp to next_timer to avoid
+	 * selecting unnecessarily deep states.
+	 *
+	 * This helps on platforms with high C-state exit latencies (e.g.,
+	 * Intel server platforms 2022+ with ~150us) where using next_timer
+	 * directly causes excessive wakeup latency when the actual idle
+	 * duration is much shorter.
 	 */
 	if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
-		predicted_ns = data->next_timer_ns;
+		predicted_ns = min(predicted_ns + (predicted_ns >> 2),
+				   data->next_timer_ns);
 
 	/*
 	 * Find the idle state with the lowest power while satisfying
-- 
2.52.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ