Message-ID: <20260120211725.124349-2-sunlightlinux@gmail.com>
Date: Tue, 20 Jan 2026 23:17:25 +0200
From: "Ionut Nechita (Sunlight Linux)" <sunlightlinux@...il.com>
To: rafael@...nel.org
Cc: ionut_n2001@...oo.com,
daniel.lezcano@...aro.org,
christian.loehle@....com,
linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: [PATCH 1/1] cpuidle: menu: Add 25% safety margin to short predictions when tick is stopped
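From: Ionut Nechita <ionut_n2001@...oo.com>

When the tick is already stopped and the predicted idle duration is short
(< TICK_NSEC), the current code discards the prediction and uses
next_timer_ns directly. This is overly conservative: it assumes the short
prediction is wrong, which on platforms with high C-state exit latencies
can lead to selecting states much deeper than the actual idle period
warrants.
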
On Intel server platforms (2022+), this causes excessive wakeup latencies
(~150us) when the actual idle duration is much shorter than next_timer_ns,
because the governor selects package C-states (PC6) when shallower states
would be more appropriate.

Instead of using next_timer_ns directly, add a 25% safety margin to the
prediction (predicted_ns + predicted_ns / 4), while still clamping the
result to next_timer_ns to avoid selecting unnecessarily deep states.

Testing shows this reduces qperf latency from 151us to ~30us on affected
platforms while maintaining good power efficiency. Platforms with fast
C-state transitions (Ice Lake: 12us, Skylake: 21us) see minimal impact.

Cc: stable@...r.kernel.org
Signed-off-by: Ionut Nechita <ionut_n2001@...oo.com>
---
 drivers/cpuidle/governors/menu.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 64d6f7a1c776..de1dd46fea7a 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -287,12 +287,20 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	/*
 	 * If the tick is already stopped, the cost of possible short idle
 	 * duration misprediction is much higher, because the CPU may be stuck
-	 * in a shallow idle state for a long time as a result of it. In that
-	 * case, say we might mispredict and use the known time till the closest
-	 * timer event for the idle state selection.
+	 * in a shallow idle state for a long time as a result of it.
+	 *
+	 * Add a 25% safety margin to the prediction to reduce the risk of
+	 * selecting too shallow a state, but clamp to next_timer to avoid
+	 * selecting unnecessarily deep states.
+	 *
+	 * This helps on platforms with high C-state exit latencies (e.g.,
+	 * Intel server platforms 2022+ with ~150us) where using next_timer
+	 * directly causes excessive wakeup latency when the actual idle
+	 * duration is much shorter.
 	 */
 	if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
-		predicted_ns = data->next_timer_ns;
+		predicted_ns = min(predicted_ns + (predicted_ns >> 2),
+				   data->next_timer_ns);
 
 	/*
 	 * Find the idle state with the lowest power while satisfying
--
2.52.0