Message-ID: <a156de2d-a5ad-4e11-8744-24dd07f810a2@arm.com>
Date: Wed, 21 Jan 2026 11:55:07 +0000
From: Christian Loehle <christian.loehle@....com>
To: "Ionut Nechita (Sunlight Linux)" <sunlightlinux@...il.com>,
rafael@...nel.org
Cc: ionut_n2001@...oo.com, daniel.lezcano@...aro.org,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH 1/1] cpuidle: menu: Add 25% safety margin to short
predictions when tick is stopped
On 1/20/26 21:17, Ionut Nechita (Sunlight Linux) wrote:
> From: Ionut Nechita <ionut_n2001@...oo.com>
>
> When the tick is already stopped and the predicted idle duration is short
> (< TICK_NSEC), the original code uses next_timer_ns directly. This can be
> too conservative on platforms with high C-state exit latencies.
The other side of the argument is of course that the predicted idle duration
is too short because it is mostly based on values that are no longer applicable.
Then we're potentially stuck in a too-shallow state for a very long time.
>
> On Intel server platforms (2022+), this causes excessive wakeup latencies
> (~150us) when the actual idle duration is much shorter than next_timer_ns,
> because the governor selects package C-states (PC6) when shallower states
> would be more appropriate.
>
> Add a 25% safety margin to the prediction instead of using next_timer_ns
> directly, while still clamping to next_timer_ns to avoid selecting
> unnecessarily deep states.
Is this needed?
Why is
min(predicted_ns, data->next_timer_ns);
not enough?
What do the results look like with that?
Again, traces or sysfs dumps pre- and post-test would be helpful.
>
> Testing shows this reduces qperf latency from 151us to ~30us on affected
> platforms while maintaining good power efficiency. Platforms with fast
> C-state transitions (Ice Lake: 12us, Skylake: 21us) see minimal impact.
>
> Cc: stable@...r.kernel.org
> Signed-off-by: Ionut Nechita <ionut_n2001@...oo.com>
> ---
> drivers/cpuidle/governors/menu.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 64d6f7a1c776..de1dd46fea7a 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -287,12 +287,20 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> /*
> * If the tick is already stopped, the cost of possible short idle
> * duration misprediction is much higher, because the CPU may be stuck
> - * in a shallow idle state for a long time as a result of it. In that
> - * case, say we might mispredict and use the known time till the closest
> - * timer event for the idle state selection.
> + * in a shallow idle state for a long time as a result of it.
> + *
> + * Add a 25% safety margin to the prediction to reduce the risk of
> + * selecting too shallow state, but clamp to next_timer to avoid
> + * selecting unnecessarily deep states.
> + *
> + * This helps on platforms with high C-state exit latencies (e.g.,
> + * Intel server platforms 2022+ with ~150us) where using next_timer
> + * directly causes excessive wakeup latency when the actual idle
> + * duration is much shorter.
> */
> if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
> - predicted_ns = data->next_timer_ns;
> + predicted_ns = min(predicted_ns + (predicted_ns >> 2),
> + data->next_timer_ns);
>
> /*
> * Find the idle state with the lowest power while satisfying