Message-ID: <a156de2d-a5ad-4e11-8744-24dd07f810a2@arm.com>
Date: Wed, 21 Jan 2026 11:55:07 +0000
From: Christian Loehle <christian.loehle@....com>
To: "Ionut Nechita (Sunlight Linux)" <sunlightlinux@...il.com>,
 rafael@...nel.org
Cc: ionut_n2001@...oo.com, daniel.lezcano@...aro.org,
 linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
 stable@...r.kernel.org
Subject: Re: [PATCH 1/1] cpuidle: menu: Add 25% safety margin to short
 predictions when tick is stopped

On 1/20/26 21:17, Ionut Nechita (Sunlight Linux) wrote:
> From: Ionut Nechita <ionut_n2001@...oo.com>
> 
> When the tick is already stopped and the predicted idle duration is short
> (< TICK_NSEC), the original code uses next_timer_ns directly. This can be
> too conservative on platforms with high C-state exit latencies.

The other side of the argument is of course that when the predicted idle
duration is that short, the prediction history is mostly full of values that
are no longer applicable. Then we're potentially stuck in a too shallow state
for a very long time.

> 
> On Intel server platforms (2022+), this causes excessive wakeup latencies
> (~150us) when the actual idle duration is much shorter than next_timer_ns,
> because the governor selects package C-states (PC6) when shallower states
> would be more appropriate.
> 
> Add a 25% safety margin to the prediction instead of using next_timer_ns
> directly, while still clamping to next_timer_ns to avoid selecting
> unnecessarily deep states.

Is this needed?
Why is
min(predicted_ns, data->next_timer_ns);
not enough?
What do the results look like with that?
Again, traces or sysfs dumps from before and after the test would be helpful.

> 
> Testing shows this reduces qperf latency from 151us to ~30us on affected
> platforms while maintaining good power efficiency. Platforms with fast
> C-state transitions (Ice Lake: 12us, Skylake: 21us) see minimal impact.
> 
> Cc: stable@...r.kernel.org
> Signed-off-by: Ionut Nechita <ionut_n2001@...oo.com>
> ---
>  drivers/cpuidle/governors/menu.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 64d6f7a1c776..de1dd46fea7a 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -287,12 +287,20 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  	/*
>  	 * If the tick is already stopped, the cost of possible short idle
>  	 * duration misprediction is much higher, because the CPU may be stuck
> -	 * in a shallow idle state for a long time as a result of it.  In that
> -	 * case, say we might mispredict and use the known time till the closest
> -	 * timer event for the idle state selection.
> +	 * in a shallow idle state for a long time as a result of it.
> +	 *
> +	 * Add a 25% safety margin to the prediction to reduce the risk of
> +	 * selecting too shallow state, but clamp to next_timer to avoid
> +	 * selecting unnecessarily deep states.
> +	 *
> +	 * This helps on platforms with high C-state exit latencies (e.g.,
> +	 * Intel server platforms 2022+ with ~150us) where using next_timer
> +	 * directly causes excessive wakeup latency when the actual idle
> +	 * duration is much shorter.
>  	 */
>  	if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
> -		predicted_ns = data->next_timer_ns;
> +		predicted_ns = min(predicted_ns + (predicted_ns >> 2),
> +				   data->next_timer_ns);
>  
>  	/*
>  	 * Find the idle state with the lowest power while satisfying
