lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0da8086f-2b6c-46e3-92ca-e156b9374a2a@arm.com>
Date: Wed, 13 Aug 2025 20:13:26 +0100
From: Christian Loehle <christian.loehle@....com>
To: "Rafael J. Wysocki" <rafael@...nel.org>,
 Linux PM <linux-pm@...r.kernel.org>
Cc: Frederic Weisbecker <frederic@...nel.org>,
 LKML <linux-kernel@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v1 1/3] cpuidle: governors: menu: Avoid selecting states
 with too much latency

On 8/13/25 11:25, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> 
> Occasionally, the exit latency of the idle state selected by the menu
> governor may exceed the PM QoS CPU wakeup latency limit.  Namely, if the
> scheduler tick has been stopped already and predicted_ns is greater than
> the tick period length, the governor may return an idle state whose exit
> latency exceeds latency_req because that decision is made before
> checking the current idle state's exit latency.
> 
> For instance, say that there are 3 idle states, 0, 1, and 2.  For idle
> states 0 and 1, the exit latency is equal to the target residency and
> the values are 0 and 5 us, respectively.  State 2 is deeper and has the
> exit latency and target residency of 200 us and 2 ms (which is greater
> than the tick period length), respectively.
> 
> Say that predicted_ns is equal to TICK_NSEC and the PM QoS latency
> limit is 20 us.  After the first two iterations of the main loop in
> menu_select(), idx becomes 1 and in the third iteration of it the target
Can drop "of it" here?

> residency of the current state (state 2) is greater than predicted_ns.
> State 2 is not a polling one and predicted_ns is not less than TICK_NSEC,
> so the check on whether or not the tick has been stopped is done.  Say
> that the tick has been stopped already and there are no imminent timers
> (that is, delta_tick is greater than the target residency of state 2).
> In that case, idx becomes 2 and it is returned immediately, but the exit
> latency of state 2 exceeds the latency limit.
> 
> Address this issue by modifying the code to compare the exit latency of
> the current idle state (idle state i) with the latency limit before
> comparing its target residecy with predicted_ns, which allows one
residency

> more exit_latency_ns check that becomes redundant to be dropped.
> 
> However, after the above change, latency_req cannot take the predicted_ns
> value any more, which takes place after commit 38f83090f515 ("cpuidle:
> menu: Remove iowait influence"), because it may cause a polling state
> to be returned prematurely.
> 
> In the context of the previous example say that predicted_ns is 3000 and
> the PM QoS latency limit is still 20 us.  Additionally, say that idle
> state 0 is a polling one.  Moving the exit_latency_ns check before the
> target_residency_ns one causes the loop to terminate in the second
> iteration, before the target_residency_ns check, so idle state 0 will be
> returned even though previously state 1 would be returned if there were
> no imminent timers.
> 
> For this reason, remove the assignment of the predicted_ns value to
> latency_req from the code.
> 
> Fixes: 5ef499cd571c ("cpuidle: menu: Handle stopped tick more aggressively")
> Cc: 4.17+ <stable@...r.kernel.org> # 4.17+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> ---
>  drivers/cpuidle/governors/menu.c |   29 ++++++++++++-----------------
>  1 file changed, 12 insertions(+), 17 deletions(-)
> 
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -287,20 +287,15 @@
>  		return 0;
>  	}
>  
> -	if (tick_nohz_tick_stopped()) {
> -		/*
> -		 * If the tick is already stopped, the cost of possible short
> -		 * idle duration misprediction is much higher, because the CPU
> -		 * may be stuck in a shallow idle state for a long time as a
> -		 * result of it.  In that case say we might mispredict and use
> -		 * the known time till the closest timer event for the idle
> -		 * state selection.
> -		 */
> -		if (predicted_ns < TICK_NSEC)
> -			predicted_ns = data->next_timer_ns;
> -	} else if (latency_req > predicted_ns) {
> -		latency_req = predicted_ns;
> -	}
> +	/*
> +	 * If the tick is already stopped, the cost of possible short idle
> +	 * duration misprediction is much higher, because the CPU may be stuck
> +	 * in a shallow idle state for a long time as a result of it.  In that
> +	 * case, say we might mispredict and use the known time till the closest
> +	 * timer event for the idle state selection.
> +	 */
> +	if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
> +		predicted_ns = data->next_timer_ns;
>  
>  	/*
>  	 * Find the idle state with the lowest power while satisfying
> @@ -316,13 +311,15 @@
>  		if (idx == -1)
>  			idx = i; /* first enabled state */
>  
> +		if (s->exit_latency_ns > latency_req)
> +			break;
> +
>  		if (s->target_residency_ns > predicted_ns) {
>  			/*
>  			 * Use a physical idle state, not busy polling, unless
>  			 * a timer is going to trigger soon enough.
>  			 */
>  			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
> -			    s->exit_latency_ns <= latency_req &&
>  			    s->target_residency_ns <= data->next_timer_ns) {
>  				predicted_ns = s->target_residency_ns;
>  				idx = i;
> @@ -354,8 +351,6 @@
>  
>  			return idx;
>  		}
> -		if (s->exit_latency_ns > latency_req)
> -			break;
>  
>  		idx = i;
>  	}
> 
> 
> 

Good catch!
Reviewed-by: Christian Loehle <christian.loehle@....com>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ