[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5043159.31r3eYUQgx@rafael.j.wysocki>
Date: Wed, 13 Aug 2025 12:25:58 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Linux PM <linux-pm@...r.kernel.org>
Cc: Frederic Weisbecker <frederic@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>,
Christian Loehle <christian.loehle@....com>
Subject:
[PATCH v1 1/3] cpuidle: governors: menu: Avoid selecting states with too much
latency
From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
Occasionally, the exit latency of the idle state selected by the menu
governor may exceed the PM QoS CPU wakeup latency limit. Namely, if the
scheduler tick has been stopped already and predicted_ns is greater than
the tick period length, the governor may return an idle state whose exit
latency exceeds latency_req because that decision is made before
checking the current idle state's exit latency.
For instance, say that there are 3 idle states, 0, 1, and 2. For idle
states 0 and 1, the exit latency is equal to the target residency and
the values are 0 and 5 us, respectively. State 2 is deeper and has the
exit latency and target residency of 200 us and 2 ms (which is greater
than the tick period length), respectively.
Say that predicted_ns is equal to TICK_NSEC and the PM QoS latency
limit is 20 us. After the first two iterations of the main loop in
menu_select(), idx becomes 1 and in the third iteration of it the target
residency of the current state (state 2) is greater than predicted_ns.
State 2 is not a polling one and predicted_ns is not less than TICK_NSEC,
so the check on whether or not the tick has been stopped is done. Say
that the tick has been stopped already and there are no imminent timers
(that is, delta_tick is greater than the target residency of state 2).
In that case, idx becomes 2 and it is returned immediately, but the exit
latency of state 2 exceeds the latency limit.
Address this issue by modifying the code to compare the exit latency of
the current idle state (idle state i) with the latency limit before
comparing its target residecy with predicted_ns, which allows one
more exit_latency_ns check that becomes redundant to be dropped.
However, after the above change, latency_req cannot take the predicted_ns
value any more, which takes place after commit 38f83090f515 ("cpuidle:
menu: Remove iowait influence"), because it may cause a polling state
to be returned prematurely.
In the context of the previous example say that predicted_ns is 3000 and
the PM QoS latency limit is still 20 us. Additionally, say that idle
state 0 is a polling one. Moving the exit_latency_ns check before the
target_residency_ns one causes the loop to terminate in the second
iteration, before the target_residency_ns check, so idle state 0 will be
returned even though previously state 1 would be returned if there were
no imminent timers.
For this reason, remove the assignment of the predicted_ns value to
latency_req from the code.
Fixes: 5ef499cd571c ("cpuidle: menu: Handle stopped tick more aggressively")
Cc: 4.17+ <stable@...r.kernel.org> # 4.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
---
drivers/cpuidle/governors/menu.c | 29 ++++++++++++-----------------
1 file changed, 12 insertions(+), 17 deletions(-)
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -287,20 +287,15 @@
return 0;
}
- if (tick_nohz_tick_stopped()) {
- /*
- * If the tick is already stopped, the cost of possible short
- * idle duration misprediction is much higher, because the CPU
- * may be stuck in a shallow idle state for a long time as a
- * result of it. In that case say we might mispredict and use
- * the known time till the closest timer event for the idle
- * state selection.
- */
- if (predicted_ns < TICK_NSEC)
- predicted_ns = data->next_timer_ns;
- } else if (latency_req > predicted_ns) {
- latency_req = predicted_ns;
- }
+ /*
+ * If the tick is already stopped, the cost of possible short idle
+ * duration misprediction is much higher, because the CPU may be stuck
+ * in a shallow idle state for a long time as a result of it. In that
+ * case, say we might mispredict and use the known time till the closest
+ * timer event for the idle state selection.
+ */
+ if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
+ predicted_ns = data->next_timer_ns;
/*
* Find the idle state with the lowest power while satisfying
@@ -316,13 +311,15 @@
if (idx == -1)
idx = i; /* first enabled state */
+ if (s->exit_latency_ns > latency_req)
+ break;
+
if (s->target_residency_ns > predicted_ns) {
/*
* Use a physical idle state, not busy polling, unless
* a timer is going to trigger soon enough.
*/
if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
- s->exit_latency_ns <= latency_req &&
s->target_residency_ns <= data->next_timer_ns) {
predicted_ns = s->target_residency_ns;
idx = i;
@@ -354,8 +351,6 @@
return idx;
}
- if (s->exit_latency_ns > latency_req)
- break;
idx = i;
}
Powered by blists - more mailing lists