lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20250908075443.208570-1-aboorvad@linux.ibm.com>
Date: Mon,  8 Sep 2025 13:24:43 +0530
From: Aboorva Devarajan <aboorvad@...ux.ibm.com>
To: rafael@...nel.org, christian.loehle@....com, daniel.lezcano@...aro.org
Cc: aboorvad@...ux.ibm.com, gautam@...ux.ibm.com, linux-pm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: [PATCH v3 1/1] cpuidle: menu: Add residency threshold for non-polling state selection

On virtualized PowerPC (pseries) systems, where only one polling state
(Snooze) and one deep state (CEDE) are available, selecting CEDE when
the predicted idle duration exceeds the target residency of the CEDE
state can hurt performance. In such cases, the entry/exit overhead of
CEDE outweighs the power savings, leading to unnecessary state transitions
and higher latency.

Menu governor currently contains a special-case rule that prioritizes
the first non-polling state over polling, even when its target residency
is much longer than the predicted idle duration. On PowerPC/pseries,
where the gap between the polling state (Snooze) and the first non-polling
state (CEDE) is large, this behavior causes performance regressions.

This patch refines the special case by adding an extra requirement:
the first non-polling state may only be chosen if its
target_residency_ns is below the defined RESIDENCY_THRESHOLD_NS. If this
condition is not met, the non-polling state is not selected, and polling
state is retained instead.

This change is limited to the single special-case condition for the first
non-polling state. The general state selection logic in the menu governor
remains unchanged.

Performance improvement observed with pgbench on PowerPC (pseries)
system:
+---------------------------+------------+------------+------------+
| Metric                    | Baseline   | Patched    | Change (%) |
+---------------------------+------------+------------+------------+
| Transactions/sec (TPS)    | 495,210    | 536,982    | +8.45%     |
| Avg latency (ms)          | 0.163      | 0.150      | -7.98%     |
+---------------------------+------------+------------+------------+
CPUIdle state usage:
+--------------+--------------+-------------+
| Metric       | Baseline     | Patched     |
+--------------+--------------+-------------+
| Total usage  | 12,735,820   | 13,918,442  |
| Above usage  | 11,401,520   | 1,598,210   |
| Below usage  | 20,145       | 702,395     |
+--------------+--------------+-------------+

Above/Total and Below/Total usage percentages which indicates
mispredictions:
+------------------------+-----------+---------+
| Metric                 | Baseline  | Patched |
+------------------------+-----------+---------+
| Above % (Above/Total)  | 89.56%    | 11.49%  |
| Below % (Below/Total)  | 0.16%     | 5.05%   |
| Total cpuidle miss (%) | 89.72%    | 16.54%  |
+------------------------+-----------+---------+

The results show that restricting non-polling state selection to
cases where its residency is within the threshold reduces mispredictions,
lowers unnecessary state transitions, and improves overall throughput.

Signed-off-by: Aboorva Devarajan <aboorvad@...ux.ibm.com>
---

v2: https://lore.kernel.org/all/20250317060357.29451-1-aboorvad@linux.ibm.com/

Changes in v2 -> v3:
  - Modifed the patch following Rafael's feedback, incorporated a residency threshold check
    (s->target_residency_ns < RESIDENCY_THRESHOLD_NS) as suggested.
  - Updated commit message accordingly.
---
 drivers/cpuidle/governors/menu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index b2e3d0b0a116..d25b04539109 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -316,11 +316,13 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 
 		if (s->target_residency_ns > predicted_ns) {
 			/*
-			 * Use a physical idle state, not busy polling, unless
-			 * a timer is going to trigger soon enough.
+			 * Use a physical idle state instead of busy polling
+			 * if the next timer doesn't expire soon and its
+			 * target residency is below the residency threshold.
 			 */
 			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
-			    s->target_residency_ns <= data->next_timer_ns) {
+			    s->target_residency_ns <= data->next_timer_ns &&
+			    s->target_residency_ns < RESIDENCY_THRESHOLD_NS) {
 				predicted_ns = s->target_residency_ns;
 				idx = i;
 				break;
-- 
2.50.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ