[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZ5v0hCvvMzPCy+8N34vCt_d4mdeuMVHR9dQbs4cHwQDOpYrw@mail.gmail.com>
Date: Tue, 21 Oct 2025 21:25:22 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Aboorva Devarajan <aboorvad@...ux.ibm.com>
Cc: rafael@...nel.org, christian.loehle@....com, daniel.lezcano@...aro.org, 
	gautam@...ux.ibm.com, linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] cpuidle: menu: Use residency threshold in polling
 state override decisions
On Mon, Oct 6, 2025 at 3:40 AM Aboorva Devarajan <aboorvad@...ux.ibm.com> wrote:
>
> On virtualized PowerPC (pseries) systems, where only one polling state
> (Snooze) and one deep state (CEDE) are available, selecting CEDE when
> the predicted idle duration is less than the target residency of CEDE
> state can hurt performance. In such cases, the entry/exit overhead of
> CEDE outweighs the power savings, leading to unnecessary state
> transitions and higher latency.
>
> Menu governor currently contains a special-case rule that prioritizes
> the first non-polling state over polling, even when its target residency
> is much longer than the predicted idle duration. On PowerPC/pseries,
> where the gap between the polling state (Snooze) and the first non-polling
> state (CEDE) is large, this behavior causes performance regressions.
>
> This patch refines the special case by adding an extra requirement:
> the first non-polling state can only be chosen if its
> target residency is below the defined RESIDENCY_THRESHOLD_NS. If
> this condition is not satisfied, polling is allowed instead, avoiding
> suboptimal non-polling state entries.
>
> This change is limited to the single special-case rule for the first
> non-polling state. The general non-polling state selection logic in the
> menu governor remains unchanged.
>
> Performance improvement observed with pgbench on PowerPC (pseries)
> system:
> +---------------------------+------------+------------+------------+
> | Metric                    | Baseline   | Patched    | Change (%) |
> +---------------------------+------------+------------+------------+
> | Transactions/sec (TPS)    | 495,210    | 536,982    | +8.45%     |
> | Avg latency (ms)          | 0.163      | 0.150      | -7.98%     |
> +---------------------------+------------+------------+------------+
> CPUIdle state usage:
> +--------------+--------------+-------------+
> | Metric       | Baseline     | Patched     |
> +--------------+--------------+-------------+
> | Total usage  | 12,735,820   | 13,918,442  |
> | Above usage  | 11,401,520   | 1,598,210   |
> | Below usage  | 20,145       | 702,395     |
> +--------------+--------------+-------------+
>
> Above/Total and Below/Total usage percentages:
> +------------------------+-----------+---------+
> | Metric                 | Baseline  | Patched |
> +------------------------+-----------+---------+
> | Above % (Above/Total)  | 89.56%    | 11.49%  |
> | Below % (Below/Total)  | 0.16%     | 5.05%   |
> | Total cpuidle miss (%) | 89.72%    | 16.54%  |
> +------------------------+-----------+---------+
>
> The results indicate that restricting CEDE selection to cases where
> its residency matches the predicted idle time reduces mispredictions,
> lowers unnecessary state transitions, and improves overall throughput.
>
> Reviewed-by: Christian Loehle <christian.loehle@....com>
> Signed-off-by: Aboorva Devarajan <aboorvad@...ux.ibm.com>
> ---
>
> v3: https://lore.kernel.org/all/20250908075443.208570-1-aboorvad@linux.ibm.com/
>
> v3 -> v4
>
> - Rebased onto the linux-pm/pm branch.
> - Updated commit message and comments based on review feedback.
> - Reordered condition checks as recommended in review.
> - Added Reviewed-by tag from Christian.
>
> ---
>  drivers/cpuidle/governors/menu.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 4d9aa5ce31f0..6a98a724442e 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -320,10 +320,12 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>                 }
>
>                 /*
> -                * Use a physical idle state, not busy polling, unless a timer
> -                * is going to trigger soon enough.
> +                * Use a physical idle state instead of busy polling as long as
> +                * its target residency is below the residency threshold and the
> +                * next timer doesn't expire soon.
>                  */
>                 if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
> +                   s->target_residency_ns < RESIDENCY_THRESHOLD_NS &&
>                     s->target_residency_ns <= data->next_timer_ns) {
>                         predicted_ns = s->target_residency_ns;
>                         idx = i;
> --
Applied as 6.19 material, thanks!
Powered by blists - more mailing lists
 
