[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZ5v0js=Ca4MtcTa2zO2B_fCZNJrOAJzRKAPWSD6fFzn88z_w@mail.gmail.com>
Date: Tue, 1 Apr 2025 18:58:13 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Christian Loehle <christian.loehle@....com>
Cc: Aboorva Devarajan <aboorvad@...ux.ibm.com>, "Rafael J . Wysocki" <rafael@...nel.org>,
Daniel Lezcano <daniel.lezcano@...aro.org>, Gautam Menghani <gautam@...ux.ibm.com>,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] cpuidle: menu: Prefer polling state for short idle durations
On Mon, Mar 17, 2025 at 9:35 AM Christian Loehle
<christian.loehle@....com> wrote:
>
> On 3/17/25 06:03, Aboorva Devarajan wrote:
> > Avoid selecting deep idle state when the predicted idle duration is
> > shorter than its target residency, as this leads to unnecessary state
> > transitions without energy savings.
> >
> > On virtualized PowerPC (pseries) systems, where only one polling state
> > (Snooze) and one deep state (CEDE) are available, selecting CEDE when
> > its target residency exceeds the predicted idle duration hurts
> > performance.
> >
> > For example, if the predicted idle duration is 15 us and the first
> > non-polling state has a target residency of 120 us, selecting it
> > would be suboptimal.
> >
> > Remove the condition introduced in commit 69d25870f20c
> > ("cpuidle: fix the menu governor to boost IO performance") that
> > prioritized non-polling states even when their target residency
> > exceeded the predicted idle duration and allow polling states to
> > be selected when appropriate.
> >
> > Performance improvement observed with pgbench on PowerPC (pseries)
> > system:
> > +---------------------------+------------+------------+------------+
> > | Metric | Baseline | Patched | Change (%) |
> > +---------------------------+------------+------------+------------+
> > | Transactions/sec (TPS) | 494,834 | 538,707 | +8.85% |
> > | Avg latency (ms) | 0.162 | 0.149 | -8.02% |
> > +---------------------------+------------+------------+------------+
> >
> > CPUIdle state usage:
> > +--------------+--------------+-------------+
> > | Metric | Baseline | Patched |
> > +--------------+--------------+-------------+
> > | Total usage | 12,703,630 | 13,941,966 |
> > | Above usage | 11,388,990 | 1,620,474 |
> > | Below usage | 19,973 | 684,708 |
> > +--------------+--------------+-------------+
> >
> > Above/Total and Below/Total usage percentages:
> > +------------------------+-----------+---------+
> > | Metric | Baseline | Patched |
> > +------------------------+-----------+---------+
> > | Above % (Above/Total) | 89.67% | 11.63% |
> > | Below % (Below/Total) | 0.16% | 4.91% |
> > | Total cpuidle miss (%) | 89.83% | 16.54% |
> > +------------------------+-----------+---------+
> >
> > Signed-off-by: Aboorva Devarajan <aboorvad@...ux.ibm.com>
> >
> > ---
> >
> > v1: https://lore.kernel.org/all/20240809073120.250974-1-aboorvad@linux.ibm.com/
> >
> > v1 -> v2:
> >
> > - Drop cover letter and improve commit message.
> > ---
> > drivers/cpuidle/governors/menu.c | 11 -----------
> > 1 file changed, 11 deletions(-)
> >
> > diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> > index 28363bfa3e4c..4b199377e4be 100644
> > --- a/drivers/cpuidle/governors/menu.c
> > +++ b/drivers/cpuidle/governors/menu.c
> > @@ -296,17 +296,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> > idx = i; /* first enabled state */
> >
> > if (s->target_residency_ns > predicted_ns) {
> > - /*
> > - * Use a physical idle state, not busy polling, unless
> > - * a timer is going to trigger soon enough.
> > - */
> > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
> > - s->exit_latency_ns <= latency_req &&
> > - s->target_residency_ns <= data->next_timer_ns) {
> > - predicted_ns = s->target_residency_ns;
> > - idx = i;
> > - break;
> > - }
> > if (predicted_ns < TICK_NSEC)
> > break;
> >
>
> I'm still fine with this and don't see a better way to solve the reported
> issue generally, see the discussion on v1.
> Rafael, do you have any objections?
The behavior change on x86 would be rather unacceptable I'm afraid.
As already stated in a different thread today, on x86 the polling
state turns out to be overly shallow most of the time even before this
patch.
> We could make this conditional on there being a high latency difference between
> polling and non-polling to keep x86 behavior.
If I'm not mistaken, to address the issue at hand, it would be
sufficient to add an "s->target_residency_ns < RESIDENCY_THRESHOLD_NS"
condition to the if () statement removed by this patch, wouldn't it?
Powered by blists - more mailing lists