lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZ5v0js=Ca4MtcTa2zO2B_fCZNJrOAJzRKAPWSD6fFzn88z_w@mail.gmail.com>
Date: Tue, 1 Apr 2025 18:58:13 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Christian Loehle <christian.loehle@....com>
Cc: Aboorva Devarajan <aboorvad@...ux.ibm.com>, "Rafael J . Wysocki" <rafael@...nel.org>, 
	Daniel Lezcano <daniel.lezcano@...aro.org>, Gautam Menghani <gautam@...ux.ibm.com>, 
	linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] cpuidle: menu: Prefer polling state for short idle durations

On Mon, Mar 17, 2025 at 9:35 AM Christian Loehle
<christian.loehle@....com> wrote:
>
> On 3/17/25 06:03, Aboorva Devarajan wrote:
> > Avoid selecting deep idle state when the predicted idle duration is
> > shorter than its target residency, as this leads to unnecessary state
> > transitions without energy savings.
> >
> > On virtualized PowerPC (pseries) systems, where only one polling state
> > (Snooze) and one deep state (CEDE) are available, selecting CEDE when
> > its target residency exceeds the predicted idle duration hurts
> > performance.
> >
> > For example, if the predicted idle duration is 15 us and the first
> > non-polling state has a target residency of 120 us, selecting it
> > would be suboptimal.
> >
> > Remove the condition introduced in commit 69d25870f20c
> > ("cpuidle: fix the menu governor to boost IO performance") that
> > prioritized non-polling states even when their target residency
> > exceeded the predicted idle duration and allow polling states to
> > be selected when appropriate.
> >
> > Performance improvement observed with pgbench on PowerPC (pseries)
> > system:
> > +---------------------------+------------+------------+------------+
> > | Metric                    | Baseline   | Patched    | Change (%) |
> > +---------------------------+------------+------------+------------+
> > | Transactions/sec (TPS)    | 494,834    | 538,707    | +8.85%     |
> > | Avg latency (ms)          | 0.162      | 0.149      | -8.02%     |
> > +---------------------------+------------+------------+------------+
> >
> > CPUIdle state usage:
> > +--------------+--------------+-------------+
> > | Metric       | Baseline     | Patched     |
> > +--------------+--------------+-------------+
> > | Total usage  | 12,703,630   | 13,941,966  |
> > | Above usage  | 11,388,990   | 1,620,474   |
> > | Below usage  | 19,973       | 684,708     |
> > +--------------+--------------+-------------+
> >
> > Above/Total and Below/Total usage percentages:
> > +------------------------+-----------+---------+
> > | Metric                 | Baseline  | Patched |
> > +------------------------+-----------+---------+
> > | Above % (Above/Total)  | 89.67%    | 11.63%  |
> > | Below % (Below/Total)  | 0.16%     | 4.91%   |
> > | Total cpuidle miss (%) | 89.83%    | 16.54%  |
> > +------------------------+-----------+---------+
> >
> > Signed-off-by: Aboorva Devarajan <aboorvad@...ux.ibm.com>
> >
> > ---
> >
> > v1: https://lore.kernel.org/all/20240809073120.250974-1-aboorvad@linux.ibm.com/
> >
> > v1 -> v2:
> >
> > - Drop cover letter and improve commit message.
> > ---
> >  drivers/cpuidle/governors/menu.c | 11 -----------
> >  1 file changed, 11 deletions(-)
> >
> > diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> > index 28363bfa3e4c..4b199377e4be 100644
> > --- a/drivers/cpuidle/governors/menu.c
> > +++ b/drivers/cpuidle/governors/menu.c
> > @@ -296,17 +296,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> >                       idx = i; /* first enabled state */
> >
> >               if (s->target_residency_ns > predicted_ns) {
> > -                     /*
> > -                      * Use a physical idle state, not busy polling, unless
> > -                      * a timer is going to trigger soon enough.
> > -                      */
> > -                     if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
> > -                         s->exit_latency_ns <= latency_req &&
> > -                         s->target_residency_ns <= data->next_timer_ns) {
> > -                             predicted_ns = s->target_residency_ns;
> > -                             idx = i;
> > -                             break;
> > -                     }
> >                       if (predicted_ns < TICK_NSEC)
> >                               break;
> >
>
> I'm still fine with this and don't see a better way to solve the reported
> issue generally, see the discussion on v1.
> Rafael, do you have any objections?

The behavior change on x86 would be rather unacceptable I'm afraid.

As already stated in a different thread today, on x86 the polling
state turns out to be overly shallow most of the time even before this
patch.

> We could make this conditional on there being a high latency difference between
> polling and non-polling to keep x86 behavior.

If I'm not mistaken, to address the issue at hand, it would be
sufficient to add an "s->target_residency_ns < RESIDENCY_THRESHOLD_NS"
condition to the if () statement removed by this patch, wouldn't it?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ