lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 29 Jul 2021 18:18:10 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Doug Smythies <dsmythies@...us.net>
Cc:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux PM <linux-pm@...r.kernel.org>
Subject: Re: [PATCH v1 0/5] cpuidle: teo: Rework the idle state selection logic

On Thu, Jul 29, 2021 at 5:24 PM Doug Smythies <dsmythies@...us.net> wrote:
>
> On Wed, Jul 28, 2021 at 11:34 PM Doug Smythies <dsmythies@...us.net> wrote:
> >
> > On Wed, Jul 28, 2021 at 10:47 AM Rafael J. Wysocki <rjw@...ysocki.net> wrote:
> > >
> > > On Wednesday, July 28, 2021 3:52:51 PM CEST Rafael J. Wysocki wrote:
> > > > On Tue, Jul 27, 2021 at 10:06 PM Doug Smythies <dsmythies@...us.net> wrote:
> > > > >
> > > > > Hi Rafael,
> > > > >
> > > > > Further to my reply of 2021.07.04  on this, I have
> > > > > continued to work with and test this patch set.
> > > > >
> > > > > On 2021.06.02 11:14 Rafael J. Wysocki wrote:
> > > > >
> > > > > >This series of patches addresses some theoretical shortcoming in the
> > > > > > TEO (Timer Events Oriented) cpuidle governor by reworking its idle
> > > > > > state selection logic to some extent.
> > > > > >
> > > > > > Patches [1-2/5] are introductory cleanups and the substantial changes are
> > > > > > made in patches [3-4/5] (please refer to the changelogs of these two
> > > > > > patches for details).  The last patch only deals with documentation.
> > > > > >
> > > > > > Even though this work is mostly based on theoretical considerations, it
> > > > > > shows a measurable reduction of the number of cases in which the shallowest
> > > > > > idle state is selected while it would be more beneficial to select a deeper
> > > > > > one or the deepest idle state is selected while it would be more beneficial to
> > > > > > select a shallower one, which should be a noticeable improvement.
> > > > >
> > > > > I am concentrating in the idle state 0 and 1 area.
> > > > > When I disable idle state 0, the expectation is its
> > > > > usage will fall to idle state 1. It doesn't.
> > > > >
> > > > > Conditions:
> > > > > CPU: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> > > > > HWP: disabled
> > > > > CPU frequency scaling driver: intel_pstate, active
> > > > > CPU frequency scaling governor: performance.
> > > > > Idle configuration: As a COMETLAKE processor, with 4 idle states.
> > > > > Sample time for below: 1 minute.
> > > > > Workflow: Cross core named pipe token passing, 12 threads.
> > > > >
> > > > > Kernel 5.14-rc3: idle: teo governor
> > > > >
> > > > > All idle states enabled: PASS
> > > > > Processor: 97 watts
> > > > > Idle state 0 entries: 811151
> > > > > Idle state 1 entries: 140300776
> > > > > Idle state 2 entries: 889
> > > > > Idle state 3 entries: 8
> > > > >
> > > > > Idle state 0 disabled: FAIL <<<<<
> > > > > Processor: 96 watts
> > > > > Idle state 0 entries: 0
> > > > > Idle state 1 entries: 65599283
> > > > > Idle state 2 entries: 364399
> > > > > Idle state 3 entries: 65112651
> > > >
> > > > This looks odd.
> > > >
> > > > Thanks for the report, I'll take a look at this.
> > >
> > > I have found an issue in the code that may be responsible for the
> > > observed behavior and should be addressed by the appended patch (not
> > > tested yet).
> > >
> > > Basically, the "disabled" check in the second loop over states in
> > > teo_select() needs to exclude the first enabled state, because
> > > there are no more states to check after that.
> > >
> > > Plus the time span check needs to be done when the given state
> > > is about to be selected, because otherwise the function may end up
> > > returning a state for which the sums are too low.
> > >
> > > Thanks!
> > >
> > > ---
> > >  drivers/cpuidle/governors/teo.c |   26 ++++++++++++++------------
> > >  1 file changed, 14 insertions(+), 12 deletions(-)
> > >
> > > Index: linux-pm/drivers/cpuidle/governors/teo.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/cpuidle/governors/teo.c
> > > +++ linux-pm/drivers/cpuidle/governors/teo.c
> > > @@ -404,25 +404,27 @@ static int teo_select(struct cpuidle_dri
> > >                         intercept_sum += bin->intercepts;
> > >                         recent_sum += bin->recent;
> > >
> > > -                       if (dev->states_usage[i].disable)
> > > +                       if (dev->states_usage[i].disable && i > idx0)
> > >                                 continue;
> > >
> > >                         span_ns = teo_middle_of_bin(i, drv);
> > > -                       if (!teo_time_ok(span_ns)) {
> > > -                               /*
> > > -                                * The current state is too shallow, so select
> > > -                                * the first enabled deeper state.
> > > -                                */
> > > -                               duration_ns = last_enabled_span_ns;
> > > -                               idx = last_enabled_idx;
> > > -                               break;
> > > -                       }
> > >
> > >                         if ((!alt_recent || 2 * recent_sum > idx_recent_sum) &&
> > >                             (!alt_intercepts ||
> > >                              2 * intercept_sum > idx_intercept_sum)) {
> > > -                               idx = i;
> > > -                               duration_ns = span_ns;
> > > +                               if (!teo_time_ok(span_ns) ||
> > > +                                   dev->states_usage[i].disable) {
> > > +                                       /*
> > > +                                        * The current state is too shallow or
> > > +                                        * disabled, so select the first enabled
> > > +                                        * deeper state.
> > > +                                        */
> > > +                                       duration_ns = last_enabled_span_ns;
> > > +                                       idx = last_enabled_idx;
> > > +                               } else {
> > > +                                       idx = i;
> > > +                                       duration_ns = span_ns;
> > > +                               }
> > >                                 break;
> > >                         }
> >
> > Hi Rafael,
> >
> > I tried the patch and when I disabled idle state 0
> > got, very similar to before:
> >
> > Idle state 0 disabled: FAIL
> > Processor: 95 watts
> > Idle state 0 entries: 0
> > Idle state 1 entries: 65,475,534
> > Idle state 2 entries: 333144
> > Idle state 3 entries: 65,247,048
> >
> > However, I accidently left it for about 30 minutes
> > and noticed:
> >
> > Idle state 0 disabled:
> > Processor: 83 watts
> > Idle state 0 entries: 0
> > Idle state 1 entries: 88,706,831
> > Idle state 2 entries: 100
> > Idle state 3 entries: 662
> >
> > I went back to unmodified kernel 5.13-rc3 and
>
> Sorry, 5.14-rc3.
>
> > let it run longer with idle state 0 disabled, and
> > after 30 minutes it had changed but nowhere
> > near as much:
> >
> > Idle state 0 disabled:
> > Processor: 87 watts
> > Idle state 0 entries: 0
> > Idle state 1 entries: 70,361,020
> > Idle state 2 entries: 71219
> > Idle state 3 entries: 27,249,975
>
> Addendum: So far the workflow used for this
> thread has been event based. If I switch to
> a timer based workflow, everything works as
> expected for both kernels, 5.14-rc3 unmodified
> and modified with the patch from herein.

Yes, the affected case is when the governor selects states that are
shallower than indicated by the time till the next timer.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ