lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0gWwqtqezkBapqK4RbefOT2q7R7pWiTb8E4AbptFu7tAg@mail.gmail.com>
Date:   Fri, 25 Nov 2022 14:36:53 +0100
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Zhang Rui <rui.zhang@...el.com>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>, rjw@...ysocki.net,
        daniel.lezcano@...aro.org, linux-pm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/3] cpuidle: ladder: Tune promotion/demotion threshold

On Fri, Nov 25, 2022 at 7:39 AM Zhang Rui <rui.zhang@...el.com> wrote:
>
> Hi, Rafael,
>
> thanks for reviewing the patch series.
>
> On Wed, 2022-11-23 at 18:50 +0100, Rafael J. Wysocki wrote:
> > On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@...el.com> wrote:
> > > After fixing the bogus comparison between u64 and s64, the ladder
> > > governor stops making promotion decisions errornously.
> > >
> > > However, after this, it is found that the ladder governor demotes
> > > much
> > > easier than promotes.
> >
> > "After fixing an error related to using signed and unsigned integers
> > in the ladder governor in a previous patch, that governor turns out
> > to
> > demote much easier than promote"
> >
> > > Below is captured using turbostat after a 30 seconds runtime idle,
> > >
> > > Without previous patch,
> > > Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8
> > >      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > > 0.30    2373    0       0       0       4       9       25      122
> > >      326     2857    0.36    0.04    0.57    98.73   1.48
> >
> > Why is the above relevant?
>
> Just for comparison purpose.
> For a pure idle scenario (Busy% < 0.5), with ladder governor, we used
> to have 99% CPU%c7, but now, with patch 1/3,
>
> CPU%c1  CPU%c3  CPU%c6  CPU%c7
> 34.18   16.21   17.69   31.51
> This does not look like the correct behavior for any cpuidle governor.

It all depends on what the design goal was and I don't really know
that in this particular case.

It looks like the plan was to make it promote less often than demote
or the counts would have been chosen differently.

> >
> > > With previous patch,
> > > Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8
> > >      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > > 0.42    3071    0       771     838     447     327     336     382
> > >      299     344     34.18   16.21   17.69   31.51   2.00
> > >
> > > And this is caused by the imbalanced
> > > PROMOTION_COUNT/DEMOTION_COUNT.
> >
> > I would explain why/how the imbalanced PROMOTION_COUNT/DEMOTION_COUNT
> > imbalance causes this.
>
> sure, how about something below.
>
> The PROMOTION_COUNT/DEMOTION_COUNT are used as the threshold between
> the ladder governor detects it "should promote/demote", and the ladder
> governor does a real promotion/demotion.
>
> Currently, PROMOTION_COUNT is set to 4 and DEMOTION_COUNT is set to 1.
> This means that the ladder governor does real demotion immediately when
> it "should demote", but it does real promotion only if it "should
> promote" 4 times in a row, without a single "should demote" occur in
> between.
>
> As a result, this lower the chance to do real promotion and the ladder
> governor is more likely to choose a shallower state.

Sounds good and now the question is what's the behavior expected by
users.  Do we have any data?

> >
> > I guess more residency in the deeper idle state is expected?  Or
> > desired??
> >
> > > With this patch,
> > > Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8
> > >      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > > 0.39    2436    0       1       72      177     51      194     243
> > >      799     1883    0.50    0.32    0.35    98.45   1.53
> > >
> > > Note that this is an experimental patch to illustrate the problem,
> > > and it is checked with idle scenario only for now.
> > > I will try to evaluate with more scenarios, and if someone can help
> > > evaluate with more scenarios at the same time and provide data for
> > > the
> > > benefit with different PROMOTION_COUNT/DEMOTION_COUNT values, that
> > > would be great.
> >
> > So yes, this requires more work.
> >
> > Overall, I think that you are concerned that the previous change
> > might
> > be regarded as a regression and are trying to compensate for it with
> > a
> > PROMOTION_COUNT/DEMOTION_COUNT change.
>
> Exactly.
>
> > I'm not sure I can agree with that approach, because the shallower
> > idle states might be preferred by the original ladder design
> > intentionally, for performance reasons.
> >
> hmmm, even if there is only 30% c7/c8/c9/c10 residency in a pure idle
> scenario?

Yes, even in that case.  All boils down to the question regarding user
expectations.

> And further more, since the imbalanced PROMOTION_COUNT/DEMOTION_COUNT
> and the unsigned/signed integers problem are both there since the first
> day the ladder governor was introduced, commit 4f86d3a8e297 ("cpuidle:
> consolidate 2.6.22 cpuidle branch into one patch"),
>
> my guess is that
>
> the unsigned/signed integers problem introduces a lot of pseudo
> promotions, and the PROMOTION_COUNT/DEMOTION_COUNT is introduced to
> workaround this so that the ladder governor doesn't get stuck at deep
> idle state.

That may be a good guess, so I would add it to the changelog of the patch.

> I don't have a solid proof for this. But at least for the pure idle
> scenario, I don't think 30% deep idle residency is the right behavior,
> and it needs to be tuned anyway.

Well, have you checked what happens if the counts are set to the same
value, e.g. 2?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ