Message-ID: <CAJZ5v0jjs=po8y0MzkUo=mUuqkxq3tg-V8r7-=AUJUu6=9ymMw@mail.gmail.com>
Date: Fri, 14 Feb 2025 22:34:32 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Christian Loehle <christian.loehle@....com>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>, Linux PM <linux-pm@...r.kernel.org>, dsmythies@...us.net, 
	LKML <linux-kernel@...r.kernel.org>, Daniel Lezcano <daniel.lezcano@...aro.org>, 
	Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>, 
	Aboorva Devarajan <aboorvad@...ux.ibm.com>
Subject: Re: [RFT][PATCH v1] cpuidle: teo: Avoid selecting deepest idle state over-eagerly

On Thu, Feb 13, 2025 at 3:08 PM Christian Loehle
<christian.loehle@....com> wrote:
>
> On 2/4/25 20:58, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >
> > It has been observed that the recent teo governor update which concluded
> > with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
> > for low latency constraints") caused the max-jOPS score of the SPECjbb
> > 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
> > While it may be argued that this is not a significant decrease, the
> > previous score can be restored by tweaking the inequality used by teo
> > to decide whether or not to preselect the deepest enabled idle state.
> > That change also causes the critical-jOPS score of SPECjbb to increase
> > by around 2%.
> >
> > Namely, the likelihood of selecting the deepest enabled idle state in
> > teo on the platform in question has increased after commit 13ed5c4a6d9c
> > ("cpuidle: teo: Skip getting the sleep length if wakeups are very
> > frequent") because some timer wakeups were previously counted as non-
> > timer ones and they were effectively added to the left-hand side of the
> > inequality deciding whether or not to preselect the deepest idle state.
> >
> > Many of them are now (accurately) counted as timer wakeups, so the left-
> > hand side of that inequality is now effectively smaller in some cases,
> > especially when timer wakeups often occur in the range below the target
> > residency of the deepest enabled idle state and idle states with target
> > residencies below RESIDENCY_THRESHOLD_NS are often selected, but the
> > majority of recent idle intervals are still above that value most of
> > the time.  As a result, the deepest enabled idle state may be selected
> > more often than it used to be selected in some cases.
> >
> > To counter that effect, add the sum of the hits metric for all of the
> > idle states below the candidate one (which is the deepest enabled idle
> > state at that point) to the left-hand side of the inequality mentioned
> > above.  This will cause it to be more balanced because, in principle,
> > putting both timer and non-timer wakeups on both sides of it is more
> > consistent than only taking into account the timer wakeups in the range
> > above the target residency of the deepest enabled idle state.
> >
> > Link: https://www.spec.org/jbb2015/ [1]
> > Tested-by: Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > ---
> >  drivers/cpuidle/governors/teo.c |    6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -349,13 +349,13 @@
> >       }
> >
> >       /*
> > -      * If the sum of the intercepts metric for all of the idle states
> > -      * shallower than the current candidate one (idx) is greater than the
> > +      * If the sum of the intercepts and hits metric for all of the idle
> > +      * states below the current candidate one (idx) is greater than the
> >        * sum of the intercepts and hits metrics for the candidate state and
> >        * all of the deeper states, a shallower idle state is likely to be a
> >        * better choice.
> >        */
> > -     if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> > +     if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
> >               int first_suitable_idx = idx;
> >
> >               /*
> >
> >
> >
>
> I'm curious, are Doug's numbers reproducible?
> Or could you share the idle state usage numbers? Is that explainable?
> Seems like a lot and it does worry me that I can't reproduce anything
> as drastic.

Well, it may not be drastic, but the results below pretty much confirm
that this particular change isn't going in the right direction IMV.
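
For reference, the two versions of the condition in the quoted diff can be
put side by side in a minimal stand-alone sketch.  The function names and the
plain unsigned int parameters are made up for illustration (total stands in
for cpu_data->total); this is not the teo.c code itself, and it assumes that
idx_hit_sum never exceeds total.

```c
#include <stdbool.h>

/*
 * Hypothetical helpers mirroring the two conditions from the quoted diff.
 * They are illustrative only and assume idx_hit_sum <= total, so the
 * unsigned subtraction below cannot wrap around.
 */

/* Condition before the patch: 2 * intercepts > total - hits. */
static bool prefer_shallower_old(unsigned int idx_intercept_sum,
				 unsigned int idx_hit_sum,
				 unsigned int total)
{
	return 2 * idx_intercept_sum > total - idx_hit_sum;
}

/* Condition after the patch: 2 * (intercepts + hits) > total. */
static bool prefer_shallower_new(unsigned int idx_intercept_sum,
				 unsigned int idx_hit_sum,
				 unsigned int total)
{
	return 2 * (idx_intercept_sum + idx_hit_sum) > total;
}
```

Put differently, the old check is 2 * intercepts + hits > total and the new
one is 2 * intercepts + 2 * hits > total, so the new one holds whenever the
old one does and additionally in some cases where it does not (for example,
300 intercepts and 300 hits out of a total of 1000).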

> I did a bit of x86 testing as well and got the following for Raptor Lake
> (I won't post the non-x86 numbers now, but teo-tweak performs comparably
> to teo mainline.)
>
> Idle 5 min:
> gov        iter  Joules  idles  idle_misses  idle_miss_ratio  belows  aboves
> teo        0     170.02  12690  646          0.051            371     275
> teo        1     123.17  8361   517          0.062            281     236
> teo        2     122.76  7741   347          0.045            262     85
> teo        3     118.5   8699   668          0.077            307     361
> teo        4     80.46   8113   443          0.055            264     179
> teo-tweak  0     115.05  10223  803          0.079            323     480
> teo-tweak  1     164.41  8523   631          0.074            263     368
> teo-tweak  2     163.91  8409   711          0.085            256     455
> teo-tweak  3     137.22  8581   721          0.084            261     460
> teo-tweak  4     174.95  8703   675          0.078            261     414

So basically the energy usage goes up, idle misses go up, the idle miss
ratio goes up and the "above" misses go way up.  Not so good as far as
I'm concerned.
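
A quick way to check that against the idle table quoted above is to average
the five runs per governor.  The sketch below does that for the Joules and
aboves columns; the helper and array names are made up for illustration and
the values are copied from the table.

```c
#include <stddef.h>

/* Arithmetic mean of n samples (returns 0 for an empty set). */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];

	return n ? sum / n : 0.0;
}

/* "Idle 5 min" runs 0-4, copied from the quoted table. */
static const double teo_joules[]   = { 170.02, 123.17, 122.76, 118.5, 80.46 };
static const double tweak_joules[] = { 115.05, 164.41, 163.91, 137.22, 174.95 };
static const double teo_aboves[]   = { 275, 236, 85, 361, 179 };
static const double tweak_aboves[] = { 480, 368, 455, 460, 414 };
```

With these inputs the means come out at roughly 123 J (teo) vs 151 J
(teo-tweak) and about 227 vs 435 aboves, although the run-to-run spread in
the Joules column is large, so the energy difference should be taken with
some care.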

> teo        0     164.34  8443   516          0.061            303     213
> teo        1     167.85  8767   492          0.056            256     236
> teo        2     166.25  7835   406          0.052            263     143
> teo        3     189.77  8865   493          0.056            276     217
> teo        4     136.97  9185   467          0.051            286     181

The above is menu I think?

> At least in the idle case you can see an increase in 'above' idle_misses.
>
> Firefox Youtube 4K video playback 2 min:
> gov        iter  Joules  idles  idle_misses  idle_miss_ratio  belows  aboves
> teo        0     260.09  67404  11189        0.166            1899    9290
> teo        1     273.71  76649  12155        0.159            2233    9922
> teo        2     231.45  59559  10344        0.174            1747    8597
> teo        3     202.61  58223  10641        0.183            1748    8893
> teo        4     217.56  61411  10744        0.175            1809    8935
> teo-tweak  0     227.99  61209  11251        0.184            2110    9141
> teo-tweak  1     222.44  61959  10323        0.167            1474    8849
> teo-tweak  2     218.1   64380  11080        0.172            1845    9235
> teo-tweak  3     207.4   60183  11267        0.187            1929    9338
> teo-tweak  4     217.46  61253  10381        0.169            1620    8761

And it doesn't improve things drastically here, although on average it
does reduce energy usage.
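
To put a rough number on "on average", the relative change in mean energy
between the two sets of runs can be computed as in the sketch below.  The
helper and array names are made up for illustration and the Joules values
are copied from the video playback table quoted above.

```c
#include <stddef.h>

/*
 * Relative change, in percent, of the mean of test[] with respect to the
 * mean of base[].  Both arrays hold n samples, so the ratio of the means
 * equals the ratio of the sums.  Returns 0 if the base sum is 0.
 */
static double rel_change_pct(const double *base, const double *test, size_t n)
{
	double base_sum = 0.0, test_sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		base_sum += base[i];
		test_sum += test[i];
	}

	return base_sum ? 100.0 * (test_sum - base_sum) / base_sum : 0.0;
}

/* "Firefox Youtube 4K video playback 2 min" Joules, runs 0-4. */
static const double video_teo[]   = { 260.09, 273.71, 231.45, 202.61, 217.56 };
static const double video_tweak[] = { 227.99, 222.44, 218.1, 207.4, 217.46 };
```

For these five runs each, the result is a reduction of a bit under 8% in
mean energy for teo-tweak relative to teo, with the caveat that individual
teo runs vary by considerably more than that.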

> menu       0     225.72  87871  26032        0.296            25412   620
> menu       1     200.36  86577  24712        0.285            24486   226
> menu       2     214.79  84885  24750        0.292            24556   194
> menu       3     206.07  88007  25938        0.295            25683   255
> menu       4     216.48  88700  26504        0.299            26302   202
>
> (Idle numbers aren't really reflected in the energy used, which is
> dominated by active power.)
>
