Message-ID: <CAJZ5v0jjs=po8y0MzkUo=mUuqkxq3tg-V8r7-=AUJUu6=9ymMw@mail.gmail.com>
Date: Fri, 14 Feb 2025 22:34:32 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Christian Loehle <christian.loehle@....com>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>, Linux PM <linux-pm@...r.kernel.org>, dsmythies@...us.net,
LKML <linux-kernel@...r.kernel.org>, Daniel Lezcano <daniel.lezcano@...aro.org>,
Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>,
Aboorva Devarajan <aboorvad@...ux.ibm.com>
Subject: Re: [RFT][PATCH v1] cpuidle: teo: Avoid selecting deepest idle state over-eagerly

On Thu, Feb 13, 2025 at 3:08 PM Christian Loehle
<christian.loehle@....com> wrote:
>
> On 2/4/25 20:58, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >
> > It has been observed that the recent teo governor update which concluded
> > with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
> > for low latency constraints") caused the max-jOPS score of the SPECjbb
> > 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
> > While it may be argued that this is not a significant decrease, the
> > previous score can be restored by tweaking the inequality used by teo
> > to decide whether or not to preselect the deepest enabled idle state.
> > That change also causes the critical-jOPS score of SPECjbb to increase
> > by around 2%.
> >
> > Namely, the likelihood of selecting the deepest enabled idle state in
> > teo on the platform in question has increased after commit 13ed5c4a6d9c
> > ("cpuidle: teo: Skip getting the sleep length if wakeups are very
> > frequent") because some timer wakeups were previously counted as non-
> > timer ones and they were effectively added to the left-hand side of the
> > inequality deciding whether or not to preselect the deepest idle state.
> >
> > Many of them are now (accurately) counted as timer wakeups, so the left-
> > hand side of that inequality is now effectively smaller in some cases,
> > especially when timer wakeups often occur in the range below the target
> > residency of the deepest enabled idle state and idle states with target
> > residencies below RESIDENCY_THRESHOLD_NS are often selected, but the
> > majority of recent idle intervals are still above that value most of
> > the time. As a result, the deepest enabled idle state may be selected
> > more often than it used to be selected in some cases.
> >
> > To counter that effect, add the sum of the hits metric for all of the
> > idle states below the candidate one (which is the deepest enabled idle
> > state at that point) to the left-hand side of the inequality mentioned
> > above. This will cause it to be more balanced because, in principle,
> > putting both timer and non-timer wakeups on both sides of it is more
> > consistent than only taking into account the timer wakeups in the range
> > above the target residency of the deepest enabled idle state.
> >
> > Link: https://www.spec.org/jbb2015/
> > Tested-by: Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > ---
> > drivers/cpuidle/governors/teo.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -349,13 +349,13 @@
> > }
> >
> > /*
> > - * If the sum of the intercepts metric for all of the idle states
> > - * shallower than the current candidate one (idx) is greater than the
> > + * If the sum of the intercepts and hits metric for all of the idle
> > + * states below the current candidate one (idx) is greater than the
> > * sum of the intercepts and hits metrics for the candidate state and
> > * all of the deeper states, a shallower idle state is likely to be a
> > * better choice.
> > */
> > - if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> > + if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
> > int first_suitable_idx = idx;
> >
> > /*
> >
> >
> >
>
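
For illustration, the two forms of the inequality from the diff above can be compared in a standalone sketch. The helper names are hypothetical and the unsigned int types are an assumption; only the two conditions themselves are taken from the patch. Since the old condition is equivalent to 2 * idx_intercept_sum + idx_hit_sum > total, the new one holds whenever the old one does, and additionally in cases where the hits below the candidate state tip the balance.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers, not kernel code. The parameter names mirror the
 * variables in the diff; the unsigned int types are an assumption.
 */

/* Condition before the patch. */
static bool prefer_shallower_old(unsigned int idx_intercept_sum,
                                 unsigned int idx_hit_sum,
                                 unsigned int total)
{
    /* Assumes total counts every hit and intercept, so no underflow. */
    assert(total >= idx_hit_sum);
    return 2 * idx_intercept_sum > total - idx_hit_sum;
}

/* Condition after the patch: hits below idx now count on the left. */
static bool prefer_shallower_new(unsigned int idx_intercept_sum,
                                 unsigned int idx_hit_sum,
                                 unsigned int total)
{
    return 2 * (idx_intercept_sum + idx_hit_sum) > total;
}
```

With made-up inputs of 20 intercepts and 35 hits below the candidate state out of a total of 100, the old condition is false (40 > 65) while the new one is true (110 > 100), so only the patched code would look for a shallower state.
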
> I'm curious, are Doug's numbers reproducible?
> Or could you share the idle state usage numbers? Is that explainable?
> Seems like a lot and it does worry me that I can't reproduce anything
> as drastic.

Well, it may not be drastic, but the results below pretty much confirm
that this particular change isn't going in the right direction IMV.

> I did a bit of x86 as well and got for Raptor Lake (I won't post the
> non-x86 numbers now, but teo-tweak performs comparable to teo mainline.)
>
> Idle 5 min:
> device  gov        iter  Joules  idles  idle_misses  idle_miss_ratio  belows  aboves
>         teo        0     170.02  12690  646          0.051            371     275
>         teo        1     123.17  8361   517          0.062            281     236
>         teo        2     122.76  7741   347          0.045            262     85
>         teo        3     118.5   8699   668          0.077            307     361
>         teo        4     80.46   8113   443          0.055            264     179
>         teo-tweak  0     115.05  10223  803          0.079            323     480
>         teo-tweak  1     164.41  8523   631          0.074            263     368
>         teo-tweak  2     163.91  8409   711          0.085            256     455
>         teo-tweak  3     137.22  8581   721          0.084            261     460
>         teo-tweak  4     174.95  8703   675          0.078            261     414

So basically the energy usage goes up, idle misses go up, the idle miss
ratio goes up and the "above" misses go way up. Not so good as far as
I'm concerned.

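As a rough cross-check, the per-governor means over the five idle iterations quoted above can be computed with a small sketch like the one below. The array and function names are made up for illustration, the values are copied from the quoted table, and a plain arithmetic mean is only one possible way to summarize them.

```c
#include <assert.h>
#include <stddef.h>

#define N_ITER 5

/* Values copied from the quoted "Idle 5 min" table; names are illustrative. */
static const double teo_joules[N_ITER]   = { 170.02, 123.17, 122.76, 118.5, 80.46 };
static const double tweak_joules[N_ITER] = { 115.05, 164.41, 163.91, 137.22, 174.95 };
static const double teo_misses[N_ITER]   = { 646, 517, 347, 668, 443 };
static const double tweak_misses[N_ITER] = { 803, 631, 711, 721, 675 };
static const double teo_aboves[N_ITER]   = { 275, 236, 85, 361, 179 };
static const double tweak_aboves[N_ITER] = { 480, 368, 455, 460, 414 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}
```

Calling mean() on each array gives roughly 123 J for teo against 151 J for teo-tweak, about 524 against 708 idle misses, and about 227 against 435 "above" misses.
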
>         teo        0     164.34  8443   516          0.061            303     213
>         teo        1     167.85  8767   492          0.056            256     236
>         teo        2     166.25  7835   406          0.052            263     143
>         teo        3     189.77  8865   493          0.056            276     217
>         teo        4     136.97  9185   467          0.051            286     181

The above is menu, I think?

> At least in the idle case you can see an increase in 'above' idle_misses.
>
> Firefox Youtube 4K video playback 2 min:
> device  gov        iter  Joules  idles  idle_misses  idle_miss_ratio  belows  aboves
>         teo        0     260.09  67404  11189        0.166            1899    9290
>         teo        1     273.71  76649  12155        0.159            2233    9922
>         teo        2     231.45  59559  10344        0.174            1747    8597
>         teo        3     202.61  58223  10641        0.183            1748    8893
>         teo        4     217.56  61411  10744        0.175            1809    8935
>         teo-tweak  0     227.99  61209  11251        0.184            2110    9141
>         teo-tweak  1     222.44  61959  10323        0.167            1474    8849
>         teo-tweak  2     218.1   64380  11080        0.172            1845    9235
>         teo-tweak  3     207.4   60183  11267        0.187            1929    9338
>         teo-tweak  4     217.46  61253  10381        0.169            1620    8761

And it doesn't improve things drastically here, although on average it
does reduce energy usage.

>         menu       0     225.72  87871  26032        0.296            25412   620
>         menu       1     200.36  86577  24712        0.285            24486   226
>         menu       2     214.79  84885  24750        0.292            24556   194
>         menu       3     206.07  88007  25938        0.295            25683   255
>         menu       4     216.48  88700  26504        0.299            26302   202
>
> (Idle numbers aren't really reflective in energy used -> dominated by
> active power.)
>