[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZ5v0jELEw3yAoRFLPgEcfBRoUyd6tKSNHO2Q1O6_CoeR1Bng@mail.gmail.com>
Date: Sat, 8 Feb 2025 12:24:45 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Doug Smythies <dsmythies@...us.net>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>, LKML <linux-kernel@...r.kernel.org>,
Daniel Lezcano <daniel.lezcano@...aro.org>, Christian Loehle <christian.loehle@....com>,
Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>,
Aboorva Devarajan <aboorvad@...ux.ibm.com>, Linux PM <linux-pm@...r.kernel.org>
Subject: Re: [RFT][PATCH v1] cpuidle: teo: Avoid selecting deepest idle state over-eagerly
Hi Doug,
On Sat, Feb 8, 2025 at 12:40 AM Doug Smythies <dsmythies@...us.net> wrote:
>
> Hi Rafael,
>
> On 2025.02.04 12:58 Rafael J. Wysocki wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >
> > It has been observed that the recent teo governor update which concluded
> > with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
> > for low latency constraints") caused the max-jOPS score of the SPECjbb
> > 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
> > While it may be argued that this is not a significant increase, the
> > previous score can be restored by tweaking the inequality used by teo
> > to decide whether or not to preselect the deepest enabled idle state.
> > That change also causes the critical-jOPS score of SPECjbb to increase
> > by around 2%.
> >
> > Namely, the likelihood of selecting the deepest enabled idle state in
> > teo on the platform in question has increased after commit 13ed5c4a6d9c
> > ("cpuidle: teo: Skip getting the sleep length if wakeups are very
> > frequent") because some timer wakeups were previously counted as non-
> > timer ones and they were effectively added to the left-hand side of the
> > inequality deciding whether or not to preselect the deepest idle state.
> >
> > Many of them are now (accurately) counted as timer wakeups, so the left-
> > hand side of that inequality is now effectively smaller in some cases,
> > especially when timer wakeups often occur in the range below the target
> > residency of the deepest enabled idle state and idle states with target
> > residencies below CPUIDLE_FLAG_POLLING are often selected, but the
> > majority of recent idle intervals are still above that value most of
> > the time. As a result, the deepest enabled idle state may be selected
> > more often than it used to be selected in some cases.
> >
> > To counter that effect, add the sum of the hits metric for all of the
> > idle states below the candidate one (which is the deepest enabled idle
> > state at that point) to the left-hand side of the inequality mentioned
> > above. This will cause it to be more balanced because, in principle,
> > putting both timer and non-timer wakeups on both sides of it is more
> > consistent than only taking into account the timer wakeups in the range
> > above the target residency of the deepest enabled idle state.
> >
> > Link: https://www.spec.org/jbb2015/
> > Tested-by: Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > ---
> > drivers/cpuidle/governors/teo.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -349,13 +349,13 @@
> > }
> >
> > /*
> > - * If the sum of the intercepts metric for all of the idle states
> > - * shallower than the current candidate one (idx) is greater than the
> > + * If the sum of the intercepts and hits metric for all of the idle
> > + * states below the current candidate one (idx) is greater than the
> > * sum of the intercepts and hits metrics for the candidate state and
> > * all of the deeper states, a shallower idle state is likely to be a
> > * better choice.
> > */
> > - if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> > + if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
> > int first_suitable_idx = idx;
> >
> > /*
>
> I have only just started testing the recent idle governor changes,
> and have not gotten very far yet.
>
> There is a significant increase in processor package power during idle
> with this patch, about 5 times increase (400%).
Thanks for this data point!
The restoration of the 1.4% benchmark score is not worth this cost IMV.
> My processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> Distro: Ubuntu 24.04.1, server, no desktop GUI.
> CPU frequency scaling driver: intel_pstate
> HWP: disabled.
> CPU frequency scaling governor: performance
>
> Idle states:
> $ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
> /sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
> /sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
> /sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2_ACPI
> /sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3_ACPI
>
> Test durations were >= 45 minutes each.
>
> Kernel 6.14-rc1: Includes cpuidle: teo: Cleanups and very frequent wakeups handling update
> Average Idle Power: teo governor: 2.199 watts (+25.51%)
> Average Idle power: menu governor: 1.873 watts (+6.91%)
menu hasn't changed in 6.14-rc1, so this would be variation between
runs I suppose.
> Kernel 6.14-rc1-p: Added this patch for teo and "cpuidle: menu: Avoid discarding useful information when processing recent idle intervals" for menu
> Average Idle Power: teo governor: 9.401 watts (+436.6%)
> Only 69.61% idle is in the deepest idle state. More typically it would be 98% to 99%.
Ah, not good.
OK, this clearly doesn't go in the right direction.
> 28.6531% idle time is in state 1. More typically it would be 0.03%
> Average Idle Power: menu governor: 1.820 watts (+3.9%)
>
> Kernel 6.13: before "cpuidle: teo: Cleanups and very frequent wakeups handling update"
> Average Idle Power: teo governor: 1.752 watts (reference: 0.0%)
> Average Idle power: menu governor: 1.909 watts (+9.0%)
Thanks, I'm just going to drop this patch then.
If you don't mind, I'll have a couple more teo updates for testing.
Powered by blists - more mailing lists