Message-ID: <CAJZ5v0i8q=UoZMmNpe6KLQDk_0Fsmh6pYcxxMUitv68VueA9hA@mail.gmail.com>
Date: Thu, 29 Jan 2026 18:18:15 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Christian Loehle <christian.loehle@....com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Linux PM <linux-pm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, Doug Smythies <dsmythies@...us.net>
Subject: Re: [PATCH v2 1/2] cpuidle: governors: teo: Adjust the classification
of wakeup events
On Thu, Jan 29, 2026 at 10:16 AM Christian Loehle
<christian.loehle@....com> wrote:
>
> On 1/26/26 19:45, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >
> > If differences between target residency values of adjacent idle states
> > of a given CPU are relatively large, the corresponding idle state bins
> > used by the teo governor are large as well and the rule by which hits
> > are distinguished from intercepts is inaccurate.
> >
> > Namely, by that rule, a wakeup event is classified as a hit if the
> > sleep length (the time till the closest timer other than the tick)
> > and the measured idle duration, adjusted for the entered idle state
> > exit latency, fall into the same idle state bin. However, if that bin
> > is large enough, the actual difference between the sleep length and
> > the measured idle duration may be significant. It may in fact be
> > significantly greater than the analogous difference for an event where
> > the sleep length and the measured idle duration fall into different
> > bins.
> >
> > For this reason, amend the rule in question with a check that will
> > only allow a wakeup event to be counted as a hit if the difference
> > between the sleep length and the measured idle duration is less than
> > LATENCY_THRESHOLD_NS (which means that the difference between the
> > sleep length and the raw measured idle duration is below the sum of
> > LATENCY_THRESHOLD_NS and 1/2 of the entered idle state exit latency).
> > Otherwise, the event will be counted as an intercept.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > ---
> >
> > v1.1 -> v2: No changes
> >
> > v1 -> v1.1
> > * Drop the change in teo_select() along with the corresponding
> > part of the changelog (after receiving testing feedback from
> > Christian)
> >
> > This is a resend of
> >
> > https://lore.kernel.org/linux-pm/4707705.LvFx2qVVIh@rafael.j.wysocki/
> >
> > It applies on top of the first three patches from
> >
> > https://lore.kernel.org/linux-pm/2257365.irdbgypaU6@rafael.j.wysocki/
> >
> > ---
> > drivers/cpuidle/governors/teo.c | 32 ++++++++++++++++----------------
> > 1 file changed, 16 insertions(+), 16 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -48,13 +48,11 @@
> > * in accordance with what happened last time.
> > *
> > * The "hits" metric reflects the relative frequency of situations in which the
> > - * sleep length and the idle duration measured after CPU wakeup fall into the
> > - * same bin (that is, the CPU appears to wake up "on time" relative to the sleep
> > - * length). In turn, the "intercepts" metric reflects the relative frequency of
> > - * non-timer wakeup events for which the measured idle duration falls into a bin
> > - * that corresponds to an idle state shallower than the one whose bin is fallen
> > - * into by the sleep length (these events are also referred to as "intercepts"
> > - * below).
> > + * sleep length and the idle duration measured after CPU wakeup are close enough
> > + * (that is, the CPU appears to wake up "on time" relative to the sleep length).
> > + * In turn, the "intercepts" metric reflects the relative frequency of non-timer
> > + * wakeup events for which the measured idle duration is measurably less than
> > + * the sleep length (these events are also referred to as "intercepts" below).
> > *
> > * The governor also counts "intercepts" with the measured idle duration below
> > * the tick period length and uses this information when deciding whether or not
> > @@ -253,12 +251,16 @@ static void teo_update(struct cpuidle_dr
> > }
> >
> > /*
> > - * If the measured idle duration falls into the same bin as the sleep
> > - * length, this is a "hit", so update the "hits" metric for that bin.
> > + * If the measured idle duration falls into the same bin as the
> > + * sleep length and the difference between them is less than
> > + * LATENCY_THRESHOLD_NS, this is a "hit", so update the "hits"
> > + * metric for that bin.
> > + *
> > * Otherwise, update the "intercepts" metric for the bin fallen into by
> > * the measured idle duration.
> > */
> > - if (idx_timer == idx_duration) {
> > + if (idx_timer == idx_duration &&
> > + cpu_data->sleep_length_ns - measured_ns < LATENCY_THRESHOLD_NS) {
>
> So it needs to be within 7.5us here.
> Can we always expect that to be true?

It's just a margin.

> Especially since measured_ns does this "infer average from worst-case exit
> latency" handling.
> On deeper states this
>     measured_ns -= lat_ns / 2;
> is an order of magnitude higher than our threshold.

True.

> So it should probably be something like
> exit_latency / 2 + LATENCY_THRESHOLD_NS?
> Or just exit_latency and allow the error to both sides?

Well, the exit latency is already there in this inequality because
measured_ns == raw_measured_ns - exit_latency / 2 and I didn't want to
take it into account twice.

And in fact I want sleep_length_ns and measured_ns (already adjusted
for the entered state exit latency) to be equal up to a margin and I
just think that the margin can be the same for all of the state bins
because it's basically the granularity of the comparison.
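
To make the point about the exit latency concrete, here is a minimal userspace sketch (not the teo.c code) of the adjustment described above. The helper names adjusted_duration_ns() and sleep_delta_ns() are hypothetical, and the sketch ignores any special-casing the kernel applies to very short idle durations. It only shows that half of the exit latency is already folded into the value compared with the sleep length:

```c
#include <stdint.h>

/*
 * Hypothetical helper: the idle duration after the adjustment quoted in
 * the thread (measured_ns -= lat_ns / 2), i.e. the raw measurement minus
 * half of the worst-case exit latency of the entered state.
 */
static int64_t adjusted_duration_ns(int64_t raw_measured_ns,
				    int64_t exit_latency_ns)
{
	return raw_measured_ns - exit_latency_ns / 2;
}

/*
 * Hypothetical helper: the signed difference that gets compared against
 * the margin.  Expanding it gives
 *   sleep_length_ns - raw_measured_ns + exit_latency_ns / 2,
 * so the exit latency term is present exactly once.
 */
static int64_t sleep_delta_ns(int64_t sleep_length_ns,
			      int64_t raw_measured_ns,
			      int64_t exit_latency_ns)
{
	return sleep_length_ns -
	       adjusted_duration_ns(raw_measured_ns, exit_latency_ns);
}
```

For instance, with an assumed exit latency of 200 us, a raw measurement that exceeds the sleep length by 100 us yields a delta of zero, so adding exit_latency / 2 to the margin on top of that would count the latency twice.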

I didn't get it right though and the code should be something like this:

	if (idx_timer == idx_duration) {
		s64 delta_ns = cpu_data->sleep_length_ns - measured_ns;

		if (delta_ns < 0)
			delta_ns = -delta_ns;

		if (delta_ns < LATENCY_THRESHOLD_NS) {
			cpu_data->state_bins[idx_timer].hits += PULSE;
			return;
		}
	}

	/*
	 * Update the "intercepts" metric for the bin fallen into by the
	 * measured idle duration.
	 */
	cpu_data->state_bins[idx_duration].intercepts += PULSE;
	if (measured_ns <= TICK_NSEC)
		cpu_data->tick_intercepts += PULSE;

LATENCY_THRESHOLD_NS is as good as anything else here and for bins
narrower than it (which means C1 and C1e on Intel x86 for instance)
delta_ns will always be less than it, so the behavior there won't
change after the patch.
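
For anyone who wants to experiment with the revised rule outside the kernel, the following is a rough standalone sketch of it, not the actual teo.c code. The struct layout, NR_BINS, the classify_wakeup() name and its return value are made up for illustration. LATENCY_THRESHOLD_NS is set to the 7.5 us mentioned earlier in the thread, while the PULSE and TICK_NSEC values are assumptions (the latter corresponds to HZ=250):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values; only the 7.5 us margin comes from the thread. */
#define LATENCY_THRESHOLD_NS	7500
#define PULSE			1024	/* assumed metric increment */
#define TICK_NSEC		4000000	/* assumed: HZ=250 */
#define NR_BINS			4	/* hypothetical number of idle states */

struct bin_model {
	unsigned int hits;
	unsigned int intercepts;
};

struct cpu_model {
	int64_t sleep_length_ns;
	struct bin_model state_bins[NR_BINS];
	unsigned int tick_intercepts;
};

/*
 * Classify one wakeup: returns 1 if it was counted as a hit and 0 if it
 * was counted as an intercept.  measured_ns is expected to be already
 * adjusted for the exit latency of the entered state.
 */
static int classify_wakeup(struct cpu_model *cpu, int idx_timer,
			   int idx_duration, int64_t measured_ns)
{
	assert(idx_timer >= 0 && idx_timer < NR_BINS);
	assert(idx_duration >= 0 && idx_duration < NR_BINS);

	if (idx_timer == idx_duration) {
		int64_t delta_ns = cpu->sleep_length_ns - measured_ns;

		/* Allow the error in both directions. */
		if (delta_ns < 0)
			delta_ns = -delta_ns;

		if (delta_ns < LATENCY_THRESHOLD_NS) {
			cpu->state_bins[idx_timer].hits += PULSE;
			return 1;
		}
	}

	/* Not close enough (or different bins): count an intercept. */
	cpu->state_bins[idx_duration].intercepts += PULSE;
	if (measured_ns <= TICK_NSEC)
		cpu->tick_intercepts += PULSE;

	return 0;
}
```

With a sleep length of 500 us, a measured duration of 498 us or 506 us in the same bin counts as a hit, whereas 300 us in the same bin now counts as an intercept.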
> > cpu_data->state_bins[idx_timer].hits += PULSE;
> > } else {
> > cpu_data->state_bins[idx_duration].intercepts += PULSE;
> >
> >
Overall, I'll respin the series.