linux-kernel - Re: [PATCH v2 1/2] cpuidle: governors: teo: Adjust the classification of wakeup events

Message-ID: <CAJZ5v0i8q=UoZMmNpe6KLQDk_0Fsmh6pYcxxMUitv68VueA9hA@mail.gmail.com>
Date: Thu, 29 Jan 2026 18:18:15 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Christian Loehle <christian.loehle@....com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Linux PM <linux-pm@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>, Doug Smythies <dsmythies@...us.net>
Subject: Re: [PATCH v2 1/2] cpuidle: governors: teo: Adjust the classification
 of wakeup events

On Thu, Jan 29, 2026 at 10:16 AM Christian Loehle
<christian.loehle@....com> wrote:
>
> On 1/26/26 19:45, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >
> > If differences between target residency values of adjacent idle states
> > of a given CPU are relatively large, the corresponding idle state bins
> > used by the teo governor are large as well and the rule by which hits
> > are distinguished from intercepts is inaccurate.
> >
> > Namely, by that rule, a wakeup event is classified as a hit if the
> > sleep length (the time till the closest timer other than the tick)
> > and the measured idle duration, adjusted for the entered idle state
> > exit latency, fall into the same idle state bin.  However, if that bin
> > is large enough, the actual difference between the sleep length and
> > the measured idle duration may be significant.  It may in fact be
> > significantly greater than the analogous difference for an event where
> > the sleep length and the measured idle duration fall into different
> > bins.
> >
> > For this reason, amend the rule in question with a check that will
> > only allow a wakeup event to be counted as a hit if the difference
> > between the sleep length and the measured idle duration is less than
> > LATENCY_THRESHOLD_NS (which means that the difference between the
> > sleep length and the raw measured idle duration is below the sum of
> > LATENCY_THRESHOLD_NS and 1/2 of the entered idle state exit latency).
> > Otherwise, the event will be counted as an intercept.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > ---
> >
> > v1.1 -> v2: No changes
> >
> > v1 -> v1.1
> >    * Drop the change in teo_select() along with the corresponding
> >      part of the changelog (after receiving testing feedback from
> >      Christian)
> >
> > This is a resend of
> >
> > https://lore.kernel.org/linux-pm/4707705.LvFx2qVVIh@rafael.j.wysocki/
> >
> > It applies on top of the first three patches from
> >
> > https://lore.kernel.org/linux-pm/2257365.irdbgypaU6@rafael.j.wysocki/
> >
> > ---
> >  drivers/cpuidle/governors/teo.c |   32 ++++++++++++++++----------------
> >  1 file changed, 16 insertions(+), 16 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -48,13 +48,11 @@
> >   * in accordance with what happened last time.
> >   *
> >   * The "hits" metric reflects the relative frequency of situations in which the
> > - * sleep length and the idle duration measured after CPU wakeup fall into the
> > - * same bin (that is, the CPU appears to wake up "on time" relative to the sleep
> > - * length).  In turn, the "intercepts" metric reflects the relative frequency of
> > - * non-timer wakeup events for which the measured idle duration falls into a bin
> > - * that corresponds to an idle state shallower than the one whose bin is fallen
> > - * into by the sleep length (these events are also referred to as "intercepts"
> > - * below).
> > + * sleep length and the idle duration measured after CPU wakeup are close enough
> > + * (that is, the CPU appears to wake up "on time" relative to the sleep length).
> > + * In turn, the "intercepts" metric reflects the relative frequency of non-timer
> > + * wakeup events for which the measured idle duration is measurably less than
> > + * the sleep length (these events are also referred to as "intercepts" below).
> >   *
> >   * The governor also counts "intercepts" with the measured idle duration below
> >   * the tick period length and uses this information when deciding whether or not
> > @@ -253,12 +251,16 @@ static void teo_update(struct cpuidle_dr
> >       }
> >
> >       /*
> > -      * If the measured idle duration falls into the same bin as the sleep
> > -      * length, this is a "hit", so update the "hits" metric for that bin.
> > +      * If the measured idle duration falls into the same bin as the
> > +      * sleep length and the difference between them is less than
> > +      * LATENCY_THRESHOLD_NS, this is a "hit", so update the "hits"
> > +      * metric for that bin.
> > +      *
> >        * Otherwise, update the "intercepts" metric for the bin fallen into by
> >        * the measured idle duration.
> >        */
> > -     if (idx_timer == idx_duration) {
> > +     if (idx_timer == idx_duration &&
> > +         cpu_data->sleep_length_ns - measured_ns < LATENCY_THRESHOLD_NS) {
>
> So it needs to be within 7.5us here.
> Can we always expect that to be true?

It's just a margin.

> Especially since measured_ns does this "infer average from worst-case exit
> latency" handling.
> On deeper states this
> measured_ns -= lat_ns / 2;
> is an order of magnitude higher than our threshold.

True.

> So it should probably be something like
> exit_latency / 2 + LATENCY_THRESHOLD_NS?
> Or just exit_latency and allow the error to both sides?

Well, the exit latency is already there in this inequality because
measured_ns == raw_measured_ns - exit_latency / 2 and I didn't want to
take it into account twice.

And in fact I want sleep_length_ns and measured_ns (already adjusted
for the entered state exit latency) to be equal up to a margin and I
just think that the margin can be the same for all of the state bins
because it's basically the granularity of the comparison.
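
To make the point about not counting the exit latency twice concrete, here
is a minimal user-space sketch (not kernel code).  It assumes
measured_ns == raw_measured_ns - exit_latency / 2 as stated above and takes
LATENCY_THRESHOLD_NS to be 7500 ns, per the 7.5us figure quoted earlier.
struct raw_window and hit_window_raw() are hypothetical names used only for
illustration.  It computes which raw idle durations would satisfy a
symmetric "equal up to a margin" check:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, taken from the "7.5us" figure quoted in this thread. */
#define LATENCY_THRESHOLD_NS 7500

/* Hypothetical type: open interval of raw idle durations that pass the check. */
struct raw_window {
    int64_t lo_ns; /* exclusive lower bound on the raw idle duration */
    int64_t hi_ns; /* exclusive upper bound on the raw idle duration */
};

/*
 * Hypothetical helper.  With measured_ns = raw_ns - exit_latency_ns / 2,
 * the condition |sleep_length_ns - measured_ns| < LATENCY_THRESHOLD_NS is
 * the same as:
 *
 *   sleep + lat/2 - THRESHOLD < raw_ns < sleep + lat/2 + THRESHOLD
 */
static struct raw_window hit_window_raw(int64_t sleep_length_ns,
                                        int64_t exit_latency_ns)
{
    int64_t half_lat_ns = exit_latency_ns / 2;
    struct raw_window w;

    assert(sleep_length_ns >= 0 && exit_latency_ns >= 0);

    w.lo_ns = sleep_length_ns + half_lat_ns - LATENCY_THRESHOLD_NS;
    w.hi_ns = sleep_length_ns + half_lat_ns + LATENCY_THRESHOLD_NS;
    return w;
}
```

For a sleep length of 1 ms and an exit latency of 200 us, this gives a raw
window of (1092500, 1107500) ns.  The exit latency already moves the center
of the window by 100 us, and the margin only sets its width, which is why
adding exit_latency / 2 to the margin as well would account for it a second
time.
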

I didn't get it right though and the code should be something like this:

    if (idx_timer == idx_duration) {
        s64 delta_ns = cpu_data->sleep_length_ns - measured_ns;

        if (delta_ns < 0)
            delta_ns = -delta_ns;

        if (delta_ns < LATENCY_THRESHOLD_NS) {
            cpu_data->state_bins[idx_timer].hits += PULSE;
            return;
        }
    }
    /*
     * Update the "intercepts" metric for the bin fallen into by the
     * measured idle duration.
     */
    cpu_data->state_bins[idx_duration].intercepts += PULSE;
    if (measured_ns <= TICK_NSEC)
        cpu_data->tick_intercepts += PULSE;

LATENCY_THRESHOLD_NS is as good a margin as any other here.  For bins
narrower than it (which means C1 and C1e on Intel x86, for instance),
delta_ns will always be less than it, so the behavior there won't
change after the patch.
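
The difference between the one-sided check in the quoted hunk and the
symmetric check above can be sketched in user space as follows.  This is an
illustration under assumptions, not the governor code: enum wakeup_class,
classify_one_sided() and classify_symmetric() are hypothetical names,
LATENCY_THRESHOLD_NS is taken to be 7500 ns (the 7.5us figure from this
thread), and the bin indices are passed in directly instead of being
derived from target residencies.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, taken from the "7.5us" figure quoted in this thread. */
#define LATENCY_THRESHOLD_NS 7500

/* Hypothetical result type for this sketch. */
enum wakeup_class { WAKEUP_HIT, WAKEUP_INTERCEPT };

/* One-sided check, as written in the quoted hunk. */
static enum wakeup_class classify_one_sided(int idx_timer, int idx_duration,
                                            int64_t sleep_length_ns,
                                            int64_t measured_ns)
{
    assert(idx_timer >= 0 && idx_duration >= 0);

    /* A negative difference always passes, however large its magnitude. */
    if (idx_timer == idx_duration &&
        sleep_length_ns - measured_ns < LATENCY_THRESHOLD_NS)
        return WAKEUP_HIT;

    return WAKEUP_INTERCEPT;
}

/* Symmetric check: the magnitude of the difference must be below the margin. */
static enum wakeup_class classify_symmetric(int idx_timer, int idx_duration,
                                            int64_t sleep_length_ns,
                                            int64_t measured_ns)
{
    assert(idx_timer >= 0 && idx_duration >= 0);

    if (idx_timer == idx_duration) {
        int64_t delta_ns = sleep_length_ns - measured_ns;

        if (delta_ns < 0)
            delta_ns = -delta_ns;

        if (delta_ns < LATENCY_THRESHOLD_NS)
            return WAKEUP_HIT;
    }

    return WAKEUP_INTERCEPT;
}
```

With both values in the same bin, a sleep length of 100000 ns and an
adjusted idle duration of 300000 ns, classify_one_sided() reports a hit
because the difference is negative, while classify_symmetric() reports an
intercept.  If both values sit in the same bin and that bin is narrower
than the margin (say 2000 ns and 5000 ns), the two functions agree on a
hit, which matches the "behavior there won't change" remark above.
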

> >               cpu_data->state_bins[idx_timer].hits += PULSE;
> >       } else {
> >               cpu_data->state_bins[idx_duration].intercepts += PULSE;
> >
> >

Overall, I'll respin the series.