Message-ID: <CAJZ5v0jo=gbCZoPPHxUTyCV-h81G1amTonc5CgA3HJW_aeUqoA@mail.gmail.com>
Date: Thu, 29 Jan 2026 21:08:55 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Christian Loehle <christian.loehle@....com>
Cc: Linux PM <linux-pm@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, 
	Doug Smythies <dsmythies@...us.net>
Subject: Re: [PATCH v2 1/2] cpuidle: governors: teo: Adjust the classification
 of wakeup events

On Thu, Jan 29, 2026 at 6:18 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Thu, Jan 29, 2026 at 10:16 AM Christian Loehle
> <christian.loehle@....com> wrote:
> >
> > On 1/26/26 19:45, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > >
> > > If differences between target residency values of adjacent idle states
> > > of a given CPU are relatively large, the corresponding idle state bins
> > > > used by the teo governor are large as well and the rule by which hits
> > > are distinguished from intercepts is inaccurate.
> > >
> > > Namely, by that rule, a wakeup event is classified as a hit if the
> > > sleep length (the time till the closest timer other than the tick)
> > > and the measured idle duration, adjusted for the entered idle state
> > > exit latency, fall into the same idle state bin.  However, if that bin
> > > is large enough, the actual difference between the sleep length and
> > > the measured idle duration may be significant.  It may in fact be
> > > significantly greater than the analogous difference for an event where
> > > the sleep length and the measured idle duration fall into different
> > > bins.
> > >
> > > For this reason, amend the rule in question with a check that will
> > > only allow a wakeup event to be counted as a hit if the difference
> > > between the sleep length and the measured idle duration is less than
> > > LATENCY_THRESHOLD_NS (which means that the difference between the
> > > sleep length and the raw measured idle duration is below the sum of
> > > LATENCY_THRESHOLD_NS and 1/2 of the entered idle state exit latency).
> > > Otherwise, the event will be counted as an intercept.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > > ---
> > >
> > > v1.1 -> v2: No changes
> > >
> > > v1 -> v1.1
> > >    * Drop the change in teo_select() along with the corresponding
> > >      part of the changelog (after receiving testing feedback from
> > >      Christian)
> > >
> > > This is a resend of
> > >
> > > https://lore.kernel.org/linux-pm/4707705.LvFx2qVVIh@rafael.j.wysocki/
> > >
> > > It applies on top of the first three patches from
> > >
> > > https://lore.kernel.org/linux-pm/2257365.irdbgypaU6@rafael.j.wysocki/
> > >
> > > ---
> > >  drivers/cpuidle/governors/teo.c |   32 ++++++++++++++++----------------
> > >  1 file changed, 16 insertions(+), 16 deletions(-)
> > >
> > > --- a/drivers/cpuidle/governors/teo.c
> > > +++ b/drivers/cpuidle/governors/teo.c
> > > @@ -48,13 +48,11 @@
> > >   * in accordance with what happened last time.
> > >   *
> > >   * The "hits" metric reflects the relative frequency of situations in which the
> > > - * sleep length and the idle duration measured after CPU wakeup fall into the
> > > - * same bin (that is, the CPU appears to wake up "on time" relative to the sleep
> > > - * length).  In turn, the "intercepts" metric reflects the relative frequency of
> > > - * non-timer wakeup events for which the measured idle duration falls into a bin
> > > - * that corresponds to an idle state shallower than the one whose bin is fallen
> > > - * into by the sleep length (these events are also referred to as "intercepts"
> > > - * below).
> > > + * sleep length and the idle duration measured after CPU wakeup are close enough
> > > + * (that is, the CPU appears to wake up "on time" relative to the sleep length).
> > > + * In turn, the "intercepts" metric reflects the relative frequency of non-timer
> > > + * wakeup events for which the measured idle duration is measurably less than
> > > + * the sleep length (these events are also referred to as "intercepts" below).
> > >   *
> > >   * The governor also counts "intercepts" with the measured idle duration below
> > >   * the tick period length and uses this information when deciding whether or not
> > > @@ -253,12 +251,16 @@ static void teo_update(struct cpuidle_dr
> > >       }
> > >
> > >       /*
> > > -      * If the measured idle duration falls into the same bin as the sleep
> > > -      * length, this is a "hit", so update the "hits" metric for that bin.
> > > +      * If the measured idle duration falls into the same bin as the
> > > +      * sleep length and the difference between them is less than
> > > +      * LATENCY_THRESHOLD_NS, this is a "hit", so update the "hits"
> > > +      * metric for that bin.
> > > +      *
> > >        * Otherwise, update the "intercepts" metric for the bin fallen into by
> > >        * the measured idle duration.
> > >        */
> > > -     if (idx_timer == idx_duration) {
> > > +     if (idx_timer == idx_duration &&
> > > +         cpu_data->sleep_length_ns - measured_ns < LATENCY_THRESHOLD_NS) {
> >
> > So it needs to be within 7.5us here.
> > Can we always expect that to be true?
>
> It's just a margin.
>
> > Especially since measured_ns does this "infer average from worst-case exit
> > latency" handling.
> > On deeper states this
> > measured_ns -= lat_ns / 2;
> > is an order of magnitude higher than our threshold.
>
> True.
>
> > So it should probably be something like
> > exit_latency / 2 + LATENCY_THRESHOLD_NS?
> > Or just exit_latency and allow the error to both sides?
>
> Well, the exit latency is already there in this inequality because
> measured_ns == raw_measured_ns - exit_latency / 2 and I didn't want to
> take it into account twice.
>
> And in fact I want sleep_length_ns and measured_ns (already adjusted
> for the entered state exit latency) to be equal up to a margin and I
> just think that the margin can be the same for all of the state bins
> because it's basically the granularity of the comparison.

Well, scratch the above paragraph.

The point is that cpu_data->sleep_length_ns should be less than
measured_ns (which means that the wakeup appears to have occurred
after the anticipated timer event) or at least not much greater than
it (the actual wakeup latency might be shorter than 1/2 of the
declared one due to a prewake or similar).  By how much sleep_length_ns
may exceed measured_ns for the wakeup to still count as a "hit" is, of
course, a matter of choice, and I thought that it would be reasonable
to use a constant limit.

However, the limit may as well be chosen to depend on the exit latency
of the entered state and it can be as large as 1/2 of that number (I
don't think that using a larger number would make a lot of sense).
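
To make the two options concrete, here is a minimal user-space sketch of
the check under discussion.  It is not the teo.c code: the function names
are hypothetical, the signed 64-bit types are an assumption made so that
a wakeup occurring after the timer yields a negative difference, and the
7500 ns value is taken from the "within 7.5us" figure quoted above.  The
first variant uses the constant limit from the patch, the second one uses
1/2 of the exit latency of the entered state as the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: 7.5 us, per the "within 7.5us" remark in the thread. */
#define LATENCY_THRESHOLD_NS 7500

/*
 * Hypothetical helper: the measured idle duration adjusted by 1/2 of the
 * exit latency of the entered state, as described in the thread
 * (measured_ns == raw_measured_ns - exit_latency / 2).
 */
int64_t adjusted_duration_ns(int64_t raw_measured_ns, int64_t exit_latency_ns)
{
	assert(exit_latency_ns >= 0);
	return raw_measured_ns - exit_latency_ns / 2;
}

/* Variant 1: constant limit, mirroring the condition in the patch. */
bool is_hit_const(int idx_timer, int idx_duration,
		  int64_t sleep_length_ns, int64_t measured_ns)
{
	return idx_timer == idx_duration &&
	       sleep_length_ns - measured_ns < LATENCY_THRESHOLD_NS;
}

/* Variant 2: limit equal to 1/2 of the entered state's exit latency. */
bool is_hit_scaled(int idx_timer, int idx_duration,
		   int64_t sleep_length_ns, int64_t measured_ns,
		   int64_t exit_latency_ns)
{
	return idx_timer == idx_duration &&
	       sleep_length_ns - measured_ns < exit_latency_ns / 2;
}
```

For example, with an exit latency of 200 us and a raw measured duration
of 1000 us, the adjusted duration is 900 us.  A sleep length of 950 us
then exceeds it by 50 us, which is an intercept under the constant limit
but a hit under the exit-latency-based one.  A sleep length below the
adjusted duration counts as a hit in both variants as long as the bins
match.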
