linux-kernel - Re: [RFC][PATCH 2/3] cpuidle,teo: Improve NOHZ management


Message-ID: <20230731120200.GF29590@hirez.programming.kicks-ass.net>
Date:   Mon, 31 Jul 2023 14:02:00 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     anna-maria@...utronix.de, tglx@...utronix.de, frederic@...nel.org,
        gautham.shenoy@....com, linux-kernel@...r.kernel.org,
        daniel.lezcano@...aro.org, linux-pm@...r.kernel.org,
        mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com
Subject: Re: [RFC][PATCH 2/3] cpuidle,teo: Improve NOHZ management

On Mon, Jul 31, 2023 at 12:17:27PM +0200, Rafael J. Wysocki wrote:

> Something really simple like:
> 
> 1. Check sched_cpu_util() (which is done by teo anyway).
> 2. If that is around 90% of the maximum CPU capacity, select the first
> non-polling idle state and be done (don't stop the tick, as in my other
> reply earlier today).

So I really don't like using cpu_util() here. Yes, 90% is a high number,
but it doesn't say *anything* about the idle duration. Remember, this is
a 32ms window, so 90% of that is 28.8ms.

(not entirely accurate, since it's an exponential average, but that
doesn't change the overall argument, only some of the particulars)

That is, 90% util, at best, says there is no idle longer than 3.2ms.
But that is still vastly longer than pretty much all residencies. Heck,
that is still 3 ticks' worth at HZ=1000. So 90% util should not
preclude disabling the tick (at HZ=1000).
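
To spell out the arithmetic above, here is a small sketch. It treats the
32ms window as a plain interval, which is a simplifying assumption (the
real signal is an exponential average), and the constant names are made
up for illustration.

```python
# Worked arithmetic for the 90% figure. Assumption: the 32ms window is
# treated as a flat interval rather than an exponential average.
WINDOW_US = 32_000          # 32ms window, in microseconds
UTIL_PCT = 90               # utilization, in percent
HZ = 1000                   # tick frequency
TICK_US = 1_000_000 // HZ   # tick period, in microseconds

busy_us = WINDOW_US * UTIL_PCT // 100      # time spent running
idle_us = WINDOW_US - busy_us              # upper bound on any single idle
whole_ticks_in_idle = idle_us // TICK_US   # full tick periods that fit

print(busy_us, idle_us, whole_ticks_in_idle)  # 28800 3200 3
```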

Now, typically this won't be the case, and at 90% you'll have lots of
small idles adding up to 3.2ms total idle. But the point is, you can't
tell the difference. And as such util is a horrible measure to use for
cpuidle.
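
A toy illustration of why the two cases look the same, again assuming a
flat 32ms window instead of the real exponential average. The helper
name and both traces are invented for the example.

```python
def summarize(idle_periods_us, window_us=32_000):
    """Return (util_percent, longest_idle_us) for one window of idle periods."""
    total_idle = sum(idle_periods_us)
    util = 100 * (window_us - total_idle) / window_us
    return util, max(idle_periods_us)

one_long_idle = [3_200]          # a single 3.2ms idle period
many_short_idles = [100] * 32    # 32 idle periods of 100us each

print(summarize(one_long_idle))     # (90.0, 3200)
print(summarize(many_short_idles))  # (90.0, 100)
```

Both traces report 90% util, yet the longest idle differs by a factor
of 32, and only the idle length matters for picking a state.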

> > If we track the tick+ bucket -- as
> > we must in order to say anything useful about it, then we can decide the
> > tick state before (as I do here) calling sleep_length().
> >
> > The timer-pull rework from Anna-Maria unfortunately makes the
> > tick_nohz_get_sleep_length() thing excessively expensive and it really
> > doesn't make sense to call it when we retain the tick.
> >
> > It's all a bit of a chicken-egg situation, cpuidle wants to know when
> > the next timer is, but telling when that is, wants to know if the tick
> > stays. We need to break that somehow -- I propose by not calling it when
> > we know we'll keep the tick.
> 
> By selecting a state whose target residency will not be met, we lose
> on both energy and performance, so doing this really should be
> avoided, unless the state is really shallow in which case there may be
> no time for making this consideration.

I'm not sure how that relates to what I propose above. By adding the
tick+ bucket we have more historical information as related to the tick
boundary. How does that make us select states whose residency we won't
meet?
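
A rough sketch of the ordering being argued for: decide the tick state
from bucket history first, and only query the next timer when the tick
is going to be stopped. This is not the teo code. The class, the
simple-majority rule and sleep_length_fn (a stand-in for the expensive
tick_nohz_get_sleep_length() call) are all assumptions made for
illustration.

```python
TICK_US = 1_000  # tick period at HZ=1000, in microseconds

class BucketHistory:
    """Hypothetical history: idle periods counted as below or beyond one tick."""

    def __init__(self):
        self.sub_tick = 0    # idle periods shorter than a tick
        self.tick_plus = 0   # idle periods of a tick or longer ("tick+")

    def record(self, idle_us):
        if idle_us >= TICK_US:
            self.tick_plus += 1
        else:
            self.sub_tick += 1

    def expect_long_idle(self):
        # Assumed rule: simple majority of recent idles crossed the tick.
        return self.tick_plus > self.sub_tick

def select(history, sleep_length_fn):
    """Decide the tick first; ask for the sleep length only when stopping it."""
    if not history.expect_long_idle():
        return "keep tick", None          # expensive query skipped
    return "stop tick", sleep_length_fn()

calls = []

def fake_sleep_length():
    # Stand-in for the expensive timer query; records each invocation.
    calls.append(1)
    return 5_000

history = BucketHistory()
for idle_us in (50, 80, 120):
    history.record(idle_us)
short_result = select(history, fake_sleep_length)   # tick kept, no query

for idle_us in (4_000, 6_000, 9_000, 12_000):
    history.record(idle_us)
long_result = select(history, fake_sleep_length)    # tick stopped, one query
```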