linux-kernel - Re: [RFC] cpuidle : Add debugfs support for cpuidle core

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJZ5v0iMutsf1UK+01dsZa1WDZAVK_fvOZJ-2FgiYXwWDqdnig@mail.gmail.com>
Date:   Thu, 19 Dec 2019 10:04:05 +0100
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     "Oliver O'Halloran" <oohall@...il.com>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Abhishek Goel <huntbag@...ux.vnet.ibm.com>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        "Gautham R. Shenoy" <ego@...ux.vnet.ibm.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Vaidyanathan Srinivasan <svaidy@...ux.ibm.com>
Subject: Re: [RFC] cpuidle : Add debugfs support for cpuidle core

On Wed, Dec 18, 2019 at 3:26 PM Oliver O'Halloran <oohall@...il.com> wrote:
>
> On Wed, Dec 18, 2019 at 3:51 AM Rafael J. Wysocki <rafael@...nel.org> wrote:
> >
> > On Tue, Dec 17, 2019 at 3:42 PM Abhishek Goel
> > <huntbag@...ux.vnet.ibm.com> wrote:
> > >
> > > Up until now, we did not have a way to tune cpuidle attribute like
> > > residency in kernel. This patch adds support for debugfs in cpuidle core.
> > > Thereby providing support for tuning cpuidle attributes like residency in
> > > kernel at runtime.
> >
> > This is not a good idea in my view, for a couple of reasons.
> >
> > First off, if the target residency of an idle state is changed, it
> > effectively becomes a different one and all of the statistics
> > regarding it become outdated at that point.  Synchronizing that would
> > be a pain.
> >
> > Next, governors may get confused if idle state parameters are changed
> > on the fly.  In particular, the statistics collected by the teo
> > governor depend on the target residencies of idle states, so if one of
> > them changes, the governor needs to be reloaded.
> >
> > Next, idle states are expected to be ordered by the target residency
> > (and by the exit latency), so their parameters cannot be allowed to
> > change freely anyway.
> >
> > Finally, the idle state parameters are expected to reflect the
> > properties of the hardware, which wouldn't hold any more if they were
> > allowed to change at any time.
>
> Certainly does sound like a headache.
>
> > > For example: Tuning residency at runtime can be used to quantify governors
> > > decision making as governor uses residency as one of the parameter to
> > > take decision about the state that needs to be entered while idling.
> >
> > IMO it would be better to introduce a testing cpuidle driver with an
> > artificial set of idle states (or even such that the set of idle
> > states to be used by it can be defined by the user e.g. via module
> > parameters) for this purpose.
>
> The motivation for this patch isn't really a desire to test / tune the
> governor. It's intended to allow working around a performance problem
> caused by using high-latency idle states on some interrupt heavy GPU
> workload. The interrupts occur around ~30ms apart which is long enough
> for the governor to put the CPU into the deeper states and over the
> course of long job the additional wakeup latency adds up. The initial
> fix someone came up with was cooking the residency values so the
> high-latency states had a residency of +50ms to prevent the govenor
> from using them. However, that fix is supposed to go into a bit of
> firmware I maintain and I'm not terribly happy with the idea. I'm
> fairly sure that ~30ms value is workload dependent and personally I
> don't think firmware should be making up numbers to trick specific
> kernel versions into doing specific things.
>
> My impression is the right solution is to have the GPU driver set a PM
> QoS constraint on the CPUs receiving interrupts while a job is
> on-going.

Yes, that would address the GPU problem.

> However, interrupt latency sensitivity isn't something
> that's unique to GPUs so I'm wondering it it makes sense to have the
> governor factor in interrupt traffic when deciding what state to use.
> Is that something that's been tried before?

Yes, that is in the works.

The existing governors should take interrupts into account too in the
form of the expected idle duration corrections, but that may not be
particularly precise.  If the governor currently in use (I guess menu)
doesn't to that, you may try an alternative one (e.g. teo).

That said, work is in progress on taking the actual interrupt
frequency into account in idle duration prediction.

Thanks!