linux-kernel - Re: [RFC] cpuidle : Add debugfs support for cpuidle core

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-Id: <33e38a86-a022-6c15-7c78-77903ec5abc5@linux.vnet.ibm.com>
Date:   Fri, 20 Dec 2019 18:20:46 +0530
From:   Abhishek <huntbag@...ux.vnet.ibm.com>
To:     "Rafael J. Wysocki" <rafael@...nel.org>,
        "Oliver O'Halloran" <oohall@...il.com>
Cc:     Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        "Gautham R. Shenoy" <ego@...ux.vnet.ibm.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Vaidyanathan Srinivasan <svaidy@...ux.ibm.com>
Subject: Re: [RFC] cpuidle : Add debugfs support for cpuidle core

Hi Rafael,


On 12/19/2019 02:34 PM, Rafael J. Wysocki wrote:
> On Wed, Dec 18, 2019 at 3:26 PM Oliver O'Halloran <oohall@...il.com> wrote:
>> On Wed, Dec 18, 2019 at 3:51 AM Rafael J. Wysocki <rafael@...nel.org> wrote:
>>> On Tue, Dec 17, 2019 at 3:42 PM Abhishek Goel
>>> <huntbag@...ux.vnet.ibm.com> wrote:
>>>> Up until now, we did not have a way to tune cpuidle attribute like
>>>> residency in kernel. This patch adds support for debugfs in cpuidle core.
>>>> Thereby providing support for tuning cpuidle attributes like residency in
>>>> kernel at runtime.
>>> This is not a good idea in my view, for a couple of reasons.
>>>
>>> First off, if the target residency of an idle state is changed, it
>>> effectively becomes a different one and all of the statistics
>>> regarding it become outdated at that point.  Synchronizing that would
>>> be a pain.
>>>
>>> Next, governors may get confused if idle state parameters are changed
>>> on the fly.  In particular, the statistics collected by the teo
>>> governor depend on the target residencies of idle states, so if one of
>>> them changes, the governor needs to be reloaded.
>>>
>>> Next, idle states are expected to be ordered by the target residency
>>> (and by the exit latency), so their parameters cannot be allowed to
>>> change freely anyway.
>>>
>>> Finally, the idle state parameters are expected to reflect the
>>> properties of the hardware, which wouldn't hold any more if they were
>>> allowed to change at any time.
>> Certainly does sound like a headache.
>>
>>>> For example: Tuning residency at runtime can be used to quantify governors
>>>> decision making as governor uses residency as one of the parameter to
>>>> take decision about the state that needs to be entered while idling.
>>> IMO it would be better to introduce a testing cpuidle driver with an
>>> artificial set of idle states (or even such that the set of idle
>>> states to be used by it can be defined by the user e.g. via module
>>> parameters) for this purpose.
>> The motivation for this patch isn't really a desire to test / tune the
>> governor. It's intended to allow working around a performance problem
>> caused by using high-latency idle states on some interrupt heavy GPU
>> workload. The interrupts occur around ~30ms apart which is long enough
>> for the governor to put the CPU into the deeper states and over the
>> course of long job the additional wakeup latency adds up. The initial
>> fix someone came up with was cooking the residency values so the
>> high-latency states had a residency of +50ms to prevent the govenor
>> from using them. However, that fix is supposed to go into a bit of
>> firmware I maintain and I'm not terribly happy with the idea. I'm
>> fairly sure that ~30ms value is workload dependent and personally I
>> don't think firmware should be making up numbers to trick specific
>> kernel versions into doing specific things.
>>
>> My impression is the right solution is to have the GPU driver set a PM
>> QoS constraint on the CPUs receiving interrupts while a job is
>> on-going.
> Yes, that would address the GPU problem.
>
>> However, interrupt latency sensitivity isn't something
>> that's unique to GPUs so I'm wondering it it makes sense to have the
>> governor factor in interrupt traffic when deciding what state to use.
>> Is that something that's been tried before?
> Yes, that is in the works.
>
> The existing governors should take interrupts into account too in the
> form of the expected idle duration corrections, but that may not be
> particularly precise.  If the governor currently in use (I guess menu)
> doesn't to that, you may try an alternative one (e.g. teo).

For this particular case, I tried TEO but it did not solve the issue.
> That said, work is in progress on taking the actual interrupt
> frequency into account in idle duration prediction.
>
> Thanks!

Could you please point out to the patch looking into device interrupt
frequency for cpuidle.

Thanks.