linux-kernel - Re: [RFC/RFT][PATCH v4 1/2] cpufreq: New governor using utilization data from the scheduler

Date:	Thu, 3 Mar 2016 20:20:06 +0100
From:	"Rafael J. Wysocki" <rafael@...nel.org>
To:	Steve Muckle <steve.muckle@...aro.org>
Cc:	"Rafael J. Wysocki" <rafael@...nel.org>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Linux PM list <linux-pm@...r.kernel.org>,
	Juri Lelli <juri.lelli@....com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>
Subject: Re: [RFC/RFT][PATCH v4 1/2] cpufreq: New governor using utilization
 data from the scheduler

On Thu, Mar 3, 2016 at 4:20 AM, Steve Muckle <steve.muckle@...aro.org> wrote:
> On 03/01/2016 12:20 PM, Rafael J. Wysocki wrote:
>>> I'm specifically worried about the check below where we omit a CPU's
>>> capacity request if its last update came before the last sample time.
>>>
>>> Say there are 2 CPUs in a frequency domain, HZ is 100 and the sample
>>> delay here is 4ms.
>>
>> Yes, that's the case I clearly didn't take into consideration. :-)
>>
>> My assumption was that the sample delay would always be greater than
>> the typical update rate which of course need not be the case.
>>
>> The reason I added the check at all was that the numbers from the
>> other CPUs may become stale if those CPUs are idle for too long, so at
>> one point the contributions from them need to be discarded.  Question
>> is when that point is and since sample delay may be arbitrary, that
>> mechanism has to be more complex.
>
> Yeah this has been an open issue on our end as well. Sampling-based
> governors of course solved this primarily via their fundamental nature
> and sampling rate. The interactive governor also has a separate tunable
> IIRC which specified how long a CPU may have its sampling timer deferred
> due to idle when running above fmin (the "slack timer").
>
> Decoupling the CPU update staleness limit from the freq change rate
> limit via a separate tunable would be valuable IMO. Would you be
> amenable to a patch that did that?

Yes, I would.

It still would be better, though, if that didn't have to be a tunable.

What do you think about my idea to use NSEC_PER_SEC / HZ as the
staleness limit (like in https://patchwork.kernel.org/patch/8477261/)?
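To make the HZ = 100, 4 ms case concrete, here is a minimal user-space sketch (not the patch's code) that compares the two staleness rules for a frequency domain whose request is the maximum utilization over its CPUs. The names cpu_sample, domain_util_vs_sample() and domain_util_vs_tick() are hypothetical. NSEC_PER_SEC and HZ are defined locally to mirror the kernel constants, and taking the per-domain maximum is an assumption about how the shared-policy update combines CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel constants; HZ = 100 matches the scenario. */
#define NSEC_PER_SEC 1000000000ULL
#define HZ 100
#define STALE_LIMIT_NS (NSEC_PER_SEC / HZ) /* one tick: 10 ms at HZ = 100 */

/* Hypothetical per-CPU record of the most recent utilization update. */
struct cpu_sample {
	uint64_t last_update_ns; /* time of the CPU's last update */
	unsigned long util;      /* utilization reported at that time */
};

/*
 * Rule questioned in the thread: skip CPUs whose last update predates
 * the previous frequency evaluation at last_sample_ns.
 */
unsigned long domain_util_vs_sample(const struct cpu_sample *cpus, int nr,
				    uint64_t last_sample_ns)
{
	unsigned long util = 0;

	assert(nr > 0);
	for (int i = 0; i < nr; i++) {
		if (cpus[i].last_update_ns < last_sample_ns)
			continue; /* considered stale */
		if (cpus[i].util > util)
			util = cpus[i].util;
	}
	return util;
}

/*
 * Alternative: skip a CPU only if its last update is more than one tick
 * (NSEC_PER_SEC / HZ) old, whatever the sample delay is.
 * Assumes last_update_ns <= now_ns for every CPU.
 */
unsigned long domain_util_vs_tick(const struct cpu_sample *cpus, int nr,
				  uint64_t now_ns)
{
	unsigned long util = 0;

	assert(nr > 0);
	for (int i = 0; i < nr; i++) {
		if (now_ns - cpus[i].last_update_ns > STALE_LIMIT_NS)
			continue; /* considered stale */
		if (cpus[i].util > util)
			util = cpus[i].util;
	}
	return util;
}
```

For example, suppose CPU0 updates at 16 ms (util 100), a busy CPU1 last updated on its 10 ms tick (util 900), and the previous evaluation happened at 12 ms. The sample-time rule discards CPU1 and yields 100, while the one-tick rule keeps it and yields 900. At 25 ms, 15 ms after CPU1's last update, the one-tick rule drops CPU1 as well and yields 100.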

[cut]

>> Moreover, since 0 utilization gets you to run at f_min no matter what,
>> if you treat f_max as an absolute, you're going to underutilize the
>> P-states in the upper half of the available range.
>
> Sorry I didn't follow. What do you mean by underutilize the upper half
> of the range? I don't see how using RELATION_L with (util/max) * fmax *
> (headroom) wouldn't be correct in that regard.
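Here is a minimal sketch of the mapping described above. It assumes a hypothetical ascending frequency table and a headroom of 125%, since the message does not give a value. pick_relation_l() mimics the CPUFREQ_RELATION_L rule of choosing the lowest frequency at or above the target. The helper names are illustrative and are not part of the cpufreq API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Target from (util / max) * f_max * headroom, with headroom in percent
 * (125% is a hypothetical value). Integer math, evaluated left to right.
 */
unsigned int target_freq(unsigned long util, unsigned long max,
			 unsigned int f_max, unsigned int headroom_pct)
{
	assert(max > 0);
	return (unsigned int)((unsigned long long)f_max * util / max *
			      headroom_pct / 100);
}

/*
 * RELATION_L-style pick: lowest table entry at or above target.
 * The table must be sorted ascending; if target exceeds every entry,
 * fall back to the highest one, as if target had been clamped to f_max.
 */
unsigned int pick_relation_l(const unsigned int *table, size_t n,
			     unsigned int target)
{
	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		if (table[i] >= target)
			return table[i];
	return table[n - 1];
}
```

With f_max = 1000, util = 512 and max = 1024, the target is 625, and RELATION_L selects 800 from the table {400, 600, 800, 1000}. With util = 0, the target is 0, which selects the lowest entry, 400.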

Suppose all of the util values from 0 to max are equally probable (or
equally frequent) and the available frequencies are close enough to
each other that it doesn't really matter whether _C or _L is used.

Say f_min is 400 and f_max is 1000.

Then, if you take next_freq = f_max * util / max, 50% of requests will
fall into the 400-500 section of the available frequency range.  Of
course, 40% of them will end up at f_min, but even so, only the
remaining 50% are spread over the 500-1000 section, so the states in
there will be used less frequently, on average.

I would prefer that to be more balanced.
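The arithmetic above can be checked with a short sketch. It assumes util takes every integer value in [0, max) equally often, with max = 1000 so that util maps one-to-one onto the 400-1000 example range. freq_range() is one hypothetical reading of "more balanced" that spreads util linearly over [f_min, f_max]. The message does not specify a formula, and all function names here are illustrative.

```c
#include <assert.h>

typedef unsigned int (*freq_map_fn)(unsigned int util, unsigned int max,
				    unsigned int f_min, unsigned int f_max);

/*
 * Mapping from the message: next_freq = f_max * util / max. The CPU
 * cannot run below f_min, so low results are clamped there.
 */
unsigned int freq_abs(unsigned int util, unsigned int max,
		      unsigned int f_min, unsigned int f_max)
{
	unsigned int f = (unsigned int)((unsigned long long)f_max * util / max);

	return f < f_min ? f_min : f;
}

/*
 * Hypothetical "balanced" alternative: spread util linearly over
 * [f_min, f_max] instead of [0, f_max].
 */
unsigned int freq_range(unsigned int util, unsigned int max,
			unsigned int f_min, unsigned int f_max)
{
	return f_min + (unsigned int)((unsigned long long)(f_max - f_min) *
				      util / max);
}

/*
 * Count the util values in [0, max) whose frequency lands in [lo, hi),
 * treating every util value as equally likely.
 */
unsigned int count_in_band(freq_map_fn map, unsigned int max,
			   unsigned int f_min, unsigned int f_max,
			   unsigned int lo, unsigned int hi)
{
	unsigned int n = 0;

	assert(max > 0);
	for (unsigned int util = 0; util < max; util++) {
		unsigned int f = map(util, max, f_min, f_max);

		if (f >= lo && f < hi)
			n++;
	}
	return n;
}
```

With max = 1000, the proportional mapping puts 500 of the 1000 util values in the 400-500 band (401 of them, about 40%, exactly at f_min) and 100 in the 900-1000 band. The range-based mapping puts 167 and 166 there, respectively.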

Thanks,
Rafael