linux-kernel - Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Fri, 24 Mar 2017 18:14:34 -0700
From:   Sai Gurrappadi <sgurrappadi@...dia.com>
To:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Linux PM <linux-pm@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>
CC:     LKML <linux-kernel@...r.kernel.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Juri Lelli <juri.lelli@....com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Patrick Bellasi <patrick.bellasi@....com>,
        Joel Fernandes <joelaf@...gle.com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Boonstoppel <pboonstoppel@...dia.com>
Subject: Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency
 of busy CPUs prematurely

Hi Rafael,

On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> 
> The way the schedutil governor uses the PELT metric causes it to
> underestimate the CPU utilization in some cases.
> 
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register.  Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again.  That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
> 
> That has been attributed to CPU utilization metric updates on task
> migration that cause the total utilization value for the CPU to be
> reduced by the utilization of the migrated task.  If that happens,
> the schedutil governor may see a CPU utilization reduction and will
> attempt to reduce the CPU frequency accordingly right away.  That
> may be premature, though, for example if the system is generally
> busy and there are other runnable tasks waiting to be run on that
> CPU already.
> 

Thinking out loud a bit, I wonder if what you really want to do is basically:

schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, total_cpu_util_avg);

Where total_cpu_util_avg tracks the average utilization of the CPU itself over time (% of time the CPU was busy) in the same PELT like manner. The difference here is that it doesn't change instantaneously as tasks migrate in/out but it decays/accumulates just like the per-entity util_avgs.

Over time, total_cpu_util_avg and cfs_rq(cpu)->util_avg will tend towards each other the lesser the amount of 'overlap' / overloading.

Yes, the above metric would 'overestimate' in case all tasks have migrated away and we are left with an idle CPU. A fix for that could be to just use the PELT value like so:

schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, idle_cpu(cpu) ? 0 : total_cpu_util_avg);

Note that the problem described here in the commit message doesn't need fully runnable threads, it just needs two threads to execute in parallel on the same CPU for a period of time. I don't think looking at just idle_calls necessarily covers all cases.

Thoughts?

Thanks,
-Sai