linux-kernel - Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <1827227.Nyzl5ssJXb@aspire.rjw.lan>
Date:   Tue, 09 May 2017 01:01:43 +0200
From:   "Rafael J. Wysocki" <rjw@...ysocki.net>
To:     Wanpeng Li <kernellwp@...il.com>
Cc:     Viresh Kumar <viresh.kumar@...aro.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Juri Lelli <juri.lelli@....com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Patrick Bellasi <patrick.bellasi@....com>,
        Joel Fernandes <joelaf@...gle.com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely

On Tuesday, May 09, 2017 06:36:14 AM Wanpeng Li wrote:
> 2017-05-09 6:16 GMT+08:00 Rafael J. Wysocki <rjw@...ysocki.net>:
> > On Monday, May 08, 2017 09:31:19 AM Viresh Kumar wrote:
> >> On 08-05-17, 11:49, Wanpeng Li wrote:
> >> > Hi Rafael,
> >> > 2017-03-22 7:08 GMT+08:00 Rafael J. Wysocki <rjw@...ysocki.net>:
> >> > > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >> > >
> >> > > The way the schedutil governor uses the PELT metric causes it to
> >> > > underestimate the CPU utilization in some cases.
> >> > >
> >> > > That can be easily demonstrated by running kernel compilation on
> >> > > a Sandy Bridge Intel processor, running turbostat in parallel with
> >> > > it and looking at the values written to the MSR_IA32_PERF_CTL
> >> > > register.  Namely, the expected result would be that when all CPUs
> >> > > were 100% busy, all of them would be requested to run in the maximum
> >> > > P-state, but observation shows that this clearly isn't the case.
> >> > > The CPUs run in the maximum P-state for a while and then are
> >> > > requested to run slower and go back to the maximum P-state after
> >> > > a while again.  That causes the actual frequency of the processor to
> >> > > visibly oscillate below the sustainable maximum in a jittery fashion
> >> > > which clearly is not desirable.
> >> > >
> >> > > That has been attributed to CPU utilization metric updates on task
> >> > > migration that cause the total utilization value for the CPU to be
> >> > > reduced by the utilization of the migrated task.  If that happens,
> >> > > the schedutil governor may see a CPU utilization reduction and will
> >> > > attempt to reduce the CPU frequency accordingly right away.  That
> >> > > may be premature, though, for example if the system is generally
> >> > > busy and there are other runnable tasks waiting to be run on that
> >> > > CPU already.
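The mechanism described above can be modeled in a few lines of user-space C. This is a simplified sketch, not the patch itself. The names `next_freq()`, `update_freq()` and `struct cpu_model` are hypothetical. The 1.25x headroom factor approximates the mapping schedutil's `get_next_freq()` applied at the time. A CPU is treated as "busy" when its idle-entry counter has not advanced since the previous update, in which case a lower frequency request is ignored.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for this model (not the kernel's sugov_cpu). */
struct cpu_model {
    unsigned long util;            /* PELT-style utilization of the CPU */
    unsigned long max;             /* capacity the utilization is relative to */
    unsigned long idle_calls_prev; /* idle entries counted at the last update */
};

/* Approximate schedutil mapping: freq = 1.25 * max_freq * util / max,
 * clamped to max_freq (the real governor clamps to policy limits). */
static unsigned int next_freq(unsigned int max_freq, unsigned long util,
                              unsigned long max)
{
    unsigned long long f = max_freq;

    f = (f + (f >> 2)) * util / max;
    return f > max_freq ? max_freq : (unsigned int)f;
}

/* Compute the next request; if the CPU has not been idle since the last
 * update, do not let a (possibly migration-induced) utilization drop
 * lower the frequency below the current request. */
static unsigned int update_freq(struct cpu_model *c,
                                unsigned long idle_calls_now,
                                unsigned int max_freq, unsigned int cur_freq)
{
    bool busy = (idle_calls_now == c->idle_calls_prev);
    unsigned int f = next_freq(max_freq, c->util, c->max);

    c->idle_calls_prev = idle_calls_now;
    if (busy && f < cur_freq)
        f = cur_freq; /* hold the current frequency for a busy CPU */
    return f;
}
```

With `max_freq` = 3400000 kHz and utilization at half capacity, `next_freq()` yields 2125000 kHz; `update_freq()` only returns that lower value once the CPU has entered idle at least once since the previous call.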
> >> > >
> >> > > This is unlikely to be an issue on systems where cpufreq policies are
> >> > > shared between multiple CPUs, because in those cases the policy
> >> > > utilization is computed as the maximum of the CPU utilization values
> >> >
> > >> > Sorry, this question may not be related to this patch. If the
> > >> > cpufreq policy is shared between multiple CPUs, the function
> > >> > intel_cpufreq_target() only updates the IA32_PERF_CTL MSR of the CPU
> > >> > that manages the policy. Should the other affected CPUs also have
> > >> > their per-logical-CPU IA32_PERF_CTL MSRs updated?
> >>
> > >> CPUs share a policy when they share their frequency/voltage rails, so
> > >> changing the perf state of one CPU should change it for all the CPUs
> > >> in that policy. Otherwise, they can't be considered part of the same
> > >> policy.
> >
> > To be entirely precise, this depends on the granularity of the HW interface.
> >
> > If the interface is per-logical-CPU, we use it that way for efficiency
> > reasons, and even if there is some coordination on the HW side, information
> > on how exactly it works is usually limited.
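To see the per-logical-CPU requests directly, each CPU's IA32_PERF_CTL (MSR 0x199) can be read through the Linux msr driver, which exposes `/dev/cpu/N/msr` and takes the MSR index as the file offset. This is a minimal sketch that needs root and the `msr` module loaded. The helper names are hypothetical, and extracting the requested ratio from bits 15:8 assumes the layout `intel_pstate` uses on Core processors such as Sandy Bridge.

```c
#define _XOPEN_SOURCE 700 /* expose pread() */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_IA32_PERF_CTL 0x199

/* Assumed layout: requested P-state ratio in bits 15:8. */
static unsigned int perf_ctl_ratio(uint64_t val)
{
    return (unsigned int)((val >> 8) & 0xff);
}

/* Read IA32_PERF_CTL of one logical CPU via the msr driver.
 * Returns 0 on success, -1 if the device is missing or the read fails. */
static int read_perf_ctl(int cpu, uint64_t *val)
{
    char path[64];
    ssize_t n;
    int fd;

    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    n = pread(fd, val, sizeof(*val), MSR_IA32_PERF_CTL);
    close(fd);
    return n == (ssize_t)sizeof(*val) ? 0 : -1;
}
```

Calling `read_perf_ctl()` for every online CPU during a kernel build shows whether the requested ratios differ from one logical CPU to another.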
> 
> I checked several Xeon servers I have on hand, but I didn't find any
> /sys/devices/system/cpu/cpufreq/policyx/affected_cpus listing more
> than one logical CPU, so I guess most Xeon servers don't support
> shared cpufreq policies. Which kinds of boxes do support that?

On Intel, the performance scaling interface is generally per-logical-CPU.

Thanks,
Rafael
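The affected_cpus check discussed above can be scripted. `affected_cpus` holds a whitespace-separated list of CPU numbers (`related_cpus` additionally includes offline CPUs), so counting the entries shows whether a policy spans more than one logical CPU. A minimal sketch; `count_cpus()` and `policy_cpu_count()` are hypothetical helper names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Count CPU numbers in a whitespace-separated list such as "0 1 2 3\n". */
static int count_cpus(const char *list)
{
    int count = 0;
    char *end;

    for (;;) {
        strtol(list, &end, 10); /* skips leading whitespace */
        if (end == list)        /* no number parsed: end of list */
            break;
        count++;
        list = end;
    }
    return count;
}

/* Number of CPUs in policyN/affected_cpus, or -1 if it cannot be read. */
static int policy_cpu_count(int policy)
{
    char path[96], buf[4096];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpufreq/policy%d/affected_cpus", policy);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return count_cpus(buf);
}
```

On the per-logical-CPU Intel systems described above, `policy_cpu_count()` would be expected to return 1 for every policy.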