Message-ID: <20200716115605.GR10769@hirez.programming.kicks-ass.net>
Date:   Thu, 16 Jul 2020 13:56:05 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Viresh Kumar <viresh.kumar@...aro.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Zhang Rui <rui.zhang@...el.com>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Amit Daniel Kachhap <amit.kachhap@...il.com>,
        Javi Merino <javi.merino@...nel.org>,
        Amit Kucheria <amit.kucheria@...durent.com>,
        linux-kernel@...r.kernel.org, Quentin Perret <qperret@...gle.com>,
        Rafael Wysocki <rjw@...ysocki.net>, linux-pm@...r.kernel.org,
        lukasz.luba@....com
Subject: Re: [PATCH 2/2] thermal: cpufreq_cooling: Reuse effective_cpu_util()

On Tue, Jul 14, 2020 at 12:06:53PM +0530, Viresh Kumar wrote:
>  /**
> + * get_load() - get current load for a cpu
>   * @cpufreq_cdev:	&struct cpufreq_cooling_device for this cpu
>   * @cpu:	cpu number
> + * @cpu_idx:	index of the cpu
>   *
> + * Return: The current load of cpu @cpu in percentage.
>   */
>  static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu,
>  		    int cpu_idx)
>  {
> +	unsigned long util = cpu_util_cfs(cpu_rq(cpu));
> +	unsigned long max = arch_scale_cpu_capacity(cpu);
>  
> +	util = effective_cpu_util(cpu, util, max, ENERGY_UTIL, NULL);
> +	return (util * 100) / max;
>  }

So there's a number of things... let me recap a bunch of things that
got mentioned on IRC earlier this week and then continue from there..

So IPA* (or any other thermal governor) needs energy estimates for the
various managed devices. cpufreq_cooling, being the driver for the CPU
device, needs to provide that and in return receives feedback on how
much energy it is allowed to consume; cpufreq_cooling then dynamically
enables/disables OPP states.

There are actually two methods the thermal governor will use:
get_real_power() and get_requested_power().

The first isn't used anywhere in mainline, but could be implemented on
hardware that has energy counters (like say x86 RAPL).

The second attempts to guesstimate power, and is the subject of this
patch.

Currently cpufreq_cooling appears to estimate the CPU energy usage by
calculating the percentage of idle time using the per-cpu cpustat stuff,
which is pretty horrific.

This patch then attempts to improve upon that by using the scheduler's
cpu_util(ENERGY_UTIL) estimate, which is also used to select OPP state
and improves upon avg idle. This should be a big improvement as higher
frequency consumes more energy, but should we not also consider that:

	E = C V^2 f

The EAS energy model has tables for the OPPs that contain this, but in
this case we seem to be assuming a linear energy/frequency curve, which
is just not the case.

I suppose we could do something like **:

	100 * util^3 / max^3

which assumes V~f.
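
To make the difference between the two curves concrete, here is a minimal
sketch in plain userspace C. It is not kernel code: the function names and
the CAPACITY_SCALE constant are made up for illustration (the latter
mirrors the kernel's 1024 fixed-point capacity scale), and util/max is
assumed to stand in for f/f_max. With V~f, C V^2 f grows as f^3, hence the
cube. The result is a fraction of 1024 rather than a percentage.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 1024 fixed-point scale, chosen to avoid dividing by 100. */
#define CAPACITY_SCALE 1024u

/* Linear estimate: util/max as a fraction of CAPACITY_SCALE. */
static uint32_t linear_power_fraction(uint32_t util, uint32_t max)
{
	/* Bound max so the cubic variant below cannot overflow 64 bits. */
	assert(max > 0 && max <= 0xffff);
	if (util > max)
		util = max;
	return (uint32_t)(((uint64_t)util * CAPACITY_SCALE) / max);
}

/* Cubic estimate: (util/max)^3 as a fraction of CAPACITY_SCALE. */
static uint32_t cubic_power_fraction(uint32_t util, uint32_t max)
{
	uint64_t u, m;

	assert(max > 0 && max <= 0xffff);
	if (util > max)
		util = max;
	u = util;
	m = max;
	/* u^3 <= 2^48 and the scale is 2^10, so the product fits in 64 bits. */
	return (uint32_t)((u * u * u * CAPACITY_SCALE) / (m * m * m));
}
```

At half utilization (util = 512, max = 1024) the linear estimate gives
512/1024 while the cubic one gives 128/1024, i.e. one eighth of full
power instead of one half.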

Another point is that cpu_util() vs turbo is a bit iffy, and to that,
things like x86-APERF/MPERF and ARM-AMU got mentioned. Those might also
have the benefit of giving you values that match your own sampling
interval (100ms), where the sched stuff is PELT (64,32.. based).
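
As a rough illustration of why such counters match the sampling interval,
the sketch below derives busy time and delivered frequency from two
counter snapshots taken at the start and end of a window. The struct and
function names are hypothetical, the snapshots are assumed to have been
read elsewhere, and MPERF is assumed to tick at the same rate as the TSC
while the CPU is not idle. This is userspace C, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the three counters at one point in time. */
struct perf_sample {
	uint64_t aperf;	/* ticks at the actual frequency while not idle */
	uint64_t mperf;	/* ticks at the reference frequency while not idle */
	uint64_t tsc;	/* ticks at the reference frequency, always running */
};

/* Fraction of the window spent non-idle, scaled to 1024. */
static uint32_t window_busy_fraction(const struct perf_sample *prev,
				     const struct perf_sample *now)
{
	uint64_t d_mperf = now->mperf - prev->mperf;
	uint64_t d_tsc = now->tsc - prev->tsc;

	if (!d_tsc)
		return 0;
	return (uint32_t)((d_mperf * 1024) / d_tsc);
}

/*
 * Average frequency while busy relative to the reference frequency,
 * scaled to 1024. Values above 1024 indicate turbo.
 */
static uint32_t window_freq_ratio(const struct perf_sample *prev,
				  const struct perf_sample *now)
{
	uint64_t d_aperf = now->aperf - prev->aperf;
	uint64_t d_mperf = now->mperf - prev->mperf;

	if (!d_mperf)
		return 0;
	return (uint32_t)((d_aperf * 1024) / d_mperf);
}
```

Because both values are computed from deltas over exactly the window the
caller chose, there is no decay constant to reconcile with the 100ms
sampling period, and turbo shows up directly as a ratio above 1024.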

So what I've been thinking is that cpufreq drivers ought to be able to
supply this method, and only when they lack one can the cpufreq-governor
(schedutil) install a fallback. And then cpufreq-cooling can use
whatever is provided (through the cpufreq interfaces).

That way, we:

 1) don't have to export anything
 2) get arch drivers to provide something 'better'
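
The driver-first, governor-fallback arrangement described above could be
sketched roughly as follows. Everything here is hypothetical: struct
util_provider, its fields and the helper functions are invented for
illustration and are not existing cpufreq interfaces, and the two stub
callbacks just return fixed numbers where a real driver or governor would
read counters or scheduler state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: utilization of @cpu on a 0..1024 scale. */
typedef unsigned long (*cpu_util_fn)(int cpu);

/* Hypothetical per-policy hook, not an existing cpufreq structure. */
struct util_provider {
	cpu_util_fn driver_get_util;	/* supplied by the cpufreq driver */
	cpu_util_fn governor_get_util;	/* fallback set by the governor */
};

/* Stand-in for a driver method backed by hardware counters. */
static unsigned long stub_driver_util(int cpu)
{
	(void)cpu;
	return 700;
}

/* Stand-in for a governor fallback backed by scheduler estimates. */
static unsigned long stub_governor_util(int cpu)
{
	(void)cpu;
	return 300;
}

/* Governor side: install the fallback only if the driver has no method. */
static int install_fallback(struct util_provider *p, cpu_util_fn fn)
{
	if (p->driver_get_util)
		return 0;	/* driver already provides one */
	p->governor_get_util = fn;
	return 1;
}

/* Consumer side (e.g. a cooling device): use whatever is provided. */
static int provider_get_util(const struct util_provider *p, int cpu,
			     unsigned long *util)
{
	cpu_util_fn fn = p->driver_get_util ? p->driver_get_util
					    : p->governor_get_util;

	if (!fn)
		return -1;	/* nothing registered */
	*util = fn(cpu);
	return 0;
}
```

The consumer only ever calls provider_get_util(), so it needs no direct
access to scheduler internals, and a driver method always takes
precedence over the fallback.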


Does that sound like something sensible?




[*] I always want a beer when I see that name :-)

[**] I despise code that uses percentages; computers suck at
/100 and there is no reason not to use any other random fraction, so why
pick a bad one?
