linux-kernel - Re: [RFC PATCH v4 0/6] sched/cpufreq: Make schedutil energy aware

Message-ID: <CAJZ5v0hL9AbpgivRGtCtqQo4XRYdt=SDjD=_FAVZmKAi=+VvzA@mail.gmail.com>
Date:   Thu, 23 Jan 2020 16:43:07 +0100
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Douglas RAILLARD <douglas.raillard@....com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        qperret@...gle.com, Linux PM <linux-pm@...r.kernel.org>
Subject: Re: [RFC PATCH v4 0/6] sched/cpufreq: Make schedutil energy aware

On Wed, Jan 22, 2020 at 6:36 PM Douglas RAILLARD
<douglas.raillard@....com> wrote:
>
> Make schedutil cpufreq governor energy-aware.

I have to say that I find your terminology confusing.  What exactly
does "energy-aware" mean in the first place?

> - patch 1 introduces a function to retrieve a frequency given a base
>   frequency and an energy cost margin.
> - patch 2 links Energy Model perf_domain to sugov_policy.
> - patch 3 updates get_next_freq() to make use of the Energy Model.
> - patch 4 adds sugov_cpu_ramp_boost() function.
> - patch 5 updates sugov_update_(single|shared)() to make use of
>   sugov_cpu_ramp_boost().
> - patch 6 introduces a tracepoint in get_next_freq() for
>   testing/debugging. Since it's not a trace event, it's not exposed to
>   userspace in a directly usable way, allowing for painless future
>   updates/removal.
>
> The benefits of using the EM in schedutil are twofold:

I guess you mean using the EM directly in schedutil (note that it is
used indirectly already, because of EAS), but that needs to be clearly
stated.

> 1) Selecting the highest possible frequency for a given cost. Some
>    platforms can have lower frequencies that are less efficient than
>    higher ones, in which case they should be skipped for most purposes.
>    They can still be useful to give more freedom to thermal throttling
>    mechanisms, but not under normal circumstances.
>    note: the EM framework will warn about such OPPs "hertz/watts ratio
>    non-monotonically decreasing"

While all of that is fair enough for platforms using the EM, do you
realize that the EM is not available on the majority of architectures
(including some fairly significant ones), and so adding EM-related
overhead for all of them is far from welcome?
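
For reference, the idea in point 1) (and the "base frequency plus energy
cost margin" lookup mentioned for patch 1) can be sketched roughly as
below.  This is only an illustration under stated assumptions: struct opp,
pick_freq_within_cost() and the example table are hypothetical names and
made-up numbers, not the kernel's Energy Model API and not the code from
the patch series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operating point; NOT the kernel's Energy Model structure. */
struct opp {
	unsigned long freq_khz;	/* frequency, ascending within a table */
	unsigned long cost;	/* abstract energy cost of running at this OPP */
};

/*
 * Return the highest frequency whose cost does not exceed the cost of the
 * lowest OPP satisfying min_freq_khz, plus cost_margin.  With a margin of 0
 * this only skips OPPs that are less efficient than a higher one.
 */
static unsigned long pick_freq_within_cost(const struct opp *table, size_t n,
					   unsigned long min_freq_khz,
					   unsigned long cost_margin)
{
	size_t base, i, best;

	assert(table && n > 0);

	/* Lowest OPP that satisfies the request, clamped to the top OPP. */
	for (base = 0; base < n - 1; base++)
		if (table[base].freq_khz >= min_freq_khz)
			break;

	/* Walk upwards, keeping the highest OPP that fits the cost budget. */
	best = base;
	for (i = base + 1; i < n; i++)
		if (table[i].cost <= table[base].cost + cost_margin)
			best = i;

	return table[best].freq_khz;
}

/* Made-up table: the 600 MHz OPP costs more than the 800 MHz one. */
static const struct opp example_table[] = {
	{  400000, 100 },
	{  600000, 130 },
	{  800000, 120 },
	{ 1000000, 160 },
	{ 1200000, 220 },
};
```

With that table, a request for 600 MHz and a margin of 0 resolves to
800 MHz, because the higher OPP is cheaper.  A nonzero margin allows a
bounded step above the requested frequency, which is presumably what
"bounded in energy" refers to further down.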

> 2) Driving the frequency selection with power in mind, in addition to
>    maximizing the utilization of the non-idle CPUs in the system.

Care to explain this?  I'm totally unsure what you mean here.

> Point 1) is implemented in "PM: Introduce em_pd_get_higher_freq()" and
> enabled in schedutil by
> "sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq()".
>
> Point 2) is enabled in
> "sched/cpufreq: Boost schedutil frequency ramp up". It allows using
> higher frequencies when it is known that the true utilization of
> currently running tasks is exceeding their previous stable point.

Please explain "true utilization" and "stable point".

> The benefits are:
>
> * Boosting the frequency when the behavior of a runnable task changes,
>   leading to an increase in utilization. That shortens the frequency
>   ramp up duration, which in turn allows the utilization signal to
>   reach stable values quicker.  Since the allowed frequency boost is
>   bounded in energy, it will behave consistently across platforms,
>   regardless of the OPP cost range.

Sounds good.

Can you please describe the algorithm applied to achieve that?

> * The boost is only transient, and should not significantly impact the
>   energy consumed by workloads with very stable utilization signals.