linux-kernel - Re: [PATCH v3 2/3] sched/fair: Take thermal pressure into account while estimating energy

Message-ID: <CAKfTPtAq5Hn7iQ-USO5La4B_jkYXzSvFSFrCDq47gjXDGghyTQ@mail.gmail.com>
Date:   Mon, 14 Jun 2021 18:03:34 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Lukasz Luba <lukasz.luba@....com>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        "open list:THERMAL" <linux-pm@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Quentin Perret <qperret@...gle.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Vincent Donnefort <vincent.donnefort@....com>,
        Beata Michalska <Beata.Michalska@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>, segall@...gle.com,
        Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Thara Gopinath <thara.gopinath@...aro.org>,
        Amit Kachhap <amit.kachhap@...il.com>, amitk@...nel.org,
        Zhang Rui <rui.zhang@...el.com>,
        Daniel Lezcano <daniel.lezcano@...aro.org>
Subject: Re: [PATCH v3 2/3] sched/fair: Take thermal pressure into account
 while estimating energy

On Thu, 10 Jun 2021 at 17:03, Lukasz Luba <lukasz.luba@....com> wrote:
>
> Energy Aware Scheduling (EAS) needs to be able to predict the frequency
> requests made by the SchedUtil governor to properly estimate energy used
> in the future. It has to take CPU utilization into account and forecast
> the Performance Domain (PD) frequency. There is a corner case where the
> max allowed frequency might be reduced due to thermal pressure. SchedUtil
> is aware of that reduced frequency, so it should also be taken into
> account in EAS estimations.
>
> SchedUtil, as a CPUFreq governor, knows the maximum allowed frequency of
> a CPU, thanks to cpufreq_driver_resolve_freq() and internal clamping
> to 'policy::max'. SchedUtil is responsible for respecting that upper limit
> while setting the frequency through CPUFreq drivers. This effective
> frequency is stored internally in 'sugov_policy::next_freq' and EAS has
> to predict that value.
>
> In the existing code the raw value of arch_scale_cpu_capacity() is used
> for clamping the returned CPU utilization from effective_cpu_util().
> This patch fixes the issue of too large single CPU utilization by introducing
> clamping to the allowed CPU capacity. The allowed CPU capacity is a CPU
> capacity reduced by thermal pressure signal. We rely on this load avg

You no longer rely on the load avg value here; you now use the raw
thermal pressure value.

> geometric series in similar way as other mechanisms in the scheduler.
>
> Knowing the allowed CPU capacity, we don't get too large a value
> for a single CPU's utilization, which is then added to the util sum. The
> util sum is used as a source of information for estimating whole PD energy.
> To avoid wrong energy estimation in EAS (due to capped frequency), make
> sure that the calculation of util sum is aware of allowed CPU capacity.
>
> This thermal pressure might be visible in scenarios where the CPUs are not
> heavily loaded, but some other component (like GPU) drastically reduced
> available power budget and increased the SoC temperature. Thus, we still
> use EAS for task placement and CPUs are not over-utilized.
>
> Signed-off-by: Lukasz Luba <lukasz.luba@....com>
> ---
>  kernel/sched/fair.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 161b92aa1c79..237726217dad 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6527,8 +6527,12 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
>         struct cpumask *pd_mask = perf_domain_span(pd);
>         unsigned long cpu_cap = arch_scale_cpu_capacity(cpumask_first(pd_mask));
>         unsigned long max_util = 0, sum_util = 0;
> +       unsigned long _cpu_cap, thermal_pressure;
>         int cpu;
>
> +       thermal_pressure = arch_scale_thermal_pressure(cpumask_first(pd_mask));

Do you really need the intermediate variable thermal_pressure? It seems
to be used only once, in the line just below.

With the two comments above addressed,

Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>

> +       _cpu_cap = cpu_cap - thermal_pressure;
> +
>         /*
>          * The capacity state of CPUs of the current rd can be driven by CPUs
>          * of another rd if they belong to the same pd. So, account for the
> @@ -6564,8 +6568,10 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
>                  * is already enough to scale the EM reported power
>                  * consumption at the (eventually clamped) cpu_capacity.
>                  */
> -               sum_util += effective_cpu_util(cpu, util_running, cpu_cap,
> -                                              ENERGY_UTIL, NULL);
> +               cpu_util = effective_cpu_util(cpu, util_running, cpu_cap,
> +                                             ENERGY_UTIL, NULL);
> +
> +               sum_util += min(cpu_util, _cpu_cap);
>
>                 /*
>                  * Performance domain frequency: utilization clamping
> @@ -6576,7 +6582,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
>                  */
>                 cpu_util = effective_cpu_util(cpu, util_freq, cpu_cap,
>                                               FREQUENCY_UTIL, tsk);
> -               max_util = max(max_util, cpu_util);
> +               max_util = max(max_util, min(cpu_util, _cpu_cap));
>         }
>
>         return em_cpu_energy(pd->em_pd, max_util, sum_util);
> --
> 2.17.1
>
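
For illustration only, the clamping this patch adds to compute_energy()
can be modelled in plain userspace C. The sketch below is a stand-in
under stated assumptions, not the kernel code: 'struct pd_util',
pd_util_clamped(), min_ul() and max_ul() are hypothetical names, the
per-CPU utilization arrays replace the calls to effective_cpu_util(),
and the thermal pressure is passed in as a plain value instead of being
read through arch_scale_thermal_pressure(). It assumes the thermal
pressure never exceeds the CPU capacity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result holder for this sketch; not a kernel type. */
struct pd_util {
	unsigned long sum_util;	/* feeds the PD energy estimate */
	unsigned long max_util;	/* drives the predicted PD frequency */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Userspace model of the patched loop: each CPU's utilization is
 * clamped to the allowed capacity (capacity minus thermal pressure)
 * before it contributes to sum_util or max_util.
 *
 * energy_util[] and freq_util[] stand in for the ENERGY_UTIL and
 * FREQUENCY_UTIL results of effective_cpu_util() (assumption).
 */
static struct pd_util pd_util_clamped(const unsigned long *energy_util,
				      const unsigned long *freq_util,
				      size_t nr_cpus,
				      unsigned long cpu_cap,
				      unsigned long thermal_pressure)
{
	struct pd_util res = { 0, 0 };
	unsigned long allowed_cap;
	size_t i;

	/* Assumption of this sketch: pressure never exceeds capacity. */
	assert(thermal_pressure <= cpu_cap);
	allowed_cap = cpu_cap - thermal_pressure;

	for (i = 0; i < nr_cpus; i++) {
		res.sum_util += min_ul(energy_util[i], allowed_cap);
		res.max_util = max_ul(res.max_util,
				      min_ul(freq_util[i], allowed_cap));
	}

	return res;
}
```

With a capacity of 1024 and a thermal pressure of 224, the allowed
capacity is 800, so a CPU reporting a utilization of 900 contributes
only 800 to the sum, and the maximum used for the frequency prediction
is capped at 800 as well.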