Message-ID: <CAJZ5v0ji=601eHQzHP1KuiA_TRUBaeEL6=sSLR_sW12MS_8QcA@mail.gmail.com>
Date:   Wed, 9 Jun 2021 17:01:42 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Lukasz Luba <lukasz.luba@....com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Quentin Perret <qperret@...gle.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        vincent.donnefort@....com, Beata.Michalska@....com,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>, segall@...gle.com,
        Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: Re: [PATCH v2 2/2] sched/cpufreq: Consider reduced CPU capacity in energy calculation

On Fri, Jun 4, 2021 at 10:10 AM Lukasz Luba <lukasz.luba@....com> wrote:
>
> Energy Aware Scheduling (EAS) needs to predict the decisions made by
> SchedUtil. map_util_freq() exists to do that.
>
> There are corner cases where the max allowed frequency might be reduced
> (due to thermal capping). SchedUtil, as a CPUFreq governor, is aware of
> that, but EAS is not. This patch aims to address it.
>
> SchedUtil stores the maximum allowed frequency in the
> 'sugov_policy::next_freq' field. EAS has to predict that value, which is
> the frequency actually used. That value is determined after a call to
> cpufreq_driver_resolve_freq(), which clamps to the CPUFreq policy limits.
> In the existing code, EAS is not able to predict that real frequency.
> This leads to energy estimation errors.
>
> To avoid wrong energy estimation in EAS (due to frequency misprediction),
> make sure that the step which calculates the Performance Domain frequency
> is also aware of the allowed CPU capacity.
>
> Furthermore, modify map_util_freq() to not extend the frequency value.
> Instead, use map_util_perf() to extend the util value in both places,
> SchedUtil and EAS, but for EAS clamp it to the max allowed CPU capacity.
> In the end, we achieve the same desirable behavior for both subsystems
> and alignment with regard to the real CPU frequency.
>
> Signed-off-by: Lukasz Luba <lukasz.luba@....com>

For the schedutil part:

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
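
As a rough illustration of the schedutil side of the change described above, here is a user-space C sketch of the old and new mappings. The body of map_util_perf() is not shown in the quoted diff; it is assumed here to add 25% headroom (util + util/4). The names map_util_freq_old() and sugov_raw_freq() are hypothetical, used only for this comparison, and the numbers are made up.

```c
#include <assert.h>

/* Old mapping (removed by the patch): 25% headroom applied to the frequency. */
static inline unsigned long map_util_freq_old(unsigned long util,
					      unsigned long freq,
					      unsigned long cap)
{
	return (freq + (freq >> 2)) * util / cap;
}

/* New mapping (from the patch): plain proportional scaling. */
static inline unsigned long map_util_freq(unsigned long util,
					  unsigned long freq,
					  unsigned long cap)
{
	return freq * util / cap;
}

/* ASSUMPTION: body not in the quoted diff; 25% headroom on utilization. */
static inline unsigned long map_util_perf(unsigned long util)
{
	return util + (util >> 2);
}

/* Hypothetical helper: what get_next_freq() computes after the patch. */
static inline unsigned long sugov_raw_freq(unsigned long util,
					   unsigned long freq,
					   unsigned long cap)
{
	util = map_util_perf(util);
	return map_util_freq(util, freq, cap);
}

/* Made-up values: util 512 of 1024, max frequency 2000000 kHz. */
static inline void check_headroom_equivalence(void)
{
	assert(map_util_freq_old(512, 2000000, 1024) ==
	       sugov_raw_freq(512, 2000000, 1024));
}
```

With these example values both orderings yield the same raw frequency. For inputs that are not multiples of four, the integer shifts may round differently, so the two results are not guaranteed to be bit-identical in general.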

> ---
>  include/linux/energy_model.h     | 16 +++++++++++++---
>  include/linux/sched/cpufreq.h    |  2 +-
>  kernel/sched/cpufreq_schedutil.c |  1 +
>  kernel/sched/fair.c              |  2 +-
>  4 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index 757fc60658fa..3f221dbf5f95 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -91,6 +91,8 @@ void em_dev_unregister_perf_domain(struct device *dev);
>   * @pd         : performance domain for which energy has to be estimated
>   * @max_util   : highest utilization among CPUs of the domain
>   * @sum_util   : sum of the utilization of all CPUs in the domain
> + * @allowed_cpu_cap    : maximum allowed CPU capacity for the @pd, which
> + *                       might reflect reduced frequency (due to thermal)
>   *
>   * This function must be used only for CPU devices. There is no validation,
>   * i.e. if the EM is a CPU type and has cpumask allocated. It is called from
> @@ -100,7 +102,8 @@ void em_dev_unregister_perf_domain(struct device *dev);
>   * a capacity state satisfying the max utilization of the domain.
>   */
>  static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
> -                               unsigned long max_util, unsigned long sum_util)
> +                               unsigned long max_util, unsigned long sum_util,
> +                               unsigned long allowed_cpu_cap)
>  {
>         unsigned long freq, scale_cpu;
>         struct em_perf_state *ps;
> @@ -112,11 +115,17 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>         /*
>          * In order to predict the performance state, map the utilization of
>          * the most utilized CPU of the performance domain to a requested
> -        * frequency, like schedutil.
> +        * frequency, like schedutil. Take also into account that the real
> +        * frequency might be set lower (due to thermal capping). Thus, clamp
> +        * max utilization to the allowed CPU capacity before calculating
> +        * effective frequency.
>          */
>         cpu = cpumask_first(to_cpumask(pd->cpus));
>         scale_cpu = arch_scale_cpu_capacity(cpu);
>         ps = &pd->table[pd->nr_perf_states - 1];
> +
> +       max_util = map_util_perf(max_util);
> +       max_util = min(max_util, allowed_cpu_cap);
>         freq = map_util_freq(max_util, ps->frequency, scale_cpu);
>
>         /*
> @@ -209,7 +218,8 @@ static inline struct em_perf_domain *em_pd_get(struct device *dev)
>         return NULL;
>  }
>  static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
> -                       unsigned long max_util, unsigned long sum_util)
> +                       unsigned long max_util, unsigned long sum_util,
> +                       unsigned long allowed_cpu_cap)
>  {
>         return 0;
>  }
> diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
> index 6205578ab6ee..bdd31ab93bc5 100644
> --- a/include/linux/sched/cpufreq.h
> +++ b/include/linux/sched/cpufreq.h
> @@ -26,7 +26,7 @@ bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy);
>  static inline unsigned long map_util_freq(unsigned long util,
>                                         unsigned long freq, unsigned long cap)
>  {
> -       return (freq + (freq >> 2)) * util / cap;
> +       return freq * util / cap;
>  }
>
>  static inline unsigned long map_util_perf(unsigned long util)
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 4f09afd2f321..57124614363d 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -151,6 +151,7 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>         unsigned int freq = arch_scale_freq_invariant() ?
>                                 policy->cpuinfo.max_freq : policy->cur;
>
> +       util = map_util_perf(util);
>         freq = map_util_freq(util, freq, max);
>
>         if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1aeddecabc20..9a79bbd9425b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6590,7 +6590,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
>                 max_util = max(max_util, min(cpu_util, _cpu_cap));
>         }
>
> -       return em_cpu_energy(pd->em_pd, max_util, sum_util);
> +       return em_cpu_energy(pd->em_pd, max_util, sum_util, _cpu_cap);
>  }
>
>  /*
> --
> 2.17.1
>
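
The frequency prediction that the patched em_cpu_energy() performs can be sketched in user-space C as follows. This is only an illustration under assumptions: em_predict_freq() and min_ul() are hypothetical names that do not exist in the kernel, map_util_perf() is assumed to return util + util/4 (its body is not in the quoted diff), and the capacity and frequency values are invented.

```c
#include <assert.h>

/* ASSUMPTION: body not in the quoted diff; 25% headroom on utilization. */
static inline unsigned long map_util_perf(unsigned long util)
{
	return util + (util >> 2);
}

/* As in the patch: proportional scaling without extra headroom. */
static inline unsigned long map_util_freq(unsigned long util,
					  unsigned long freq,
					  unsigned long cap)
{
	return freq * util / cap;
}

/* Hypothetical stand-in for the kernel's min() macro. */
static inline unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper mirroring the three lines added to em_cpu_energy():
 * extend the utilization, clamp it to the allowed CPU capacity, then map
 * it to a frequency.
 */
static inline unsigned long em_predict_freq(unsigned long max_util,
					    unsigned long allowed_cpu_cap,
					    unsigned long max_freq,
					    unsigned long scale_cpu)
{
	max_util = map_util_perf(max_util);
	max_util = min_ul(max_util, allowed_cpu_cap);
	return map_util_freq(max_util, max_freq, scale_cpu);
}

/* Made-up case: capacity 1024, max 2000000 kHz, thermal cap at 768. */
static inline void check_thermal_clamp(void)
{
	assert(em_predict_freq(800, 768, 2000000, 1024) == 1500000);
}
```

In this example a utilization of 800 is extended to 1000 and then clamped to 768, so the predicted frequency is 1500000 kHz rather than the 1953125 kHz an unclamped estimate would give.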
