Message-ID: <jhjsg83s616.mognet@arm.com>
Date: Fri, 18 Dec 2020 11:33:09 +0000
From: Valentin Schneider <valentin.schneider@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Vincent Guittot <vincent.guittot@...aro.org>,
Morten Rasmussen <morten.rasmussen@....com>,
dietmar.eggemann@....com, patrick.bellasi@...bug.net,
lenb@...nel.org, linux-kernel@...r.kernel.org,
ionela.voinescu@....com, qperret@...gle.com,
viresh.kumar@...aro.org
Subject: Re: [PATCH] sched: Add schedutil overview

Hi,

Have some more nits below.

On 18/12/20 10:32, Peter Zijlstra wrote:
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> ---
> Documentation/scheduler/schedutil.txt | 168 ++++++++++++++++++++++++++++++++++
> 1 file changed, 168 insertions(+)
>
> --- /dev/null
> +++ b/Documentation/scheduler/schedutil.txt
[...]
> +Frequency- / CPU Invariance
> +---------------------------
> +
> +Because consuming the CPU for 50% at 1GHz is not the same as consuming the CPU
> +for 50% at 2GHz, nor is running 50% on a LITTLE CPU the same as running 50% on
> +a big CPU, we allow architectures to scale the time delta with two ratios, one
> +Dynamic Voltage and Frequency Scaling (DVFS) ratio and one microarch ratio.
> +
> +For simple DVFS architectures (where software is in full control) we trivially
> +compute the ratio as:
> +
> +             f_cur
> +   r_dvfs := -----
> +             f_max
> +
> +For more dynamic systems where the hardware is in control of DVFS (Intel,
> +ARMv8.4-AMU) we use hardware counters to provide us this ratio. For Intel

Nit: To me this reads as if the presence of AMUs entails 'hardware is in
control of DVFS', which doesn't seem right. How about:

  For more dynamic systems where the hardware is in control of DVFS we use
  hardware counters (Intel APERF/MPERF, ARMv8.4-AMU) to provide us this
  ratio.
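
Purely illustrative, in case it helps the doc: both ratios end up being
applied to the PELT time delta, roughly like the sketch below (not the
actual kernel code; scale_delta() is a made-up name, the real helpers are
arch_scale_freq_capacity() / arch_scale_cpu_capacity()):

  #define SCHED_CAPACITY_SHIFT  10
  #define SCHED_CAPACITY_SCALE  (1 << SCHED_CAPACITY_SHIFT)

  /*
   * Both ratios are expressed against SCHED_CAPACITY_SCALE, i.e.
   * freq_cap ~= r_dvfs * 1024 and cpu_cap ~= r_uarch * 1024.
   */
  static unsigned long scale_delta(unsigned long delta,
                                   unsigned long freq_cap,
                                   unsigned long cpu_cap)
  {
          delta = (delta * freq_cap) >> SCHED_CAPACITY_SHIFT;
          delta = (delta * cpu_cap)  >> SCHED_CAPACITY_SHIFT;
          return delta;
  }
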
> +Schedutil / DVFS
> +----------------
> +
> +Every time the scheduler load tracking is updated (task wakeup, task
> +migration, time progression) we call out to schedutil to update the hardware
> +DVFS state.
> +
> +The basis is the CPU runqueue's 'running' metric, which per the above is
> +the frequency invariant utilization estimate of the CPU. From this we compute
> +a desired frequency like:
> +
> +              max( running, util_est );  if UTIL_EST
> +   u_cfs := { running;                   otherwise
> +
> +   u_clamp := clamp( u_cfs, u_min, u_max )
> +
> +   u := u_cfs + u_rt + u_irq + u_dl; [approx. see source for more detail]
> +
> +   f_des := min( f_max, 1.25 u * f_max )
> +

In schedutil_cpu_util(), uclamp clamps both u_cfs and u_rt. I'm afraid the
below might just bring more confusion; what do you think?

                 clamp( u_cfs + u_rt, u_min, u_max );  if UCLAMP_TASK
    u_clamp := { u_cfs + u_rt;                         otherwise

    u := u_clamp + u_irq + u_dl; [approx. see source for more detail]

(also, does this need a word about runnable rt tasks => goto max?)
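
FWIW, a rough C sketch of the aggregation + frequency mapping, purely
illustrative (not the actual schedutil_cpu_util() / get_next_freq() bodies;
the IRQ and DL terms are simplified, and the helper names are made up):

  /* clamp helper for the sketch */
  static unsigned long clamp_ul(unsigned long v, unsigned long lo,
                                unsigned long hi)
  {
          return v < lo ? lo : (v > hi ? hi : v);
  }

  /* All utilization values are against 'max' == the CPU's capacity. */
  static unsigned long effective_util(unsigned long u_cfs, unsigned long u_rt,
                                      unsigned long u_dl, unsigned long u_irq,
                                      unsigned long u_min, unsigned long u_max,
                                      unsigned long max, int uclamp_used,
                                      int rt_runnable)
  {
          unsigned long util;

          /* Without uclamp, any runnable RT task pins the request to max. */
          if (!uclamp_used && rt_runnable)
                  return max;

          util = clamp_ul(u_cfs + u_rt, u_min, u_max); /* uclamp over CFS+RT */
          util += u_irq;          /* IRQ pressure, simplified */
          util += u_dl;           /* DL bandwidth, simplified */

          return util < max ? util : max;
  }

  /* f_des := min( f_max, 1.25 * u/max * f_max ) */
  static unsigned int next_freq(unsigned long util, unsigned long max,
                                unsigned int f_max)
  {
          unsigned long f = f_max + (f_max >> 2);   /* 1.25 * f_max */

          f = f * util / max;
          return f < f_max ? f : f_max;
  }
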
> +XXX IO-wait; when the update is due to a task wakeup from IO-completion we
> +boost 'u' above.
> +
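For the XXX: the current behaviour is roughly that 'u' is max'ed with an
iowait boost value which doubles on consecutive IO-completion wakeups and
decays otherwise. A very rough sketch, not the actual sugov_iowait_*()
code (names and constants approximate):

  #define SCALE      1024          /* stands in for SCHED_CAPACITY_SCALE */
  #define BOOST_MIN  (SCALE / 8)

  static unsigned long iowait_boost;

  static unsigned long apply_iowait_boost(unsigned long util, int io_wakeup)
  {
          if (io_wakeup) {
                  /* double on each consecutive IO wakeup, capped at SCALE */
                  iowait_boost = iowait_boost ? iowait_boost << 1 : BOOST_MIN;
                  if (iowait_boost > SCALE)
                          iowait_boost = SCALE;
          } else {
                  /* decay when the update is not IO related */
                  iowait_boost >>= 1;
          }

          return util > iowait_boost ? util : iowait_boost;
  }
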
> +This frequency is then used to select a P-state/OPP or directly munged into a
> +CPPC style request to the hardware.
> +
> +XXX: deadline tasks (Sporadic Task Model) allow us to calculate a hard f_min
> +required to satisfy the workload.
> +
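Also purely illustrative for the XXX above: with the Sporadic Task Model the
reserved bandwidth directly gives such a floor. E.g. a single DL task with
runtime 5ms every 20ms period (implicit deadline, runtime specified against
f_max, ignoring other load) needs at least:

  f_min >= (5ms / 20ms) * f_max = 0.25 * f_max
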
> +Because these callbacks are directly from the scheduler, the DVFS hardware
> +interaction should be 'fast' and non-blocking. Schedutil supports
> +rate-limiting DVFS requests for when hardware interaction is slow and
> +expensive; this reduces effectiveness.
> +
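The rate limit itself boils down to a timestamp check before issuing a new
request, roughly like this (simplified; cf. sugov_should_update_freq(), the
delay being configured via the rate_limit_us tunable):

  /* Rough sketch: skip the update if the last one was too recent. */
  static int should_update_freq(unsigned long long now_ns,
                                unsigned long long last_update_ns,
                                unsigned long long rate_limit_ns)
  {
          return now_ns - last_update_ns >= rate_limit_ns;
  }
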
> +For more information see: kernel/sched/cpufreq_schedutil.c
> +