Message-ID: <jhjsg83s616.mognet@arm.com>
Date: Fri, 18 Dec 2020 11:33:09 +0000
From: Valentin Schneider <valentin.schneider@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Vincent Guittot <vincent.guittot@...aro.org>,
Morten Rasmussen <morten.rasmussen@....com>,
dietmar.eggemann@....com, patrick.bellasi@...bug.net,
lenb@...nel.org, linux-kernel@...r.kernel.org,
ionela.voinescu@....com, qperret@...gle.com,
viresh.kumar@...aro.org
Subject: Re: [PATCH] sched: Add schedutil overview

Hi,

Have some more nits below.

On 18/12/20 10:32, Peter Zijlstra wrote:
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> ---
> Documentation/scheduler/schedutil.txt | 168 ++++++++++++++++++++++++++++++++++
> 1 file changed, 168 insertions(+)
>
> --- /dev/null
> +++ b/Documentation/scheduler/schedutil.txt
[...]
> +Frequency- / CPU Invariance
> +---------------------------
> +
> +Because consuming the CPU for 50% at 1GHz is not the same as consuming the CPU
> +for 50% at 2GHz, nor is running 50% on a LITTLE CPU the same as running 50% on
> +a big CPU, we allow architectures to scale the time delta with two ratios, one
> +Dynamic Voltage and Frequency Scaling (DVFS) ratio and one microarch ratio.
> +
> +For simple DVFS architectures (where software is in full control) we trivially
> +compute the ratio as:
> +
> +             f_cur
> +   r_dvfs := -----
> +             f_max
> +
> +For more dynamic systems where the hardware is in control of DVFS (Intel,
> +ARMv8.4-AMU) we use hardware counters to provide us this ratio. For Intel

Nit: To me this reads as if the presence of AMUs entails 'hardware is in
control of DVFS', which doesn't seem right. How about:

  For more dynamic systems where the hardware is in control of DVFS we use
  hardware counters (Intel APERF/MPERF, ARMv8.4-AMU) to provide us this
  ratio.
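
Purely illustrative, in case it helps the doc: both ratios end up being
applied to the PELT time delta, roughly like the sketch below (not the
actual kernel code; scale_delta() is a made-up name, the real helpers are
arch_scale_freq_capacity() / arch_scale_cpu_capacity()):

  #define SCHED_CAPACITY_SHIFT  10
  #define SCHED_CAPACITY_SCALE  (1 << SCHED_CAPACITY_SHIFT)

  /*
   * Both ratios are expressed against SCHED_CAPACITY_SCALE, i.e.
   * freq_cap ~= r_dvfs * 1024 and cpu_cap ~= r_uarch * 1024.
   */
  static unsigned long scale_delta(unsigned long delta,
                                   unsigned long freq_cap,
                                   unsigned long cpu_cap)
  {
          delta = (delta * freq_cap) >> SCHED_CAPACITY_SHIFT;
          delta = (delta * cpu_cap)  >> SCHED_CAPACITY_SHIFT;
          return delta;
  }
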
> +Schedutil / DVFS
> +----------------
> +
> +Every time the scheduler load tracking is updated (task wakeup, task
> +migration, time progression) we call out to schedutil to update the hardware
> +DVFS state.
> +
> +The basis is the CPU runqueue's 'running' metric, which per the above is
> +the frequency invariant utilization estimate of the CPU. From this we compute
> +a desired frequency like:
> +
> +              max( running, util_est );  if UTIL_EST
> +   u_cfs := { running;                   otherwise
> +
> +   u_clamp := clamp( u_cfs, u_min, u_max )
> +
> +   u := u_cfs + u_rt + u_irq + u_dl; [approx. see source for more detail]
> +
> +   f_des := min( f_max, 1.25 u * f_max )
> +

In schedutil_cpu_util(), uclamp clamps both u_cfs and u_rt. I'm afraid the
below might just bring more confusion; what do you think?

                 clamp( u_cfs + u_rt, u_min, u_max );  if UCLAMP_TASK
    u_clamp := { u_cfs + u_rt;                         otherwise

    u := u_clamp + u_irq + u_dl; [approx. see source for more detail]

(also, does this need a word about runnable rt tasks => goto max?)
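
FWIW, a rough C sketch of the aggregation + frequency mapping, purely
illustrative (not the actual schedutil_cpu_util() / get_next_freq() bodies;
the IRQ and DL terms are simplified, and the helper names are made up):

  /* clamp helper for the sketch */
  static unsigned long clamp_ul(unsigned long v, unsigned long lo,
                                unsigned long hi)
  {
          return v < lo ? lo : (v > hi ? hi : v);
  }

  /* All utilization values are against 'max' == the CPU's capacity. */
  static unsigned long effective_util(unsigned long u_cfs, unsigned long u_rt,
                                      unsigned long u_dl, unsigned long u_irq,
                                      unsigned long u_min, unsigned long u_max,
                                      unsigned long max, int uclamp_used,
                                      int rt_runnable)
  {
          unsigned long util;

          /* Without uclamp, any runnable RT task pins the request to max. */
          if (!uclamp_used && rt_runnable)
                  return max;

          util = clamp_ul(u_cfs + u_rt, u_min, u_max); /* uclamp over CFS+RT */
          util += u_irq;          /* IRQ pressure, simplified */
          util += u_dl;           /* DL bandwidth, simplified */

          return util < max ? util : max;
  }

  /* f_des := min( f_max, 1.25 * u/max * f_max ) */
  static unsigned int next_freq(unsigned long util, unsigned long max,
                                unsigned int f_max)
  {
          unsigned long f = f_max + (f_max >> 2);   /* 1.25 * f_max */

          f = f * util / max;
          return f < f_max ? f : f_max;
  }
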
> +XXX IO-wait; when the update is due to a task wakeup from IO-completion we
> +boost 'u' above.
> +
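For the XXX: the current behaviour is roughly that 'u' is max'ed with an
iowait boost value which doubles on consecutive IO-completion wakeups and
decays otherwise. A very rough sketch, not the actual sugov_iowait_*()
code (names and constants approximate):

  #define SCALE      1024          /* stands in for SCHED_CAPACITY_SCALE */
  #define BOOST_MIN  (SCALE / 8)

  static unsigned long iowait_boost;

  static unsigned long apply_iowait_boost(unsigned long util, int io_wakeup)
  {
          if (io_wakeup) {
                  /* double on each consecutive IO wakeup, capped at SCALE */
                  iowait_boost = iowait_boost ? iowait_boost << 1 : BOOST_MIN;
                  if (iowait_boost > SCALE)
                          iowait_boost = SCALE;
          } else {
                  /* decay when the update is not IO related */
                  iowait_boost >>= 1;
          }

          return util > iowait_boost ? util : iowait_boost;
  }
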
> +This frequency is then used to select a P-state/OPP or directly munged into a
> +CPPC style request to the hardware.
> +
> +XXX: deadline tasks (Sporadic Task Model) allow us to calculate a hard f_min
> +required to satisfy the workload.
> +
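Also purely illustrative for the XXX above: with the Sporadic Task Model the
reserved bandwidth directly gives such a floor. E.g. a single DL task with
runtime 5ms every 20ms period (implicit deadline, runtime specified against
f_max, ignoring other load) needs at least:

  f_min >= (5ms / 20ms) * f_max = 0.25 * f_max
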
> +Because these callbacks are directly from the scheduler, the DVFS hardware
> +interaction should be 'fast' and non-blocking. Schedutil supports
> +rate-limiting DVFS requests for when hardware interaction is slow and
> +expensive; this reduces effectiveness.
> +
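The rate limit itself boils down to a timestamp check before issuing a new
request, roughly like this (simplified; cf. sugov_should_update_freq(), the
delay being configured via the rate_limit_us tunable):

  /* Rough sketch: skip the update if the last one was too recent. */
  static int should_update_freq(unsigned long long now_ns,
                                unsigned long long last_update_ns,
                                unsigned long long rate_limit_ns)
  {
          return now_ns - last_update_ns >= rate_limit_ns;
  }
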
> +For more information see: kernel/sched/cpufreq_schedutil.c
> +