Message-ID: <f40522de-b71d-4848-8aa3-5b87d38bb847@arm.com>
Date: Mon, 23 Oct 2023 16:11:13 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Swapnil Sapkal <Swapnil.Sapkal@....com>,
Aaron Lu <aaron.lu@...el.com>, Chen Yu <yu.c.chen@...el.com>,
Tim Chen <tim.c.chen@...el.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>, x86@...nel.org
Subject: Re: [RFC PATCH v2 1/2] sched/fair: Introduce UTIL_FITS_CAPACITY
feature (v2)
On 19/10/2023 18:05, Mathieu Desnoyers wrote:
> Introduce the UTIL_FITS_CAPACITY scheduler feature. The runqueue
> selection picks the previous, target, or recent runqueues if they have
> enough remaining capacity to enqueue the task before scanning for an
> idle cpu.
>
> This feature is introduced in preparation for the SELECT_BIAS_PREV
> scheduler feature.
>
> The following benchmarks only cover the UTIL_FITS_CAPACITY feature.
> Those are performed on a v6.5.5 kernel with mitigations=off.
>
> The following hackbench workload on a 192 cores AMD EPYC 9654 96-Core
> Processor (over 2 sockets) improves the wall time from 49s to 40s
> (18% speedup).
>
> hackbench -g 32 -f 20 --threads --pipe -l 480000 -s 100
>
> We can observe that the number of migrations is reduced significantly
> with this patch (improvement):
>
> Baseline: 117M cpu-migrations (9.355 K/sec)
> With patch: 47M cpu-migrations (3.977 K/sec)
>
> The task-clock utilization is increased (improvement):
>
> Baseline: 253.275 CPUs utilized
> With patch: 271.367 CPUs utilized
>
> The number of context-switches is increased (degradation):
>
> Baseline: 445M context-switches (35.516 K/sec)
> With patch: 586M context-switches (48.823 K/sec)
>
I haven't run any benchmarks yet to verify the benefit of this
prefer-packing-over-spreading (i.e. migration-avoidance) algorithm.
[...]
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4497,6 +4497,28 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
> 	trace_sched_util_est_se_tp(&p->se);
> }
>
> +static unsigned long scale_rt_capacity(int cpu);
> +
> +/*
> + * Returns true if adding the task utilization to the estimated
> + * utilization of the runnable tasks on @cpu does not exceed the
> + * capacity of @cpu.
> + *
> + * This considers only the utilization of _runnable_ tasks on the @cpu
> + * runqueue, excluding blocked and sleeping tasks. This is achieved by
> + * using the runqueue util_est.enqueued.
> + */
> +static inline bool task_fits_remaining_cpu_capacity(unsigned long task_util,
> +						    int cpu)
This is almost like the existing task_fits_cpu(p, cpu) (used in Capacity
Aware Scheduling (CAS) for asymmetric CPU capacity systems), except that
the latter uses `util = task_util_est(p)`, also deals with uclamp, and
only tests whether p itself would fit on the CPU.
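For reference, task_fits_cpu() looks roughly like this in v6.5 (comments
trimmed):

  static inline int task_fits_cpu(struct task_struct *p, int cpu)
  {
          unsigned long uclamp_min = uclamp_eff_value(p, UCLAMP_MIN);
          unsigned long uclamp_max = uclamp_eff_value(p, UCLAMP_MAX);
          unsigned long util = task_util_est(p);

          /* Fits only if both utilization and performance hints fit. */
          return (util_fits_cpu(util, uclamp_min, uclamp_max, cpu) > 0);
  }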
Or like find_energy_efficient_cpu() (feec(), used in Energy-Aware
Scheduling (EAS)) which uses cpu_util(cpu, p, cpu, 0) to get:

  max(util_avg(CPU + p), util_est(CPU + p))
  feec()
      ...
      for (; pd; pd = pd->next)
          ...
          util = cpu_util(cpu, p, cpu, 0);
          ...
          fits = util_fits_cpu(util, util_min, util_max, cpu)
                                     ^^^^^^^^^^^^^^^^^^
                          not used when uclamp is not active (1)
              ...
              capacity = capacity_of(cpu)
              fits = fits_capacity(util, capacity)
              if (!uclamp_is_used()) (1)
                  return fits
So it would be good not to introduce a new function like
task_fits_remaining_cpu_capacity() in this area, and to use an existing
one instead.
> +{
> +	unsigned long total_util;
> +
> +	if (!sched_util_fits_capacity_active())
> +		return false;
> +	total_util = READ_ONCE(cpu_rq(cpu)->cfs.avg.util_est.enqueued) + task_util;
> +	return fits_capacity(total_util, scale_rt_capacity(cpu));
Why not use:

  static unsigned long capacity_of(int cpu)
  {
          return cpu_rq(cpu)->cpu_capacity;
  }

which is maintained in update_cpu_capacity() as scale_rt_capacity(cpu)?
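I.e. with capacity_of() the helper would shrink to something like this
(untested sketch):

  static inline bool task_fits_remaining_cpu_capacity(unsigned long task_util,
                                                      int cpu)
  {
          unsigned long total_util;

          if (!sched_util_fits_capacity_active())
                  return false;

          /* util_est of the runnable tasks already enqueued on @cpu. */
          total_util = READ_ONCE(cpu_rq(cpu)->cfs.avg.util_est.enqueued) +
                       task_util;

          return fits_capacity(total_util, capacity_of(cpu));
  }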
[...]
> @@ -7173,7 +7200,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> 	if (recent_used_cpu != prev &&
> 	    recent_used_cpu != target &&
> 	    cpus_share_cache(recent_used_cpu, target) &&
> -	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
> +	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu) ||
> +	     task_fits_remaining_cpu_capacity(task_util, recent_used_cpu)) &&
> 	    cpumask_test_cpu(recent_used_cpu, p->cpus_ptr) &&
> 	    asym_fits_cpu(task_util, util_min, util_max, recent_used_cpu)) {
> 		return recent_used_cpu;
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index ee7f23c76bd3..9a84a1401123 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -97,6 +97,12 @@ SCHED_FEAT(WA_BIAS, true)
> SCHED_FEAT(UTIL_EST, true)
> SCHED_FEAT(UTIL_EST_FASTUP, true)
IMHO, asymmetric CPU capacity systems would have to disable the sched
feature UTIL_FITS_CAPACITY. Otherwise CAS could deliver different
results, since task_fits_remaining_cpu_capacity() and asym_fits_cpu()
work slightly differently.
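For comparison, asym_fits_cpu() is roughly (v6.5):

  static inline bool asym_fits_cpu(unsigned long util,
                                   unsigned long util_min,
                                   unsigned long util_max,
                                   int cpu)
  {
          if (sched_asym_cpucap_active())
                  return (util_fits_cpu(util, util_min, util_max, cpu) > 0);

          /* Symmetric systems: every CPU fits capacity-wise. */
          return true;
  }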
[...]