[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <908bb624-1778-4f57-a89b-503a4076cb2e@arm.com>
Date: Tue, 24 Oct 2023 16:10:01 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Swapnil Sapkal <Swapnil.Sapkal@....com>,
Aaron Lu <aaron.lu@...el.com>, Chen Yu <yu.c.chen@...el.com>,
Tim Chen <tim.c.chen@...el.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>, x86@...nel.org
Subject: Re: [RFC PATCH v2 1/2] sched/fair: Introduce UTIL_FITS_CAPACITY
feature (v2)
On 23/10/2023 17:04, Mathieu Desnoyers wrote:
> On 2023-10-23 10:11, Dietmar Eggemann wrote:
>> On 19/10/2023 18:05, Mathieu Desnoyers wrote:
>
> [...]
>>> +static unsigned long scale_rt_capacity(int cpu);
>>> +
>>> +/*
>>> + * Returns true if adding the task utilization to the estimated
>>> + * utilization of the runnable tasks on @cpu does not exceed the
>>> + * capacity of @cpu.
>>> + *
>>> + * This considers only the utilization of _runnable_ tasks on the @cpu
>>> + * runqueue, excluding blocked and sleeping tasks. This is achieved by
>>> + * using the runqueue util_est.enqueued.
>>> + */
>>> +static inline bool task_fits_remaining_cpu_capacity(unsigned long
>>> task_util,
>>> + int cpu)
>>
>> This is almost like the existing task_fits_cpu(p, cpu) (used in Capacity
>> Aware Scheduling (CAS) for Asymmetric CPU capacity systems) except the
>> latter only uses `util = task_util_est(p)` and deals with uclamp as well
>> and only tests whether p could fit on the CPU.
>
> This is indeed a major difference between how asym capacity check works
> and what is introduced here:
>
> asym capacity check only checks whether the given task theoretically
> fits in the cpu if that cpu was completely idle, without considering the
> current cpu utilization.
Yeah, asymmetric CPU capacity systems have to make sure that p fits on
the idle/sched_idle CPU, hence the use of sync_entity_load_avg() and
asym_fits_cpu().
> My approach is to consider the current util_est of the cpu to check
> whether the task fits in the remaining capacity.
True.
> I did not want to use the existing task_fits_cpu() helper because the
> notions of uclamp bounds appear to be heavily tied to the fact that it
> checks whether the task fits in an _idle_ runqueue, whereas the check I
> am introducing here is much more restrictive: it checks that the task
> fits on the runqueue within the remaining capacity.
I see. Essentially what you do is
util_fits_cpu(util_est(CPU + p), 0, 1024, CPU) in !uclamp_is_used()
The uclamp_is_used() case is task-centric though. (*)
>> Or like find_energy_efficient_cpu() (feec(), used in
>> Energy-Aware-Scheduling (EAS)) which uses cpu_util(cpu, p, cpu, 0) to
>> get:
>>
>> max(util_avg(CPU + p), util_est(CPU + p))
>
> I've tried using cpu_util(), but unfortunately anything that considers
> blocked/sleeping tasks in its utilization total does not work for my
> use-case.
>
> From cpu_util():
>
> * CPU utilization is the sum of running time of runnable tasks plus the
> * recent utilization of currently non-runnable tasks on that CPU.
OK, I see. Occasions in which `util_avg(CPU + p) > util_est(CPU + p)`
would ruin it for your use-case.
>> feec()
>> ...
>> for (; pd; pd = pd->next)
>> ...
>> util = cpu_util(cpu, p, cpu, 0);
>> ...
>> fits = util_fits_cpu(util, util_min, util_max, cpu)
>> ^^^^^^^^^^^^^^^^^^
>> not used when uclamp is not active (1)
>> ...
>> capacity = capacity_of(cpu)
>> fits = fits_capacity(util, capacity)
>> if (!uclamp_is_used()) (1)
>> return fits
>>
>> So not introducing new functions like task_fits_remaining_cpu_capacity()
>> in this area and using existing one would be good.
>
> If the notion of uclamp is not tied to the way asym capacity check is
> done against a theoretically idle runqueue, I'd be OK with using this,
> but so far both appear to be very much tied.
Yeah, uclamp_is_used() scenarios are more complicated (see *).
> When I stumbled on this fundamental difference between asym cpu capacity
> check and the check introduced here, I've started wondering whether the
> asym cpu capacity check would benefit from considering the target cpu
> current utilization as well.
We just adapted select_idle_sibling() for asymmetric CPU capacity
systems by adding the asym_fits_cpu() to the idle/sched_idle check.
For me so far sis() is all about finding an idle CPU and not task packing.
>>> +{
>>> + unsigned long total_util;
>>> +
>>> + if (!sched_util_fits_capacity_active())
>>> + return false;
>>> + total_util = READ_ONCE(cpu_rq(cpu)->cfs.avg.util_est.enqueued) +
>>> task_util;
>>> + return fits_capacity(total_util, scale_rt_capacity(cpu));
>>
>> Why not use:
>>
>> static unsigned long capacity_of(int cpu)
>> return cpu_rq(cpu)->cpu_capacity;
>>
>> which is maintained in update_cpu_capacity() as scale_rt_capacity(cpu)?
>
> The reason for preferring scale_rt_capacity(cpu) over capacity_of(cpu)
> is that update_cpu_capacity() only runs periodically every
> balance-interval, therefore providing a coarse-grained remaining
> capacity approximation with respect to irq, rt, dl, and thermal
> utilization.
>> If it turns out that being coarse-grained is good enough, we may be able
> to save some cycles by using capacity_of(), but not without carefully
> considering the impacts of being imprecise.
OK, I see. We normally consider capacity_of(cpu) as accurate enough.
[...]
>>> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
>>> index ee7f23c76bd3..9a84a1401123 100644
>>> --- a/kernel/sched/features.h
>>> +++ b/kernel/sched/features.h
>>> @@ -97,6 +97,12 @@ SCHED_FEAT(WA_BIAS, true)
>>> SCHED_FEAT(UTIL_EST, true)
>>> SCHED_FEAT(UTIL_EST_FASTUP, true)
>>
>> IMHO, asymmetric CPU capacity systems would have to disable the sched
>> feature UTIL_FITS_CAPACITY. Otherwise CAS could deliver different
>> results. task_fits_remaining_cpu_capacity() and asym_fits_cpu() work
>> slightly different.
>
> I don't think they should be mutually exclusive. We should look into the
> differences between those two more closely to make them work nicely
> together instead. For instance, why does asym capacity only consider
> whether tasks fit in a theoretically idle runqueue, when it could use
> the current utilization of the runqueue to check that the task fits in
> the remaining capacity ?
We have EAS (feec()) for this on asymmetric CPU capacity systems (as our
per-performance_domain packing strategy), which only works when
!overutilized. When overutilized, we just need asym_fits_cpu()
(select_idle_capacity() -> util_fits_cpu()) to select a fitting
idle/sched_idle CPU in CAS which includes the uclamp handling.
[...]
Powered by blists - more mailing lists