linux-kernel - Re: [RFC PATCH v2 1/2] sched/fair: Introduce UTIL_FITS

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c79ac631-61c7-4953-a657-74047a264029@efficios.com>
Date:   Mon, 23 Oct 2023 11:04:49 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Swapnil Sapkal <Swapnil.Sapkal@....com>,
        Aaron Lu <aaron.lu@...el.com>, Chen Yu <yu.c.chen@...el.com>,
        Tim Chen <tim.c.chen@...el.com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>, x86@...nel.org
Subject: Re: [RFC PATCH v2 1/2] sched/fair: Introduce UTIL_FITS_CAPACITY
 feature (v2)

On 2023-10-23 10:11, Dietmar Eggemann wrote:
> On 19/10/2023 18:05, Mathieu Desnoyers wrote:

[...]
>>   
>> +static unsigned long scale_rt_capacity(int cpu);
>> +
>> +/*
>> + * Returns true if adding the task utilization to the estimated
>> + * utilization of the runnable tasks on @cpu does not exceed the
>> + * capacity of @cpu.
>> + *
>> + * This considers only the utilization of _runnable_ tasks on the @cpu
>> + * runqueue, excluding blocked and sleeping tasks. This is achieved by
>> + * using the runqueue util_est.enqueued.
>> + */
>> +static inline bool task_fits_remaining_cpu_capacity(unsigned long task_util,
>> +						    int cpu)
> 
> This is almost like the existing task_fits_cpu(p, cpu) (used in Capacity
> Aware Scheduling (CAS) for Asymmetric CPU capacity systems) except the
> latter only uses `util = task_util_est(p)` and deals with uclamp as well
> and only tests whether p could fit on the CPU.

This is indeed a major difference between how asym capacity check works 
and what is introduced here:

asym capacity check only checks whether the given task theoretically 
fits in the cpu if that cpu was completely idle, without considering the 
current cpu utilization.

My approach is to consider the current util_est of the cpu to check 
whether the task fits in the remaining capacity.

I did not want to use the existing task_fits_cpu() helper because the 
notions of uclamp bounds appear to be heavily tied to the fact that it 
checks whether the task fits in an _idle_ runqueue, whereas the check I 
am introducing here is much more restrictive: it checks that the task 
fits on the runqueue within the remaining capacity.

> 
> Or like find_energy_efficient_cpu() (feec(), used in
> Energy-Aware-Scheduling (EAS)) which uses cpu_util(cpu, p, cpu, 0) to get:
> 
>    max(util_avg(CPU + p), util_est(CPU + p))

I've tried using cpu_util(), but unfortunately anything that considers 
blocked/sleeping tasks in its utilization total does not work for my 
use-case.

 From cpu_util():

  * CPU utilization is the sum of running time of runnable tasks plus the
  * recent utilization of currently non-runnable tasks on that CPU.

> 
> feec()
>      ...
>      for (; pd; pd = pd->next)
>          ...
>          util = cpu_util(cpu, p, cpu, 0);
>          ...
>          fits = util_fits_cpu(util, util_min, util_max, cpu)
>                                     ^^^^^^^^^^^^^^^^^^
>                                    not used when uclamp is not active (1)
>              ...
>              capacity = capacity_of(cpu)
>              fits = fits_capacity(util, capacity)
>              if (!uclamp_is_used()) (1)
>                  return fits
> 
> So not introducing new functions like task_fits_remaining_cpu_capacity()
> in this area and using existing one would be good.

If the notion of uclamp is not tied to the way asym capacity check is 
done against a theoretically idle runqueue, I'd be OK with using this, 
but so far both appear to be very much tied.

When I stumbled on this fundamental difference between asym cpu capacity 
check and the check introduced here, I've started wondering whether the 
asym cpu capacity check would benefit from considering the target cpu 
current utilization as well.

> 
>> +{
>> +	unsigned long total_util;
>> +
>> +	if (!sched_util_fits_capacity_active())
>> +		return false;
>> +	total_util = READ_ONCE(cpu_rq(cpu)->cfs.avg.util_est.enqueued) + task_util;
>> +	return fits_capacity(total_util, scale_rt_capacity(cpu));
> 
> Why not use:
> 
> static unsigned long capacity_of(int cpu)
>      return cpu_rq(cpu)->cpu_capacity;
> 
> which is maintained in update_cpu_capacity() as scale_rt_capacity(cpu)?

The reason for preferring scale_rt_capacity(cpu) over capacity_of(cpu) 
is that update_cpu_capacity() only runs periodically every 
balance-interval, therefore providing a coarse-grained remaining 
capacity approximation with respect to irq, rt, dl, and thermal utilization.

If it turns out that being coarse-grained is good enough, we may be able 
to save some cycles by using capacity_of(), but not without carefully 
considering the impacts of being imprecise.

> 
> [...]
> 
>> @@ -7173,7 +7200,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>>   	if (recent_used_cpu != prev &&
>>   	    recent_used_cpu != target &&
>>   	    cpus_share_cache(recent_used_cpu, target) &&
>> -	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
>> +	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu) ||
>> +	    task_fits_remaining_cpu_capacity(task_util, recent_used_cpu)) &&
>>   	    cpumask_test_cpu(recent_used_cpu, p->cpus_ptr) &&
>>   	    asym_fits_cpu(task_util, util_min, util_max, recent_used_cpu)) {
>>   		return recent_used_cpu;
>> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
>> index ee7f23c76bd3..9a84a1401123 100644
>> --- a/kernel/sched/features.h
>> +++ b/kernel/sched/features.h
>> @@ -97,6 +97,12 @@ SCHED_FEAT(WA_BIAS, true)
>>   SCHED_FEAT(UTIL_EST, true)
>>   SCHED_FEAT(UTIL_EST_FASTUP, true)
> 
> IMHO, asymmetric CPU capacity systems would have to disable the sched
> feature UTIL_FITS_CAPACITY. Otherwise CAS could deliver different
> results. task_fits_remaining_cpu_capacity() and asym_fits_cpu() work
> slightly different.

I don't think they should be mutually exclusive. We should look into the 
differences between those two more closely to make them work nicely 
together instead. For instance, why does asym capacity only consider 
whether tasks fit in a theoretically idle runqueue, when it could use 
the current utilization of the runqueue to check that the task fits in 
the remaining capacity ?

Unfortunately I don't have a machine with asym cpu to test locally.

Thanks for your feedback !

Mathieu


> 
> [...]
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com