linux-kernel - Re: [PATCH 3/5] sched/fair: Rework feec() to use cost instead of spare capacity

Message-ID: <066b7de8-0854-424b-8888-b18fc61ec21c@arm.com>
Date: Mon, 2 Sep 2024 12:03:22 +0100
From: Hongyan Xia <hongyan.xia2@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>, linux-kernel@...r.kernel.org
Cc: qyousef@...alina.io, mingo@...hat.com, peterz@...radead.org,
 juri.lelli@...hat.com, dietmar.eggemann@....com, rostedt@...dmis.org,
 bsegall@...gle.com, vschneid@...hat.com, lukasz.luba@....com,
 mgorman@...e.de, rafael.j.wysocki@...el.com
Subject: Re: [PATCH 3/5] sched/fair: Rework feec() to use cost instead of
 spare capacity

On 30/08/2024 14:03, Vincent Guittot wrote:
> feec() looks for the CPU with highest spare capacity in a PD assuming that
> it will be the best CPU from an energy efficiency PoV because it will
> require the smallest increase of OPP. Although this is true generally
> speaking, this policy also filters out some other CPUs which will be as
> efficient because they use the same OPP.
> In fact, we really care about the cost of the new OPP that will be
> selected to handle the waking task. In many cases, several CPUs will end
> up selecting the same OPP and as a result using the same energy cost. In
> these cases, we can use other metrics to select the best CPU for the same
> energy cost.
> 
> Rework feec() to look 1st for the lowest cost in a PD and then the most
> performant CPU between CPUs.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> ---
>   kernel/sched/fair.c | 466 +++++++++++++++++++++++---------------------
>   1 file changed, 244 insertions(+), 222 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e67d6029b269..2273eecf6086 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> [...]
>   
> -	energy = em_cpu_energy(pd->em_pd, max_util, busy_time, eenv->cpu_cap);
> +/* For a same cost, select the CPU that will provide best performance for the task */
> +static bool select_best_cpu(struct energy_cpu_stat *target,
> +			    struct energy_cpu_stat *min,
> +			    int prev, struct sched_domain *sd)
> +{
> +	/*  Select the one with the least number of running tasks */
> +	if (target->nr_running < min->nr_running)
> +		return true;
> +	if (target->nr_running > min->nr_running)
> +		return false;
>   
This makes me a bit worried about systems with coarse-grained OPPs. All
my dev boards and one of my old phones have <= 3 OPPs. On my Juno board,
the lowest OPP on the big core spans 512 of utilization, half of the
full capacity. Consider a scenario with 4 tasks of utilization 300, 100,
100 and 100. The placement should be the 300 task on one core and the
three 100 tasks on another, but the nr_running check here would give 2
tasks (300 + 100) on one CPU and 2 tasks (100 + 100) on another, because
both CPUs still stay within the lowest OPP on Juno. In that placement,
the second CPU will also finish faster and idle more than the first one.
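
The scenario above can be replayed with a small toy model. This is only
a sketch, not kernel code: struct toy_cpu, toy_pick_cpu() and
toy_place_all() are hypothetical names, the PD is assumed to have 2 CPUs,
the tasks are assumed to wake up in the order 300, 100, 100, 100, and a
tie is simply resolved in favour of the lower-numbered CPU (the real
patch would go on to the prev CPU and runnable checks).

```c
#include <stddef.h>

#define TOY_NR_CPUS		2
#define TOY_LOWEST_OPP_UTIL	512	/* utilization covered by the lowest OPP (Juno big core) */

/* Hypothetical per-CPU state, loosely modelled on the stats used in the patch */
struct toy_cpu {
	unsigned int util;
	unsigned int nr_running;
};

enum toy_policy { TOY_NR_RUNNING, TOY_SPARE_CAPACITY };

/* Pick a CPU for a waking task; returns -1 if no CPU stays in the lowest OPP */
static int toy_pick_cpu(const struct toy_cpu *cpus, unsigned int task_util,
			enum toy_policy policy)
{
	int best = -1;

	for (int i = 0; i < TOY_NR_CPUS; i++) {
		/* Leaving the lowest OPP would change the cost, so skip this CPU */
		if (cpus[i].util + task_util > TOY_LOWEST_OPP_UTIL)
			continue;

		if (best < 0) {
			best = i;
		} else if (policy == TOY_NR_RUNNING) {
			/* Same cost: prefer the fewest running tasks */
			if (cpus[i].nr_running < cpus[best].nr_running)
				best = i;
		} else {
			/* Lower utilization means more spare capacity */
			if (cpus[i].util < cpus[best].util)
				best = i;
		}
	}

	return best;
}

/* Place every task in wake-up order and account for it on the chosen CPU */
static void toy_place_all(struct toy_cpu *cpus, const unsigned int *tasks,
			  size_t nr_tasks, enum toy_policy policy)
{
	for (size_t t = 0; t < nr_tasks; t++) {
		int cpu = toy_pick_cpu(cpus, tasks[t], policy);

		if (cpu < 0)
			continue;	/* out of scope for this sketch */

		cpus[cpu].util += tasks[t];
		cpus[cpu].nr_running++;
	}
}
```

Starting from two idle CPUs, TOY_NR_RUNNING ends with 400 and 200 of
utilization (2 tasks each), while TOY_SPARE_CAPACITY ends with 300 and
300 (1 task and 3 tasks), which is the split described above.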

To give an extreme example, assume the system has only one OPP (such a
system is dumb to begin with, but just to make a point). Before this
patch EAS would still work okay for task placement, but after this
patch EAS would just balance on the number of tasks on wake-up,
regardless of their utilization.

I wonder if there is a way to still take total utilization as a factor.
It used to be 100% of the decision making, but maybe now it could be
only 60%, with the other 40% going to things like number of tasks and
contention.
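
One hypothetical way to express such a 60/40 blend is a weighted score
where lower is better. Everything below is an assumption made for
illustration: the names (struct toy_blend_stat, toy_score(),
toy_prefer()), the weights, and in particular the saturation point used
to turn nr_running into a percentage. None of it comes from the patch.

```c
#include <stdbool.h>

#define TOY_W_UTIL		60	/* weight of utilization, in percent */
#define TOY_W_NR		40	/* weight of nr_running, in percent */
#define TOY_NR_SATURATION	4	/* assumed: this many tasks counts as 100% */

/* Hypothetical subset of the per-CPU stats compared at wake-up */
struct toy_blend_stat {
	unsigned long util;
	unsigned long capa;
	unsigned int nr_running;
};

/* Lower score is better; both inputs are scaled to 0..100 before weighting */
static unsigned long toy_score(const struct toy_blend_stat *s)
{
	unsigned long util_pct = s->capa ? (s->util * 100) / s->capa : 100;
	unsigned long nr = s->nr_running;

	if (util_pct > 100)
		util_pct = 100;
	if (nr > TOY_NR_SATURATION)
		nr = TOY_NR_SATURATION;

	return TOY_W_UTIL * util_pct +
	       TOY_W_NR * (nr * 100 / TOY_NR_SATURATION);
}

/* True if target looks like a better choice than the current minimum */
static bool toy_prefer(const struct toy_blend_stat *target,
		       const struct toy_blend_stat *min)
{
	return toy_score(target) < toy_score(min);
}
```

With equal nr_running the CPU with lower utilization wins, and with
equal utilization the CPU with fewer tasks wins. How the two should
really be weighted against each other is exactly the open question.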

> -	trace_sched_compute_energy_tp(p, dst_cpu, energy, max_util, busy_time);
> +	/* Favor previous CPU otherwise */
> +	if (target->cpu == prev)
> +		return true;
> +	if (min->cpu == prev)
> +		return false;
>   
> -	return energy;
> +	/*
> +	 * Choose CPU with lowest contention. One might want to consider load instead of
> +	 * runnable but we are supposed to not be overutilized so there is enough compute
> +	 * capacity for everybody.
> +	 */
> +	if ((target->runnable * min->capa * sd->imbalance_pct) >=
> +			(min->runnable * target->capa * 100))
> +		return false;
> +
> +	return true;
>   }
> [...]
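
For reference, the contention check at the end of the quoted hunk can be
tried out in isolation. This is a standalone sketch: struct toy_stat and
toy_less_contended() are hypothetical names, only the comparison itself
is copied from the hunk, and the imbalance_pct of 117 mentioned after
the block is just an assumed value for the arithmetic.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two fields the comparison reads */
struct toy_stat {
	unsigned long runnable;
	unsigned long capa;
};

/*
 * Same cross-multiplied comparison as in the quoted hunk: target only
 * wins if its runnable/capa ratio is lower than min's by more than the
 * imbalance_pct margin.
 */
static bool toy_less_contended(const struct toy_stat *target,
			       const struct toy_stat *min,
			       unsigned int imbalance_pct)
{
	if ((target->runnable * min->capa * imbalance_pct) >=
			(min->runnable * target->capa * 100))
		return false;

	return true;
}
```

With equal capacities, min at 120 runnable and an assumed imbalance_pct
of 117, a target at 100 runnable is preferred (100 * 117 < 120 * 100),
but a target at 110 is not (110 * 117 >= 120 * 100), so a small
difference in runnable does not change the choice.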