Message-ID: <bae6421c-5192-44e6-9ed9-1d32e7122cb7@arm.com>
Date: Tue, 19 Mar 2024 17:05:26 +0000
From: Hongyan Xia <hongyan.xia2@....com>
To: Dietmar Eggemann <dietmar.eggemann@....com>,
 Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Juri Lelli <juri.lelli@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
 Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
 Daniel Bristot de Oliveira <bristot@...hat.com>,
 Valentin Schneider <vschneid@...hat.com>
Cc: Qais Yousef <qyousef@...alina.io>,
 Morten Rasmussen <morten.rasmussen@....com>,
 Lukasz Luba <lukasz.luba@....com>,
 Christian Loehle <christian.loehle@....com>, linux-kernel@...r.kernel.org,
 David Dai <davidai@...gle.com>, Saravana Kannan <saravanak@...gle.com>
Subject: Re: [RFC PATCH v2 1/7] Revert "sched/uclamp: Set max_spare_cap_cpu
 even if max_spare_cap is 0"

On 19/03/2024 15:34, Dietmar Eggemann wrote:
> On 01/02/2024 14:11, Hongyan Xia wrote:
>> From: Hongyan Xia <Hongyan.Xia2@....com>
>>
>> That commit creates further problems because 0 spare capacity can be
>> either a real indication that the CPU is maxed out, or the CPU is
>> UCLAMP_MAX throttled, but we end up giving all of them a chance which
>> can result in bogus energy calculations. It also tends to schedule
>> tasks on the same CPU and requires load balancing patches. Sum
>> aggregation solves these problems and this patch is not needed.
>>
>> This reverts commit 6b00a40147653c8ea748e8f4396510f252763364.
> 
> I assume you did this revert especially for the 'Scenario 5: 8 tasks
> with UCLAMP_MAX of 120' testcase?

More or less. Actually you can already see the problem in Scenario 1.
Ideally the 4 uclamp_max tasks should be evenly distributed across the 4
little CPUs, but from time to time task placement puts more than one
such task on the same CPU, leaving some other little CPUs idle.

> IMHO, the issue is especially visible in compute_energy()'s busy_time
> computation with a valid destination CPU (dst_cpu >= 0). I.e. when we
> have to add performance domain (pd) and task busy time.
> 
> find_energy_efficient_cpu() (feec())
> 
>   for each pd
>    for each cpu in pd
> 
>     set {prev_,max}_spare_cap
> 
>   bail if prev_ and max_spare_cap < 0 (was == 0 before )
> 
>   {base_,prev_,cur_}energy = compute_energy
> 
> So with the patch we potentially compute energy for a saturated PD
> according to:
> 
>   compute_energy()
> 
>    if (dst_cpu >= 0)
>     busy_time = min(eenv->pd_cap, eenv->busy_time + eenv->task_busy_time)
>                     <----(a)--->  <--------------(b)------------------->
> 
>    energy = em_cpu_energy(pd->em_pd, max_util, busy_time, eenv->cpu_cap)
> 
> If (b) > (a) then we're saturated and 'energy' is bogus.

Yeah, I think what's happening is that placing more tasks on the same
CPU doesn't increase the computed energy, so in the end task placement
thinks it's the better decision. The root issue is that once you have
uclamp_max, you can theoretically fit an infinite number of such tasks
on the same CPU.
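
To make the two effects concrete, here is a toy userspace sketch, not
kernel code. toy_pd_busy_time() mirrors the min() clamp quoted above,
and toy_clamped_rq_util() is a simplified stand-in for how a
UCLAMP_MAX-capped runqueue looks when its utilization is clamped. All
toy_* names and numbers are made up for illustration, and the fit check
ignores the headroom margin the real scheduler applies.

```c
#include <stddef.h>

/* Toy model only; every toy_* name is hypothetical, not a kernel API. */

/* Mirrors the quoted clamp: busy time never exceeds the PD capacity. */
unsigned long toy_pd_busy_time(unsigned long pd_cap,
			       unsigned long busy_time,
			       unsigned long task_busy_time)
{
	unsigned long sum = busy_time + task_busy_time;

	return sum < pd_cap ? sum : pd_cap;
}

/* Assumed simplification: rq utilization is the task sum capped at uclamp_max. */
unsigned long toy_clamped_rq_util(const unsigned long *task_util,
				  size_t nr_tasks,
				  unsigned long uclamp_max)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < nr_tasks; i++)
		sum += task_util[i];

	return sum < uclamp_max ? sum : uclamp_max;
}

/* Simplified fit check: no headroom margin, unlike the real scheduler. */
int toy_fits(unsigned long clamped_util, unsigned long cpu_cap)
{
	return clamped_util <= cpu_cap;
}
```

With a little CPU of capacity 446 and UCLAMP_MAX of 120,
toy_clamped_rq_util() returns 120 whether one or eight tasks of
utilization 300 are queued, so toy_fits() passes every time. Likewise,
once busy_time + task_busy_time exceeds pd_cap, toy_pd_busy_time()
returns pd_cap regardless of how big the added task is.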

> The way to fix this is up for discussion:
> 
> (1) feec() returning prev_cpu
> (2) feec() returning -1 (forcing wakeup into sis() -> sic())
> (3) using uclamped values for task and rq utilization
> 
> None of those have immediately given the desired task placement on
> mainline (2 tasks on each of the 4 little CPUs and no task on the 2 big
> CPUs on my [l B B l l l] w/ CPU capacities = [446 1024 1024 446 446 446]
> machine) you can achieve with uclamp sum aggregation.

Personally, from the results I've seen, I definitely prefer (3),
although (3) has other problems. One is that sum aggregation pushes up
the utilization of tasks with uclamp_min, but their energy consumption
definitely won't be that high. The real energy is somewhere between what
util_avg and util_avg_uclamp suggest.
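
As a rough illustration of that gap, here is a toy sketch. It assumes a
task's util_avg_uclamp is simply its util_avg clamped to
[uclamp_min, uclamp_max], which is a simplification, and the toy_* names
are hypothetical rather than anything in the series.

```c
/* Toy model only; toy_* names are hypothetical, not kernel APIs. */

/* Assumed: per-task util_avg_uclamp is util_avg clamped to its uclamp range. */
unsigned long toy_util_avg_uclamp(unsigned long util_avg,
				  unsigned long uclamp_min,
				  unsigned long uclamp_max)
{
	if (util_avg < uclamp_min)
		return uclamp_min;
	if (util_avg > uclamp_max)
		return uclamp_max;
	return util_avg;
}

/* Utilization added on top of what the task actually ran. */
unsigned long toy_uclamp_min_inflation(unsigned long util_avg,
				       unsigned long uclamp_min,
				       unsigned long uclamp_max)
{
	unsigned long clamped = toy_util_avg_uclamp(util_avg, uclamp_min,
						    uclamp_max);

	return clamped > util_avg ? clamped - util_avg : 0;
}
```

For a task with util_avg of 100, uclamp_min of 300 and uclamp_max of
1024, the clamped value is 300 and the inflation is 200, so an energy
estimate based on 300 overshoots while one based on 100 undershoots.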

I haven't seen this as a real problem, but maybe we can see even better 
task placement if this is accounted for.

> [...]