linux-kernel - Re: [PATCH] sched/fair: Allow decaying util_est when util

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <83876902-d53f-48b2-95d3-79add3373452@arm.com>
Date: Mon, 31 Mar 2025 12:46:02 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Pierre Gondois <pierre.gondois@....com>,
 Vincent Guittot <vincent.guittot@...aro.org>
Cc: linux-kernel@...r.kernel.org, Chritian Loehle <christian.loehle@....com>,
 Hongyan Xia <hongyan.xia2@....com>, Ingo Molnar <mingo@...hat.com>,
 Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH] sched/fair: Allow decaying util_est when util_avg > CPU
 capa

On 27/03/2025 10:35, Pierre Gondois wrote:
>  
> 
> On 3/26/25 18:25, Vincent Guittot wrote:
>> On Tue, 25 Mar 2025 at 16:06, Pierre Gondois <pierre.gondois@....com>
>> wrote:
>>>
>>> commit 10a35e6812aa ("sched/pelt: Skip updating util_est when
>>> utilization is higher than CPU's capacity")
>>> prevents util_est from being updated if util_avg is higher than the
>>> underlying CPU capacity to avoid overestimating the task when the CPU
>>> is capped (due to thermal issue for instance). In this scenario, the
>>> task will miss its deadlines and start overlapping its wake-up events
>>> for instance. The task will appear as always running when the CPU is
>>> just not powerful enough to allow having a good estimation of the
>>> task.

This one will be removed by your patch, right?

>>>
>>> commit b8c96361402a ("sched/fair/util_est: Implement faster ramp-up
>>> EWMA on utilization increases")
>>> sets ewma to util_avg when ewma > util_avg, allowing ewma to quickly
>>> grow instead of slowly converge to the new util_avg value when a task
>>> profile changes from small to big.
>>>
>>> However, the 2 conditions:
>>> - Check util_avg against max CPU capacity

I assume this is the condition you remove and

>>> - Check whether util_est > util_avg

this is:

 4918 /*
 4919  * Reset EWMA on utilization increases, the moving average is used
 4920  * to smooth utilization decreases.
 4921  */
 4922  if (ewma <= dequeued) {
 4923      ewma = dequeued;
 4924      goto done;
 4925  }

which is before the condition you remove?

So maybe explain those conditions and their order more carefully? So
it's easier to grasp.

>>> are placed in an order such as it is possible to set util_est to a
>>> value higher than the CPU capacity if util_est > util_avg, but
>>> util_est is prevented to decay as long as:
>>> CPU capacity < util_avg < util_est.

Maybe mentioning 'util_avg eq. dequeued' and 'util_est eq. ewma' would
help here for easier understanding.

>>> Just remove the check as either:
>>> 1.
>>> There is idle time on the CPU. In that case the util_avg value of the
>>> task is actually correct. It is possible that the task missed a
>>> deadline and appears bigger, but this is also the case when the
>>> util_avg of the task is lower than the maximum CPU capacity.
>>> 2.
>>> There is no idle time. In that case, the util_avg value might aswell
>>> be an under estimation of the size of the task.
>>> It is possible that undesired frequency spikes will appear when the
>>> task is later enqueued with an inflated util_est value, but the
>>> frequency spike might aswell be deserved. The absence of idle time
>>> prevents from drawing any conclusion.
>>>
>>> Signed-off-by: Pierre Gondois <pierre.gondois@....com>
>>
>> This change looks reasonable to me. Did you face problems related to
>> this in a particular use case ?
> 
> I think it was more related to the fact util_est is not decayed when:
> (runnable - util_avg) > margin
> 
> This patch slightly helps to decay, but not that much.

Some of the 'stress-ng --class scheduler' seem to be be sensitive in
this regard. Haven't looked deeper into this.

[...]