[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <83876902-d53f-48b2-95d3-79add3373452@arm.com>
Date: Mon, 31 Mar 2025 12:46:02 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Pierre Gondois <pierre.gondois@....com>,
Vincent Guittot <vincent.guittot@...aro.org>
Cc: linux-kernel@...r.kernel.org, Chritian Loehle <christian.loehle@....com>,
Hongyan Xia <hongyan.xia2@....com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH] sched/fair: Allow decaying util_est when util_avg > CPU
capa
On 27/03/2025 10:35, Pierre Gondois wrote:
>
>
> On 3/26/25 18:25, Vincent Guittot wrote:
>> On Tue, 25 Mar 2025 at 16:06, Pierre Gondois <pierre.gondois@....com>
>> wrote:
>>>
>>> commit 10a35e6812aa ("sched/pelt: Skip updating util_est when
>>> utilization is higher than CPU's capacity")
>>> prevents util_est from being updated if util_avg is higher than the
>>> underlying CPU capacity to avoid overestimating the task when the CPU
>>> is capped (due to thermal issue for instance). In this scenario, the
>>> task will miss its deadlines and start overlapping its wake-up events
>>> for instance. The task will appear as always running when the CPU is
>>> just not powerful enough to allow having a good estimation of the
>>> task.
This one will be removed by your patch, right?
>>>
>>> commit b8c96361402a ("sched/fair/util_est: Implement faster ramp-up
>>> EWMA on utilization increases")
>>> sets ewma to util_avg when ewma > util_avg, allowing ewma to quickly
>>> grow instead of slowly converge to the new util_avg value when a task
>>> profile changes from small to big.
>>>
>>> However, the 2 conditions:
>>> - Check util_avg against max CPU capacity
I assume this is the condition you remove and
>>> - Check whether util_est > util_avg
this is:
4918 /*
4919 * Reset EWMA on utilization increases, the moving average is used
4920 * to smooth utilization decreases.
4921 */
4922 if (ewma <= dequeued) {
4923 ewma = dequeued;
4924 goto done;
4925 }
which is before the condition you remove?
So maybe explain those conditions and their order more carefully? So
it's easier to grasp.
>>> are placed in an order such as it is possible to set util_est to a
>>> value higher than the CPU capacity if util_est > util_avg, but
>>> util_est is prevented to decay as long as:
>>> CPU capacity < util_avg < util_est.
Maybe mentioning 'util_avg eq. dequeued' and 'util_est eq. ewma' would
help here for easier understanding.
>>> Just remove the check as either:
>>> 1.
>>> There is idle time on the CPU. In that case the util_avg value of the
>>> task is actually correct. It is possible that the task missed a
>>> deadline and appears bigger, but this is also the case when the
>>> util_avg of the task is lower than the maximum CPU capacity.
>>> 2.
>>> There is no idle time. In that case, the util_avg value might aswell
>>> be an under estimation of the size of the task.
>>> It is possible that undesired frequency spikes will appear when the
>>> task is later enqueued with an inflated util_est value, but the
>>> frequency spike might aswell be deserved. The absence of idle time
>>> prevents from drawing any conclusion.
>>>
>>> Signed-off-by: Pierre Gondois <pierre.gondois@....com>
>>
>> This change looks reasonable to me. Did you face problems related to
>> this in a particular use case ?
>
> I think it was more related to the fact util_est is not decayed when:
> (runnable - util_avg) > margin
>
> This patch slightly helps to decay, but not that much.
Some of the 'stress-ng --class scheduler' seem to be be sensitive in
this regard. Haven't looked deeper into this.
[...]
Powered by blists - more mailing lists