[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <62a15cd6-db6a-4010-94db-e78aad9fc7ac@arm.com>
Date: Tue, 19 Nov 2024 14:46:20 +0000
From: Christian Loehle <christian.loehle@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>,
Quentin Perret <qperret@...gle.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, lukasz.luba@....com,
rafael.j.wysocki@...el.com, linux-kernel@...r.kernel.org,
qyousef@...alina.io, hongyan.xia2@....com
Subject: Re: [RFC PATCH 4/5] sched/fair: Use EAS also when overutilized
On 10/3/24 07:27, Vincent Guittot wrote:
> On Tue, 1 Oct 2024 at 19:51, Quentin Perret <qperret@...gle.com> wrote:
>>
>> On Tuesday 01 Oct 2024 at 18:20:03 (+0200), Vincent Guittot wrote:
>>> With commit 50181c0cff31 ("sched/pelt: Avoid underestimation of task
>>> utilization"), the util_est remains set the value before having to
>>> share the cpu with other tasks which means that the util_est remains
>>> correct even if its util_avg decrease because of sharing the cpu with
>>> other task. This has been done to cover the cases that you mention
>>> above whereboth util_avg and util_est where decreasing when tasks
>>> starts to share the CPU bandwidth with others
>>
>> I don't think I agree about the correctness of that util_est value at
>> all. The above patch only makes it arbitrarily out of date in the truly
>> overcommitted case. All the util-based heuristic we have in the
>> scheduler are based around the assumption that the close future will
>> look like the recent past, so using an arbitrarily old util-est is still
>> incorrect. I can understand how this may work OK in RT-app or other
>
> This fixes a real use case on android device
>
>> use-cases with perfectly periodic tasks for their entire lifetime and
>> such, but this doesn't work at all in the general case.
>>
>>> And feec() will return -1 for that case because util_est remains high
>>
>> And again, checking that a task fits is broken to start with if we don't
>> know how big the task is. When we have reasons to believe that the util
>> values are no longer correct (and the absence of idle time is a very
>> good reason for that) we just need to give up on them. The fact that we
>> have to resort to using out-of-date data to sort of make that work is
>> just another proof that this is not a good idea in the general case.
>
> That's where I disagree, this is not an out-of-date value, this is the
> last correct one before sharing the cpu
Just adding on this since we are discussing the correctness of util_est
value on an OU CPU since
commit 50181c0cff31 ("sched/pelt: Avoid underestimation of task utilization").
I agree that this commit fixed the immediate false util_est drop after
coscheduling two (or more) tasks, but that's a specific one.
If one of two coscheduled tasks starts growing their util_est can't reflect
that if their compute demand grows above CPU-capacity, that commit doesn't
change the fact. There is no generally sensible way of estimating such a
util_est anyway.
Even worse if both coscheduled tasks grow which isn't uncommon, considering
they might be related.
So
"this is the last correct one before sharing the cpu" is true,
"This is not an out-of-date value" isn't true in the general case.
I agree that the OU definition can evolve, basing that on idle time makes
sense, but given the common period of 16ms (frame rate) we might delay
setting OU by quite a lot for the cases it 'actually is true'.
Regards,
Christian
Powered by blists - more mailing lists