linux-kernel - Re: [RFC PATCH 4/5] sched/fair: Use EAS also when overutilized

Message-ID: <CAKfTPtDOhNmL0Nn3g-agnL5HH5nhwXb3-sfzydEe4nvRKAq3HQ@mail.gmail.com>
Date: Thu, 3 Oct 2024 08:27:00 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Quentin Perret <qperret@...gle.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com, 
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com, 
	mgorman@...e.de, vschneid@...hat.com, lukasz.luba@....com, 
	rafael.j.wysocki@...el.com, linux-kernel@...r.kernel.org, qyousef@...alina.io, 
	hongyan.xia2@....com
Subject: Re: [RFC PATCH 4/5] sched/fair: Use EAS also when overutilized

On Tue, 1 Oct 2024 at 19:51, Quentin Perret <qperret@...gle.com> wrote:
>
> On Tuesday 01 Oct 2024 at 18:20:03 (+0200), Vincent Guittot wrote:
> > With commit 50181c0cff31 ("sched/pelt: Avoid underestimation of task
> > utilization"), util_est stays at the value it had before the task had
> > to share the CPU with other tasks, which means that util_est remains
> > correct even if util_avg decreases because of sharing the CPU with
> > other tasks. This was done to cover the cases you mention above,
> > where both util_avg and util_est were decreasing when tasks start to
> > share the CPU bandwidth with others.
>
> I don't think I agree about the correctness of that util_est value at
> all. The above patch only makes it arbitrarily out of date in the truly
> overcommitted case. All the util-based heuristics we have in the
> scheduler are based around the assumption that the close future will
> look like the recent past, so using an arbitrarily old util-est is still
> incorrect. I can understand how this may work OK in RT-app or other

This fixes a real use case on Android devices.

> use-cases with perfectly periodic tasks for their entire lifetime and
> such, but this doesn't work at all in the general case.
>
> > And feec() will return -1 for that case because util_est remains high
>
> And again, checking that a task fits is broken to start with if we don't
> know how big the task is. When we have reasons to believe that the util
> values are no longer correct (and the absence of idle time is a very
> good reason for that) we just need to give up on them. The fact that we
> have to resort to using out-of-date data to sort of make that work is
> just another proof that this is not a good idea in the general case.

That's where I disagree: this is not an out-of-date value, it is the
last correct one from before the task had to share the CPU.
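
To illustrate the behavior described above, here is a minimal user-space
sketch of a util_est that is frozen while the task has to wait for the CPU.
It is a simplified model, not the kernel code: the struct, the function
names and the margin are hypothetical, and the estimate is assigned directly
instead of going through the EWMA filter the kernel uses.

```c
#include <stdbool.h>

#define MODEL_CAPACITY_SCALE 1024                  /* assumed scale, as in SCHED_CAPACITY_SCALE */
#define MODEL_EST_MARGIN (MODEL_CAPACITY_SCALE / 100) /* hypothetical tolerance */

/* Hypothetical per-task signals, all on the 0..1024 scale. */
struct task_model {
	unsigned int util_avg;     /* time spent running */
	unsigned int runnable_avg; /* time spent running or waiting for the CPU */
	unsigned int util_est;     /* estimate remembered across activations */
};

/*
 * Refresh the estimate at dequeue time. If the task spent noticeable time
 * waiting (runnable well above util), util_avg under-reports what the task
 * wanted, so the previous estimate is kept. Returns true if updated.
 */
static bool model_util_est_update(struct task_model *t)
{
	if (t->util_avg + MODEL_EST_MARGIN < t->runnable_avg)
		return false;            /* shared the CPU: keep the old value */

	t->util_est = t->util_avg;       /* simplification: no EWMA */
	return true;
}

/* Size used for placement: the larger of the live and remembered values. */
static unsigned int model_task_util_est(const struct task_model *t)
{
	return t->util_avg > t->util_est ? t->util_avg : t->util_est;
}
```

In this model, a task measured at 600 while alone keeps reporting 600 after
its util_avg drops to 300 from sharing the CPU, so it does not suddenly
look like it fits on a smaller CPU.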

>
> > the commit that I mentioned above covers those cases and the task will
> > not incorrectly fit to another smaller CPU because its util_est is
> > preserved during the overutilized phase
>
> There are other reasons why a task may look like it fits, e.g. two tasks
> coscheduled on a big CPU get 50% util each, then we migrate one away, the

50% of what? Not of the CPU capacity. I think you are missing one piece
of the recent PELT behavior here. I fully agree that when the system is
overcommitted, util-based task placement is not correct, but I also
think that feec() can't find a CPU in such a case.

> CPU looks half empty. Is it half empty? We've got no way to tell until

Same here: it does not look half empty, thanks to util_est.

> we see idle time. The current util_avg and old util_est value are just
> not helpful, they're both bad signals and we should just discard them.
>
> So again I do feel like the best way forward would be to change the
> nature of the OU threshold to actually ask cpuidle 'when was the last
> time there was idle time?' (or possibly cache that in the idle task
> directly). And then based on that we can decide whether we want to enter
> feec() and do util-based decision, or to kick the push-pull mechanism in
> your other patches, things like that. That would solve/avoid the problem
> I mentioned in the previous paragraph and make the OU detection more
> robust. We could also consider using different thresholds in different
> places to re-enable load-balancing earlier, and give up on feec() a bit
> later to avoid messing up the entire task placement when we're only
> transiently OU because of misfit. But eventually, we really need to just
> give up on util values altogether when we're really overcommitted, it's
> really an invariant we need to keep.
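
The idle-time based check proposed in the quoted paragraph could be sketched
roughly as follows. This is a user-space model only: the struct, the helper
names and the max_busy_ns window are hypothetical and do not correspond to
an existing cpuidle or scheduler interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU record of the last time the CPU went idle. */
struct cpu_idle_model {
	uint64_t last_idle_ns;
};

/* Would be called when the CPU enters idle (assumption). */
static void model_note_idle(struct cpu_idle_model *c, uint64_t now_ns)
{
	c->last_idle_ns = now_ns;
}

/*
 * Util values are treated as usable only if the CPU has seen idle time
 * within the last max_busy_ns nanoseconds. Otherwise the caller would
 * skip util-based placement and fall back to another mechanism.
 */
static bool model_util_trustworthy(const struct cpu_idle_model *c,
				   uint64_t now_ns, uint64_t max_busy_ns)
{
	return now_ns - c->last_idle_ns <= max_busy_ns;
}
```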

For now, I will raise the OU threshold to the CPU capacity to reduce
false overutilized states caused by misfit tasks, which is what I
really care about. The redesign of OU will come in a separate series,
as it implies more rework. IIUC your point, we are more interested
in the prev CPU than in the current one.
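
As a rough illustration of the threshold change mentioned above, the sketch
below compares an overutilized check that keeps roughly 20% headroom (in the
spirit of the kernel's fits_capacity() margin) with one that only triggers
at full capacity. The function names are hypothetical and the exact
comparison used in the series is an assumption.

```c
#include <stdbool.h>

/* Overutilized once util reaches about 80% of the CPU capacity. */
static bool model_ou_with_margin(unsigned long util, unsigned long capacity)
{
	return util * 1280 >= capacity * 1024;
}

/* Overutilized only once util reaches the full CPU capacity. */
static bool model_ou_at_capacity(unsigned long util, unsigned long capacity)
{
	return util >= capacity;
}
```

With a task of util 350 on a CPU of capacity 400, the first check reports
overutilized while the second does not, which is the kind of misfit-induced
false positive the higher threshold is meant to avoid.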

>
> Thanks,
> Quentin