Message-ID: <Y9J25xMrItpeHIxD@hirez.programming.kicks-ass.net>
Date: Thu, 26 Jan 2023 13:49:43 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Roman Kagan <rkagan@...zon.de>,
Zhang Qiao <zhangqiao22@...wei.com>,
Waiman Long <longman@...hat.com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
lkml <linux-kernel@...r.kernel.org>
Subject: Re: [bug-report] possible s64 overflow in max_vruntime()
On Wed, Jan 25, 2023 at 08:45:32PM +0100, Roman Kagan wrote:
> The calculation is indeed safe against the overflow of the vruntimes
> themselves. However, when the two vruntimes are more than 2^63 apart,
> their comparison gets inverted due to that s64 overflow.
Yes, but that's a whole different issue. vruntimes are not expected to be
*that* far apart.
That is surely the abnormal case. The normal case is wrap around, and
that happens 'often' and should continue working.
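
To make the two cases concrete, here is a minimal userspace sketch of the
signed-difference comparison being discussed. The name max_vruntime_sketch()
and the stdint types are stand-ins chosen for illustration, not kernel code.
It assumes two's-complement conversion from u64 to s64, which is what the
kernel relies on as well.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the comparison under discussion:
 * pick the "later" of two vruntimes by looking at the sign of their
 * unsigned difference reinterpreted as a signed 64-bit value.
 */
static inline uint64_t max_vruntime_sketch(uint64_t max_vr, uint64_t vr)
{
	/* Unsigned subtraction wraps modulo 2^64, so this survives wrap-around. */
	int64_t delta = (int64_t)(vr - max_vr);

	/* vr is treated as later only when it is less than 2^63 ahead. */
	if (delta > 0)
		max_vr = vr;

	return max_vr;
}
```

With max_vr just below U64_MAX and vr wrapped around to a small number, the
sketch still picks vr, which is the normal case. With the two values more than
2^63 apart, delta turns negative and the smaller value is returned, which is
the inversion described in the quoted text.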
> And this is what happens here: one scheduling entity has accumulated a
> vruntime more than 2^63 ahead of another. Now the comparison is
> inverted due to s64 overflow, and the latter can't get to the cpu,
> because it appears to have vruntime (much) bigger than that of the
> former.
If it can be 2^63 ahead, it can also be 2^(64+) ahead and nothing will
help.
> This situation is reproducible e.g. when one scheduling entity is a
> multi-cpu hog, and the other is woken up from a long sleep. Normally
A very low weight CPU hog?
> when a task is placed on a cfs_rq, its vruntime is pulled to
> min_vruntime, to avoid boosting the woken up task. However in this case
> the task is so much behind in vruntime that it appears ahead instead,
> its vruntime is not adjusted in place_entity(), and then it loses the
> cpu to the current scheduling entity.
What I think might be a way out here is passing the sleep wall-time
(cfs_rq_clock_pelt() time I suppose) to place_entity() and simply skipping
the magic if it is 'big'.
All that only matters for small sleeps anyway.
Something like:
	sleep_time = U64_MAX;
	if (se->avg.last_update_time)
		sleep_time = cfs_rq_clock_pelt(cfs_rq) - se->avg.last_update_time;
	if (sleep_time > 60*NSEC_PER_SEC) { // 1 minute is huge
		se->vruntime = cfs_rq->min_vruntime;
		return;
	}
	// ... rest of place_entity()
Hmm... ?
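
For anyone who wants to poke at the idea outside the kernel, the snippet above
can be mocked up roughly as follows. The struct names, the clock_pelt field
(standing in for cfs_rq_clock_pelt()), the flattened last_update_time field and
the int return value are all assumptions made for this sketch; only the
60 second threshold and the control flow come from the proposal.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
struct sketch_cfs_rq {
	uint64_t min_vruntime;
	uint64_t clock_pelt;		/* stands in for cfs_rq_clock_pelt(cfs_rq) */
};

struct sketch_se {
	uint64_t vruntime;
	uint64_t last_update_time;	/* stands in for se->avg.last_update_time */
};

/*
 * Sketch of the proposed early exit. Returns 1 when the long-sleep
 * shortcut was taken, 0 when the caller should continue with the
 * usual placement logic (not modelled here).
 */
static inline int place_entity_long_sleep(const struct sketch_cfs_rq *cfs_rq,
					  struct sketch_se *se)
{
	uint64_t sleep_time = UINT64_MAX;

	/* A zero timestamp means "never updated"; treat that as a huge sleep. */
	if (se->last_update_time)
		sleep_time = cfs_rq->clock_pelt - se->last_update_time;

	if (sleep_time > 60 * NSEC_PER_SEC) {	/* 1 minute is huge */
		/* No vruntime comparison at all, so the s64 inversion cannot bite. */
		se->vruntime = cfs_rq->min_vruntime;
		return 1;
	}

	return 0;
}
```

The point of the shortcut is that a long sleeper is placed from wall-time alone,
without comparing its stale vruntime against min_vruntime.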