linux-kernel - Re: [bug-report] possible s64 overflow in max

Message-ID: <Y9J25xMrItpeHIxD@hirez.programming.kicks-ass.net>
Date:   Thu, 26 Jan 2023 13:49:43 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Roman Kagan <rkagan@...zon.de>,
        Zhang Qiao <zhangqiao22@...wei.com>,
        Waiman Long <longman@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: [bug-report] possible s64 overflow in max_vruntime()

On Wed, Jan 25, 2023 at 08:45:32PM +0100, Roman Kagan wrote:

> The calculation is indeed safe against the overflow of the vruntimes
> themselves.  However, when the two vruntimes are more than 2^63 apart,
> their comparison gets inverted due to that s64 overflow.

Yes, but that's a whole different issue. Vruntimes are not expected to be
*that* far apart.

That is surely the abnormal case. The normal case is wrap around, and
that happens 'often' and should continue working.
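
To make the distinction concrete, here is a small user-space sketch of the
signed-delta comparison, modeled on max_vruntime(). The name
sketch_max_vruntime() and the stdint types are stand-ins for illustration,
not the kernel code itself. It handles wrap-around correctly, but picks the
wrong value once the two inputs are more than 2^63 apart:

```c
#include <stdint.h>

/*
 * Illustrative model of the signed-delta comparison (hypothetical name).
 * The u64 subtraction wraps modulo 2^64; the cast to s64 then reads the
 * result as "how far vruntime is ahead of max_vr". That reading is only
 * valid while the real distance is below 2^63.
 */
static uint64_t sketch_max_vruntime(uint64_t max_vr, uint64_t vruntime)
{
	int64_t delta = (int64_t)(vruntime - max_vr);

	if (delta > 0)
		max_vr = vruntime;

	return max_vr;
}
```

With a = UINT64_MAX - 0xf and b = a + 0x20 (which wraps to 0x10),
sketch_max_vruntime(a, b) returns b, so wrap-around keeps working. With
c = a + 2^63 + 1, the delta becomes negative and the function returns a,
which is the inversion described in the report.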

> And this is what happens here: one scheduling entity has accumulated a
> vruntime more than 2^63 ahead of another.  Now the comparison is
> inverted due to s64 overflow, and the latter can't get to the cpu,
> because it appears to have vruntime (much) bigger than that of the
> former.

If it can be 2^63 ahead, it can also be 2^(64+) ahead and nothing will
help.

> This situation is reproducible e.g. when one scheduling entity is a
> multi-cpu hog, and the other is woken up from a long sleep.  Normally

A very low weight CPU hog?

> when a task is placed on a cfs_rq, its vruntime is pulled to
> min_vruntime, to avoid boosting the woken up task.  However in this case
> the task is so much behind in vruntime that it appears ahead instead,
> its vruntime is not adjusted in place_entity(), and then it loses the
> cpu to the current scheduling entity.

What I think might be a way out here is passing the sleep wall-time
(cfs_rq_clock_pelt() time I suppose) to place_entity() and simply skipping
the magic if it is 'big'.

All that only matters for small sleeps anyway.

Something like:

	sleep_time = U64_MAX;
	if (se->avg.last_update_time)
		sleep_time = cfs_rq_clock_pelt(cfs_rq) - se->avg.last_update_time;

	if (sleep_time > 60*NSEC_PER_SEC) { // 1 minute is huge
		se->vruntime = cfs_rq->min_vruntime;
		return;
	}

	// ... rest of place_entity()

Hmm... ?
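
For anyone who wants to poke at the idea outside the kernel, the check above
can be modeled in plain user-space C. This is only a sketch under stated
assumptions: struct toy_se, struct toy_rq and toy_place_long_sleeper() are
made-up stand-ins, clock_pelt plays the role of cfs_rq_clock_pelt(), and the
60 second cutoff is just the example value from above, not a tuned one.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC	1000000000ULL
#define TOY_LONG_SLEEP_NS	(60 * TOY_NSEC_PER_SEC)	/* "1 minute is huge" */

/* Hypothetical stand-ins for the sched_entity / cfs_rq fields used above. */
struct toy_se {
	uint64_t vruntime;
	uint64_t last_update_time;	/* 0 means "never updated" */
};

struct toy_rq {
	uint64_t min_vruntime;
	uint64_t clock_pelt;		/* models cfs_rq_clock_pelt() */
};

/*
 * Returns true when the entity slept "long" and its vruntime was simply
 * reset to min_vruntime; false when the caller should go on with the usual
 * placement logic, which is not modeled here.
 */
static bool toy_place_long_sleeper(const struct toy_rq *rq, struct toy_se *se)
{
	uint64_t sleep_time = UINT64_MAX;

	if (se->last_update_time)
		sleep_time = rq->clock_pelt - se->last_update_time;

	if (sleep_time > TOY_LONG_SLEEP_NS) {
		se->vruntime = rq->min_vruntime;
		return true;
	}

	return false;
}
```

The point of the early return is that no vruntime comparison is made at all
for a long sleeper, so it does not matter how far behind min_vruntime the
stale value is.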