lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y9GG3N5ivVvyETa2@u40bc5e070a0153.ant.amazon.com>
Date:   Wed, 25 Jan 2023 20:45:32 +0100
From:   Roman Kagan <rkagan@...zon.de>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     Zhang Qiao <zhangqiao22@...wei.com>,
        Waiman Long <longman@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        "Vincent Guittot" <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        "Daniel Bristot de Oliveira" <bristot@...hat.com>,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: [bug-report] possible s64 overflow in max_vruntime()

Upping the thread as we're hitting this problem too.

On Thu, Dec 22, 2022 at 01:45:48PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 21, 2022 at 11:19:31PM +0800, Zhang Qiao wrote:
> >     I found problem about s64 overflow in max_vruntime().
> > 
[...]
> > static inline u64 min_vruntime(u64 min_vruntime, u64 vruntime)
> > {
> > 	/*
> > 	 * vruntime=0x124b17fd59db8d02
> > 	 * min_vruntime=0x918fdb05287da7c3
> > 	 * vruntime - min_vruntime = 9276074894177461567 > s64_max, will s64 overflow
> > 	 */
> > 	s64 delta = (s64)(vruntime - min_vruntime);
> > 	if (delta < 0)
> > 		min_vruntime = vruntime;
> > 
> > 	return min_vruntime;
> > }
> > 
> > ----------
> > 
> > max_vruntime() will return the kworker's old vruntime, it is incorrect and the correct result
> > shoud be cfs_rq->minvruntime. This incorrect result is greater than cfs_rq->min_vruntime and
> > will cause kworker thread starved.
> > 
> >     Does anyone have a good suggestion for slove this problem? or bugfix patch.
> 
> I don't understand what you tihnk the problem is. Signed overflow is
> perfectly fine and works as designed here.

Disagreed.

The calculation is indeed safe against the overflow of the vruntimes
themselves.  However, when the two vruntimes are more than 2^63 apart,
their comparison gets inverted due to that s64 overflow.

And this is what happens here: one scheduling entity has accumulated a
vruntime more than 2^63 ahead of another.  Now the comparison is
inverted due to s64 overflow, and the latter can't get to the cpu,
because it appears to have vruntime (much) bigger than that of the
former.

This situation is reproducible e.g. when one scheduling entity is a
multi-cpu hog, and the other is woken up from a long sleep.  Normally
when a task is placed on a cfs_rq, its vruntime is pulled to
min_vruntime, to avoid boosting the woken up task.  However in this case
the task is so much behind in vruntime that it appears ahead instead,
its vruntime is not adjusted in place_entity(), and then it looses the
cpu to the current scheduling entity.

Thanks,
Roman.



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ