Date:	Sat, 13 Dec 2014 09:19:53 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Dave Jones <davej@...hat.com>, Chris Mason <clm@...com>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Dâniel Fraga <fragabr@...il.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: frequent lockups in 3.18rc4


* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Fri, Dec 12, 2014 at 10:54 AM, Dave Jones <davej@...hat.com> wrote:
>
> >
> > Something that's still making me wonder if it's some kind of 
> > hardware problem is the non-deterministic nature of this bug.
> 
> I'd expect it to be a race condition, though. Which can easily 
> cause these kinds of issues, and the timing will be pretty 
> random even if the load is very regular.
> 
> And we know that the scheduler has an integer overflow under 
> Sasha's loads, although I didn't hear anything from Ingo and 
> friends about it. Ingo/Peter, you were cc'd on that report, 
> where at least one of the multiplications in wake_affine() ended 
> up overflowing..

Just to make sure: is there any wake_affine report other than 
the one in this thread? (I ran a full-text search for 
wake_affine on my inbox and didn't find anything that appeared 
relevant.)

> Some scheduler thing that overflows only under heavy load, and 
> screws up scheduling could easily account for the RCU thread 
> thing. I see it *less* easily accounting for DaveJ's case, 
> though, because the watchdog is running at RT priority, and the 
> scheduler would have to screw up much more to then not schedule 
> an RT task, but..

Yeah, the RT scheduler is harder (but not impossible) to confuse 
due to its simplicity, but scheduler counts overflowing could 
definitely cause all sorts of trouble and make debugging harder, 
so we want to fix it regardless of its likelihood of causing 
lockups.
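
To illustrate the kind of overflow being discussed, here is a minimal 
sketch of a chained signed 64-bit multiplication that can wrap when a 
load value gets very large. It is not the actual wake_affine() code: 
the helper name scaled_load() and its parameters (pct, capacity, load) 
are hypothetical, and it assumes a compiler that provides the 
__builtin_mul_overflow builtin (GCC 5 or later, or Clang).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: computes pct * capacity * load
 * in signed 64-bit arithmetic. Returns true and stores the product in
 * *result if neither multiplication overflowed, false otherwise.
 */
static bool scaled_load(int64_t pct, int64_t capacity, int64_t load,
                        int64_t *result)
{
    int64_t tmp;

    /* First step: pct * capacity. */
    if (__builtin_mul_overflow(pct, capacity, &tmp))
        return false;

    /* Second step: (pct * capacity) * load, the step most likely to
     * overflow when load is very large. */
    if (__builtin_mul_overflow(tmp, load, result))
        return false;

    return true;
}
```

With moderate inputs the helper returns the plain product; with a 
load anywhere near the top of the 64-bit range it reports the overflow 
instead of silently producing a wrapped value that would then feed 
into a comparison.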

Thanks,

	Ingo