Date:	Sun, 21 Dec 2014 20:22:21 -0500
From:	Dave Jones <davej@...emonkey.org.uk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Thomas Gleixner <tglx@...utronix.de>, Chris Mason <clm@...com>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Dâniel Fraga <fragabr@...il.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Suresh Siddha <sbsiddha@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Peter Anvin <hpa@...ux.intel.com>
Subject: Re: frequent lockups in 3.18rc4

On Sun, Dec 21, 2014 at 04:52:28PM -0800, Linus Torvalds wrote:
 > > The second time (or third, or fourth - it might not take immediately)
 > > you get a lockup or similar. Bad things happen.
 > 
 > I've only tested it twice now, but the first time I got a weird
 > lockup-like thing (things *kind* of worked, but I could imagine that
 > one CPU was stuck with a lock held, because things eventually ground
 > to a screeching halt).
 > 
 > The second time I got
 > 
 >   INFO: rcu_sched self-detected stall on CPU { 5}  (t=84533 jiffies g=11971 c=11970 q=17)
 > 
 > and then
 > 
 >    INFO: rcu_sched detected stalls on CPUs/tasks: { 1 2 3 4 5 6 7} (detected by 0, t=291309 jiffies, g=12031, c=12030, q=57)
 > 
 > with backtraces that made no sense (because obviously no actual stall
 > had taken place), and showed the CPUs mostly being idle.
 > 
 > I could easily see it resulting in your softlockup scenario too.

So one thing trinity does when it has no better idea of what to
pass to a syscall is generate a random number.
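
To make that fallback concrete, here is a rough sketch of the idea. This is
not trinity's actual code: the sketch_* names, the argument types, and the
bounds 1024 and 4096 are all made up for illustration. The only point is that
an argument with no type information ends up as an arbitrary bit pattern.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical argument categories; not taken from trinity. */
enum sketch_arg_type { SKETCH_ARG_UNKNOWN, SKETCH_ARG_FD, SKETCH_ARG_LEN };

/* Pick a value for one syscall argument. */
static unsigned long sketch_gen_arg(enum sketch_arg_type type)
{
	switch (type) {
	case SKETCH_ARG_FD:
		/* Known to be a file descriptor: keep it small (assumed bound). */
		return (unsigned long)(rand() % 1024);
	case SKETCH_ARG_LEN:
		/* Known to be a length: keep it below one page (assumed bound). */
		return (unsigned long)(rand() % 4096);
	default:
		/* No better idea: mix two rand() draws into an arbitrary value. */
		return ((unsigned long)rand() << 16) ^ (unsigned long)rand();
	}
}
```

With something like this, a value such as 0xfed000f0 is just one of many
outcomes of the default case.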

A wild hypothesis could be that we're in one of these situations,
and we randomly generated 0xfed000f0 and passed that as a value to
a syscall, and the kernel wrote 0 to that address.

Which syscall could do that, rather than just failing an access_ok()
check or similar, is a mystery though.
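
For reference, a check of the access_ok() kind is essentially a range test on
a user virtual address. The sketch below is a simplified user-space model, not
the kernel's implementation; sketch_access_ok and SKETCH_USER_LIMIT are
made-up names, and the limit assumes x86-64 with 4-level paging.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed top of the user address range (x86-64, 4-level paging). */
#define SKETCH_USER_LIMIT 0x00007ffffffff000ULL

/* Return true if [addr, addr + size) lies entirely below the user limit. */
static bool sketch_access_ok(uint64_t addr, uint64_t size)
{
	/* Written to avoid overflow when computing addr + size. */
	if (size > SKETCH_USER_LIMIT)
		return false;
	return addr <= SKETCH_USER_LIMIT - size;
}
```

Under these assumptions 0xfed000f0 passes the range test as a user pointer,
but it is then treated as a user virtual address, so a write through it would
fault or land in the process's own mapping rather than at that physical
address.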

	Dave
