Date:	Sun, 7 Dec 2014 15:53:14 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Sasha Levin <sasha.levin@...cle.com>
Cc:	Dave Jones <davej@...hat.com>, Chris Mason <clm@...com>,
	Dâniel Fraga <fragabr@...il.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: frequent lockups in 3.18rc4

On Sun, Dec 7, 2014 at 6:58 AM, Sasha Levin <sasha.levin@...cle.com> wrote:
>
> Maybe the extra prints were just a catalyst?

So there's an interesting change in between 3.16..3.17 - a commit that
was already reverted once due to unrelated problems (it apparently hit
lockdep issues): commit 5874af2003b1 ("printk: enable interrupts
before calling console_trylock_for_printk()").

In particular, that commit means that interrupts get re-enabled in the
middle of the printk (if they were enabled before the printk), and
while I don't see why that would be wrong, it definitely might change
behavior. That code has often been fragile (the whole lockdep example
was just the latest case of that). For example, it ends up looping
over "goto again" with preemption disabled if new console messages
keep coming in.

So I don't think that "enable interrupts" commit itself is necessarily
buggy, but looking at all the printk changes in the relevant time
range, I can easily see that particular commit having some subtle
interaction under heavy printk activity. Before that commit, all the
queued printouts would be written with interrupts disabled all the
way. After that commit, interrupts get re-enabled before the messages
actually get pushed to the console, and again in between them.
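
To make the before/after difference concrete, here is a rough userspace
model of that flush loop. It is only a sketch under stated assumptions:
every name in it (sim_console, sim_flush, sim_source_fn, sim_burst) is
made up for illustration and is not a kernel API, and it simplifies by
letting new messages arrive only at two points, inside an interrupt
window between messages or right after the console lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum sim_point {
    SIM_IRQ_WINDOW,   /* interrupts briefly enabled between two messages */
    SIM_AFTER_UNLOCK  /* console lock dropped, log re-checked for late arrivals */
};

/* Returns how many new messages show up at this point (assumed >= 0). */
typedef int (*sim_source_fn)(enum sim_point where, int written);

struct sim_console {
    int pending;      /* messages queued but not yet on the console */
    int written;      /* messages pushed to the console so far */
    int irq_windows;  /* times interrupts were re-enabled mid-flush */
    int passes;       /* trips through the "again" label */
};

/*
 * irq_window == false models "interrupts disabled all the way";
 * irq_window == true models "re-enabled in between messages".
 */
static void sim_flush(struct sim_console *con, bool irq_window,
                      sim_source_fn src)
{
    assert(con != NULL && src != NULL);
again:
    con->passes++;
    while (con->pending > 0) {
        /* One message goes out to the console. */
        con->pending--;
        con->written++;
        if (irq_window) {
            /* A handler running here may queue more output. */
            con->irq_windows++;
            con->pending += src(SIM_IRQ_WINDOW, con->written);
        }
    }
    /* Anything that raced in after the unlock sends us around again. */
    con->pending += src(SIM_AFTER_UNLOCK, con->written);
    if (con->pending > 0)
        goto again;
}

/* Example source: a short burst of extra messages that then dies down. */
static int sim_burst(enum sim_point where, int written)
{
    if (where == SIM_IRQ_WINDOW)
        return written < 5 ? 1 : 0;
    return written < 7 ? 1 : 0;
}
```

Starting from a zeroed struct sim_console with pending set to 2 and
calling sim_flush() with sim_burst, both modes end up writing seven
messages, but the "disabled all the way" mode opens no interrupt windows
while the other mode opens seven. A source that never goes quiet keeps
the loop spinning forever, which is the "goto again" fragility in
miniature.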

Should it matter? No. But I don't think we figured out what went wrong
with the lockdep issue that an earlier version of that commit had
either, and that problem caused lockups at boot for some people.  The
whole "print to console" path is just fragile, and the addition of a
serial console might just make it even worse.

I dunno. But especially since your RCU issues seem to solve themselves
when *not* having lots of printk's, maybe the lockup is somehow
related to this all. Maybe the lockdep recursion hang ends up being a
"RCU debugging" hang when the timer interrupt causes printk recursion
with the console lock held..

                       Linus