linux-kernel - Re: frequent lockups in 3.18rc4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+55aFxawNiRT6+0Qzpt3oqdW5fVKc+_kiW6K-1-ajZPFV5xSQ@mail.gmail.com>
Date:	Sun, 7 Dec 2014 15:53:14 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Sasha Levin <sasha.levin@...cle.com>
Cc:	Dave Jones <davej@...hat.com>, Chris Mason <clm@...com>,
	Dâniel Fraga <fragabr@...il.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: frequent lockups in 3.18rc4

On Sun, Dec 7, 2014 at 6:58 AM, Sasha Levin <sasha.levin@...cle.com> wrote:
>
> Maybe the extra prints were just a catalyst?

So there's an interesting change in between 3.16..3.17 - a commit that
was already reverted once due to unrelated problems (it apparently hit
lockdep issues): commit 5874af2003b1 ("printk: enable interrupts
before calling console_trylock_for_printk()").

In particular, that commit means that interrupts get re-enabled in the
middle of the printk (if they were enabled before the printk), and
while I don't see why that would be wrong, it definitely might change
behavior. That code has often been fragile (the whole lockdep example
was just the latest case of that). For example, it ends up looping
over "goto again" with preemption disabled if new console messages
keep coming in.

So I don't think that "enable interrupts" commit itself is necessarily
buggy, but looking at all the printk changes in the relevant time
range, I can easily see that particular commit having some subtle
interaction under heavy printk activity. Before that commit, all the
queued printouts would be written with interrupts disabled all the
way. After that commit, interrupts get re-enabled before and in
between messages get actually pushed to the console.

Should it matter? No. But I don't think we figured out what went wrong
with the lockdep issue that an earlier version of that commit had
either, and that problem caused lockups at boot for some people.  The
whole "print to console" is just fragile, and the addition of serial
console migth just make it even worse.

I dunno. But especially since your RCU issues seem to solve themselves
when *not* having lots of printk's, maybe the lockup is somehow
related to this all. Maybe the lockdep recursion hang ends up being a
"RCU debugging" hang when the timer interrupt causes printk recursion
with the console lock held..

                       Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/