linux-kernel - Re: frequent lockups in 3.18rc4


Message-ID: <1417806247.4845.1@mail.thefacebook.com>
Date:	Fri, 5 Dec 2014 14:04:07 -0500
From:	Chris Mason <clm@...com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Dave Jones <davej@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Dâniel Fraga <fragabr@...il.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: frequent lockups in 3.18rc4



On Fri, Dec 5, 2014 at 1:38 PM, Linus Torvalds 
<torvalds@...ux-foundation.org> wrote:
> On Fri, Dec 5, 2014 at 9:15 AM, Dave Jones <davej@...hat.com> wrote:
>> 
>>  A bisect later, and I landed on a kernel that ran for a day, before
>>  spewing NMI messages, recovering, and then..
>> 
>>  http://codemonkey.org.uk/junk/log.txt
> 
> I have to admit I'm seeing absolutely nothing sensible in there.
> 
> Call it bad, and see if bisection ends up slowly -oh so slowly -
> pointing to some direction. Because I don't think it's the hardware,
> considering that apparently 3.16 is solid. And the spews themselves
> are so incomprehensible that I'm not seeing any pattern what-so-ever.

I went back through all of the traces Dave has posted in this thread.
This one looks like it has VM debugging turned on:

 http://marc.info/?l=linux-kernel&m=141632237304726&w=2

Another had a function call from CONFIG_DEBUG_PAGEALLOC:

http://marc.info/?l=linux-kernel&m=141701248210949&w=2

So one idea is that with these debug options on, allocating and freeing 
pages is dramatically more expensive, and we're hitting a strange edge 
condition.  Maybe we're even faulting on a read-only page from a 
horrible place?
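
A quick way to confirm which of these debug options a given build has 
enabled is to scan its .config.  This is only a rough sketch: the 
helper name and the sample config text are made up for illustration, 
and the only real option names assumed are CONFIG_DEBUG_VM and 
CONFIG_DEBUG_PAGEALLOC.

```python
def enabled_options(config_text, names):
    """Return the entries of `names` that are set to 'y' in .config-style text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and value == "y":
            enabled.add(key)
    return [name for name in names if name in enabled]


# Hypothetical sample, not taken from Dave's actual config.
sample = """\
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_EXT4_FS=y
"""

print(enabled_options(sample, ["CONFIG_DEBUG_VM", "CONFIG_DEBUG_PAGEALLOC"]))
```

Running it against the config of each kernel tested during the bisect 
would show whether the page debugging options were on for every run.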

[83246.925234] end_request: I/O error, dev sda, sector 0

Ext3/4 shouldn't be doing I/O to sector zero.  Is something stomping on 
RAM?
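
To see whether the other logs contain the same kind of error, 
something like the following could pull out sector-zero I/O errors.  
It is a sketch only: the regex assumes the "end_request: I/O error" 
line format shown above, the function name is hypothetical, and the 
second sample line is invented to show a non-matching sector.

```python
import re

# Matches lines like:
# [83246.925234] end_request: I/O error, dev sda, sector 0
IO_ERROR = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+end_request: I/O error, "
    r"dev (?P<dev>\S+), sector (?P<sector>\d+)"
)


def sector_zero_errors(log_text):
    """Return (timestamp, device) pairs for I/O errors reported at sector 0."""
    hits = []
    for line in log_text.splitlines():
        match = IO_ERROR.match(line.strip())
        if match and int(match.group("sector")) == 0:
            hits.append((float(match.group("ts")), match.group("dev")))
    return hits


sample = (
    "[83246.925234] end_request: I/O error, dev sda, sector 0\n"
    # Invented line, included only to show that other sectors are ignored.
    "[83250.000001] end_request: I/O error, dev sda, sector 2048\n"
)

print(sector_zero_errors(sample))
```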

-chris
