Message-ID: <alpine.LNX.2.00.1202241859160.31150@pobox.suse.cz>
Date: Fri, 24 Feb 2012 19:01:15 +0100 (CET)
From: Jiri Kosina <jkosina@...e.cz>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Hugh Dickins <hughd@...gle.com>,
Al Viro <viro@...iv.linux.org.uk>,
Andrew Morton <akpm@...ux-foundation.org>,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: Linux 3.3-rc4
On Fri, 24 Feb 2012, Linus Torvalds wrote:
> > The machine has gone through several suspend-resume cycles before this
> > happened, so it might well also be some memory corruption caused by a
> > random driver.
>
> I almost think it is, because "file->dentry" should never be NULL in a
> mapping afaik. Especially as your "mapping" certainly isn't NULL (it's
> in %r12, so you can see it in your register dump).
>
> This isn't some unusual code sequence either, so I don't see it as
> some random latent bug that is just very unlikely and hard to trigger
> in that code itself.
>
> I'll think about it, but my first reaction is "memory corruption". Do
> you think you could try to run with a kernel that has SLAB debugging
> and poisoning on? If it's a stale pointer dereference that has cleared
> that dentry, that _might_ show it closer to the actual bug (rather
> than a long time later when the NULL dereference happens).
I have been running a DEBUG_SLAB kernel since I first hit the bug, but
nothing has popped up yet. It seems undebuggable so far.
On the other hand, I wouldn't blame the HW for a bit-flip, as it was a clear
NULL pointer (plus 0x30 offset), not random garbage.
--
Jiri Kosina
SUSE Labs