Message-ID: <alpine.LFD.1.10.0804210934190.2779@woody.linux-foundation.org>
Date: Mon, 21 Apr 2008 09:54:07 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: "Rafael J. Wysocki" <rjw@...k.pl>
cc: LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-ext4@...r.kernel.org,
Herbert Xu <herbert@...dor.apana.org.au>,
"Paul E. McKenney" <paulmck@...ibm.com>,
Jiri Slaby <jirislaby@...il.com>,
"David S. Miller" <davem@...emloft.net>
Subject: Re: 2.6.25-git2: BUG: unable to handle kernel paging request at
ffffffffffffffff
On Mon, 21 Apr 2008, Rafael J. Wysocki wrote:
>
> Well, it seems that the oops is actually known from -mm:
>
> http://lkml.org/lkml/2008/4/21/55
>
> and something similar was observed with 2.6.25-rc8-mm2.
Hmm. Sadly, I doubt that really cuts down the suspect list very much. Most
of what has been merged since 2.6.25 has been in -mm, so while I agree
that it looks very similar, the fact that it was possibly already in
-rc8-mm2 doesn't much _help_.
And in fact, those oopses in rc8-mm2 don't look _that_ similar. Those are
a corrupt f_mapping structure, it looks like (ie it looks like either
"struct address_space" or a "struct filp" rather than a "struct dentry").
What is interesting about Jiri's version of the bug is that he has another
value for the corruption than you do: you had either all-ones, or a value
that *looked* like possibly a single nybble got cleared.
Jiri, in contrast, has a value of 00f0000000000000. Which is a bit
interesting in that it's again a *nybble* that looks corrupt, but it's a
different one.
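To make the nybble observation concrete, here is a minimal sketch (plain Python, not kernel code) that reports which 4-bit groups of a 64-bit value differ from an expected background pattern. The helper name `odd_nybbles` is made up for illustration. The second example value is hypothetical, since the exact "one nybble cleared" value is not quoted in this mail.

```python
def odd_nybbles(value, expected_nybble=0x0, width=16):
    """Return (index, nybble) pairs for every 4-bit group of `value`
    that differs from `expected_nybble`. Index 0 is the least
    significant nybble; `width` is the number of nybbles inspected."""
    found = []
    for i in range(width):
        nybble = (value >> (4 * i)) & 0xF
        if nybble != expected_nybble:
            found.append((i, nybble))
    return found


# Jiri's value: a single 0xf nybble on an all-zero background.
jiri = odd_nybbles(0x00F0000000000000)

# Hypothetical all-ones value with one nybble cleared (illustration only).
cleared = odd_nybbles(0xFFFF0FFFFFFFFFFF, expected_nybble=0xF)
```

If both reports name a single nybble but at different indices, that matches the pattern described above: the same kind of damage, landing in a different place.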
But assuming Jiri's two oopses are related (which is not entirely
unlikely), and assuming that this is a SLUB bucket re-use, then it's quite
likely that his -rc8-mm2 oops looks different just because
it was yet _another_ allocation that was in the same bucket. If so, the
most likely one is "struct filp", because it has the right size: for me a
filp is in the 192-byte bucket, which is very close to the 208-byte bucket
of dentry.
So I could imagine that some config option or other change just changed
the sizes around so that the two types ended up in different buckets in
rc8-mm2 and in 2.6.25-mm1 (ie neither the dentry nor the filp necessarily
changed sizes, but the *corrupting* type perhaps did?)
What I find interesting is that at least for me, I have the SLAB bucket
size for nf_conntrack_expect being 208 bytes. And the *biggest* merge by
far after 2.6.25 so far has been networking (and conntrack in particular).
Is that a smoking gun? Not necessarily. But it *is* intriguing. But there
are other possible clashes (the 192-byte bucket has several different
suspects, and not all of them are in networking).
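One way to list the candidate clashes is to group slab caches by object size. The sketch below assumes the usual /proc/slabinfo text layout, where the fourth column is the object size. The function name `caches_by_objsize` and the sample counts are made up for illustration; only the 192- and 208-byte sizes come from this mail. With SLUB, caches that were merged may show up under a single name, so treat the output as a starting point rather than a complete suspect list.

```python
from collections import defaultdict


def caches_by_objsize(slabinfo_text):
    """Group slab cache names by object size, given /proc/slabinfo text.
    Assumed columns: name, active_objs, num_objs, objsize, ..."""
    groups = defaultdict(list)
    for line in slabinfo_text.splitlines():
        # Skip the version banner, the column-header comment, blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4 or not fields[3].isdigit():
            continue
        groups[int(fields[3])].append(fields[0])
    return dict(groups)


# Hypothetical sample: the object counts are invented, only the sizes
# (192 and 208 bytes) are taken from the discussion.
SAMPLE = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
dentry 1000 1197 208 19 1 : tunables 0 0 0 : slabdata 63 63 0
filp 400 420 192 21 1 : tunables 0 0 0 : slabdata 20 20 0
nf_conntrack_expect 0 19 208 19 1 : tunables 0 0 0 : slabdata 1 1 0
"""

groups = caches_by_objsize(SAMPLE)
```

On a live system the same function could be fed the contents of /proc/slabinfo (reading it typically needs root), and any cache sharing a size with dentry or filp becomes a candidate for the corrupting type.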
Jiri and Davem added to the Cc.
Jiri - could you also confirm whether you are using SLUB (which is not
necessarily at all indicative of a SLUB bug itself - it's just that SLAB
won't ever even merge different allocations of the same size into the same
buckets, so if it's a cross-slab corruption, you'd simply never see it
with SLAB).
And if you are, can you please enable SLUB_DEBUG, and add a "slub_debug"
to your kernel command line to enable all the debugging? That would
hopefully catch any obvious use-after-free corruption.
I'm just whistling in the dark here, but it does seem worth pursuing this
approach. The VFS layer has not changed *at*all* since 2.6.25, so I
seriously doubt it's a dentry or filp bug - I think the corruption is
external. And while networking is certainly not the only suspect (the x86
architecture changes are pretty extensive too), the allocation size thing
certainly makes it intriguing.
Linus