Message-ID: <76bd70e30806041213l685aee07q510d1037d012b0a1@mail.gmail.com>
Date: Wed, 4 Jun 2008 15:13:25 -0400
From: "Chuck Lever" <chuck.lever@...cle.com>
To: "Dave Jones" <davej@...hat.com>,
"Chuck Lever" <chuck.lever@...cle.com>,
"Trond Myklebust" <Trond.Myklebust@...app.com>,
chucklever@...il.com, "Linux Kernel" <linux-kernel@...r.kernel.org>
Subject: Re: NFS oops in 2.6.26rc4

On Wed, Jun 4, 2008 at 2:20 PM, Dave Jones <davej@...hat.com> wrote:
> On Wed, Jun 04, 2008 at 02:13:08PM -0400, Chuck Lever wrote:
> >
> > On Jun 4, 2008, at 10:19 AM, Dave Jones wrote:
> >
> > > On Fri, May 30, 2008 at 03:37:01PM -0400, Chuck Lever wrote:
> > >
> > >>> Something else of note which I hadn't seen before, usually things
> > >>> lock
> > >>> up just after that first oops. For some reason, today it survived
> > >>> a little longer, but things really went downhill fast.
> > >>> It survived a 'dmesg ; scp dmesg davej@...k', and then wedged solid.
> > >>> So as well as the oops, it seems we're corrupting memory too.
> > >>> For reference, this kernel has both SLUB_DEBUG and PAGEALLOC_DEBUG
> > >>> enabled.
> > >>
> > >> I haven't seen this kind of problem here with .26, but yes, it does
> > >> look like something is clobbering memory during an NFS mount.
> > >>
> > >> I introduced some NFS mount parsing changes in this commit range:
> > >>
> > >> 2d767432..82d101d5
> > >>
> > >> A quick bisect should show which, if any of these, is the guilty
> > >> party. If any of these are the problem, I suspect it's 3f8400d1.
> > >
> > > I didn't get time to try this out yet (hopefully tomorrow).
> > > In the meantime, we've just gotten word of another user seeing memory
> > > corruption with nfs - https://bugzilla.redhat.com/show_bug.cgi?id=449958
> >
> > 449958 could very well be the same problem. The stack traceback is a
> > lot cleaner than the one you originally sent, but there are a lot of
> > similarities. (I doubt this is related to symlinks, as the comment
> > suggests).
> >
> > Is commit 86d61d863 applied to the current rawhide kernel?
>
> That kernel was .26rc4.git2, so unless it's only gone in in the last day
> or two, yes. (Bandwidth impaired right now, and no local git repo to check)

Argh, I was afraid of that. I expected that commit to improve things.
Maybe it did, but this is a different problem? You're going to force
me to actually think about this. :-)

In any event, a bisect would be helpful here, when you can. I will
also stare at the traceback in 449958 and see if anything new jumps
out. It's certainly taken the heat off of the NFS client; it looks
like an rpcbind issue.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com