Message-ID: <1278702782.7122.1.camel@localhost.localdomain>
Date: Fri, 09 Jul 2010 12:13:02 -0700
From: Fernando Lopez-Lezcano <nando@...ma.Stanford.EDU>
To: john stultz <johnstul@...ibm.com>
Cc: nando@...ma.Stanford.EDU, Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
rt-users <linux-rt-users@...r.kernel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Nick Piggin <npiggin@...e.de>
Subject: Re: 2.6.33.5 rt23: machine lockup (nfs/autofs related?)
On Fri, 2010-07-09 at 12:02 -0700, Fernando Lopez-Lezcano wrote:
> On Thu, 2010-07-08 at 16:00 -0700, john stultz wrote:
> > On Thu, 2010-07-08 at 15:44 -0700, Fernando Lopez-Lezcano wrote:
> > > On Thu, 2010-07-08 at 15:33 -0700, john stultz wrote:
> > > > On Thu, 2010-07-08 at 10:19 -0700, Fernando Lopez-Lezcano wrote:
> > > > > We are having problems with 2.6.33.5+rt23, at least in our configuration
> > > > > while accessing an nfs automounted directory. This causes a complete
> > > > > machine lockup (press reset to exit as the only option).
> > > > >
> > > > > I simply use the Nautilus file manager (in Fedora 12) to navigate to an
> > > > > autofs mounted directory and the process monitor goes to 100% on one
> > > > > core (or maybe two), the mouse jerks a bit and the whole thing goes
> > > > > catatonic almost immediately.
> > > > >
> > > > > I get this in any open terminal at the time of the crash:
> > > > >
> > > > > --------
> > > > > Message from syslogd@...alhost at Jul 8 10:13:54 ...
> > > > > kernel:------------[ cut here ]------------
> > > > >
> > > > > Message from syslogd@...alhost at Jul 8 10:13:54 ...
> > > > > kernel:invalid opcode: 0000 [#1] PREEMPT SMP
> > > > >
> > > > > Message from syslogd@...alhost at Jul 8 10:13:54 ...
> > > > > > kernel:last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
> > > > >
> > > > > Message from syslogd@...alhost at Jul 8 10:13:54 ...
> > > > > > kernel:Process nautilus (pid: 2874, ti=f0204000 task=f17dd1f0 task.ti=f0204000)
> > > > >
> > > > > Message from syslogd@...alhost at Jul 8 10:13:54 ...
> > > > > kernel:Stack:
> > > > >
> > > > > Message from syslogd@...alhost at Jul 8 10:13:54 ...
> > > > > kernel:Call Trace:
> > > > >
> > > > > Message from syslogd@...alhost at Jul 8 10:13:54 ...
> > > > > kernel:Code: 7b 08 00 89 45 b8 75 12 8d 43 04 89 43 04 89 43 08 8d 43
> > > > > 0c 89 43 0c 89 43 10 8b 43 14 64 8b 15 2c d1 a5 c0 83 e0 fc 39 c2 75 04
> > > > > <0f> 0b eb fe 8b 3a 81 ff 08 01 00 00 74 0a 83 ff 02 b8 04 00 00
> > > > >
> > > > > Message from syslogd@...alhost at Jul 8 10:13:54 ...
> > > > > > kernel:EIP: [<c0792c0f>] rt_spin_lock_slowlock+0x43/0x1bb SS:ESP 0068:f0205cbc
> > > > > --------
> > > > >
> > > > > And that's it... nothing else in the logs.
> > > >
> > > > Hrm. Not too much to go on there, but thanks for the report.
> > > >
> > > >
> > > > > For now we are booting into the normal Fedora kernel (this is on Fedora
> > > > > 12) as this makes the rt kernel not usable in our setup.
> > > > >
> > > > > Let me know if there is anything else I can do to help debug this...
> > > >
> > > > Had you done any testing with earlier 2.6.33-rt kernels where this
> > > > didn't occur? If so what version?
> > >
> > > I have been working with the whole series but my main usage case does
> > > not use nfs/autofs (see next paragraphs).
> > >
> > > I have noticed that the problem does not appear to happen when I cd into
> > > an nfs automounted directory directly. It appears to happen only when
> > > listing the contents of a mount point (ie: when "/whatever/" is an
> > > autofs mount point where several directories are mounted, not
> > > necessarily from the same server).
> > >
> > > Before switching to Fedora 12 users were normally running 2.6.29 rt and
> > > I had been running 2.6.31.x and 2.6.33.x rt, but I don't think it ever
> > > happened to me personally (I'm always using the command line - this is
> > > completely reproducible with nautilus). After the switch it started
> > > happening almost immediately to regular users (using nautilus mostly).
> > >
> > > How could I try to get more debugging information?
> >
> > Any chance you have a serial port on the machine in question? If so, it's
> > likely any oops messages could be collected over that.
>
> No response from the network or the keyboard or
> mouse at this point, reset is the only way out.
Not quite true: the machine does respond to the SysRq key (a sync command
got an immediate dump in the terminal). But the SysRq boot command does not
reboot the machine.
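
Until a full trace can be captured, the "Code:" line in the oops quoted above
is about the only hard data available. The kernel marks the byte at the
faulting EIP with angle brackets, so a small script can split the dump at that
point and show which opcode trapped. This is only a rough sketch: the helper
names (split_code_line, is_ud2) are made up for illustration, and the byte
string is copied by hand from the quoted log with the syslogd prefixes removed.

```python
import re

# "Code:" bytes copied from the oops quoted above (syslogd prefixes removed).
OOPS_CODE = (
    "7b 08 00 89 45 b8 75 12 8d 43 04 89 43 04 89 43 08 8d 43 "
    "0c 89 43 0c 89 43 10 8b 43 14 64 8b 15 2c d1 a5 c0 83 e0 fc 39 c2 75 04 "
    "<0f> 0b eb fe 8b 3a 81 ff 08 01 00 00 74 0a 83 ff 02 b8 04 00 00"
)


def split_code_line(code):
    """Split an oops "Code:" dump at the <xx> marker.

    Returns (bytes before the faulting EIP, bytes from the faulting EIP on).
    Hypothetical helper, not part of any kernel tool.
    """
    before, after = [], []
    seen_marker = False
    for tok in code.split():
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if marked:
            seen_marker = True
            tok = marked.group(1)
        elif not re.fullmatch(r"[0-9a-fA-F]{2}", tok):
            raise ValueError("unexpected token in Code: line: %r" % tok)
        (after if seen_marker else before).append(int(tok, 16))
    if not seen_marker:
        raise ValueError("no <xx> marker found in Code: line")
    return bytes(before), bytes(after)


def is_ud2(trap_bytes):
    """True if the bytes at the faulting EIP start with 0f 0b (x86 ud2)."""
    return trap_bytes[:2] == b"\x0f\x0b"


before, at_trap = split_code_line(OOPS_CODE)
print("bytes before EIP:", len(before))
print("opcode at EIP   :", at_trap[:2].hex())
print("ud2 (BUG) trap  :", is_ud2(at_trap))
```

Here the marked bytes are 0f 0b, which is the x86 ud2 instruction, and that
matches the "invalid opcode" message. On x86, BUG() compiles to ud2, so this
looks like a BUG_ON firing in rt_spin_lock_slowlock rather than a random
crash. That reading is my inference from the bytes and is not confirmed by
anything else in the logs.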
-- Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/