Date:	Fri, 09 Jul 2010 12:13:02 -0700
From:	Fernando Lopez-Lezcano <nando@...ma.Stanford.EDU>
To:	john stultz <johnstul@...ibm.com>
Cc:	nando@...ma.Stanford.EDU, Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>,
	rt-users <linux-rt-users@...r.kernel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Nick Piggin <npiggin@...e.de>
Subject: Re: 2.6.33.5 rt23: machine lockup (nfs/autofs related?)

On Fri, 2010-07-09 at 12:02 -0700, Fernando Lopez-Lezcano wrote:
> On Thu, 2010-07-08 at 16:00 -0700, john stultz wrote:
> > On Thu, 2010-07-08 at 15:44 -0700, Fernando Lopez-Lezcano wrote:
> > > On Thu, 2010-07-08 at 15:33 -0700, john stultz wrote:
> > > > On Thu, 2010-07-08 at 10:19 -0700, Fernando Lopez-Lezcano wrote:
> > > > > We are having problems with 2.6.33.5+rt23, at least in our configuration
> > > > > while accessing an nfs automounted directory. This causes a complete
> > > > > machine lockup (press reset to exit as the only option). 
> > > > > 
> > > > > I simply use the Nautilus file manager (in Fedora 12) to navigate to an
> > > > > autofs mounted directory and the process monitor goes to 100% on one
> > > > > core (or maybe two), the mouse jerks a bit and the whole thing goes
> > > > > catatonic almost immediately. 
> > > > > 
> > > > > I get this in any open terminal at the time of the crash:
> > > > > 
> > > > > --------
> > > > > Message from syslogd@...alhost at Jul  8 10:13:54 ...
> > > > >  kernel:------------[ cut here ]------------
> > > > > 
> > > > > Message from syslogd@...alhost at Jul  8 10:13:54 ...
> > > > >  kernel:invalid opcode: 0000 [#1] PREEMPT SMP 
> > > > > 
> > > > > Message from syslogd@...alhost at Jul  8 10:13:54 ...
> > > > >  kernel:last sysfs
> > > > > file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
> > > > > 
> > > > > Message from syslogd@...alhost at Jul  8 10:13:54 ...
> > > > >  kernel:Process nautilus (pid: 2874, ti=f0204000 task=f17dd1f0
> > > > > task.ti=f0204000)
> > > > > 
> > > > > Message from syslogd@...alhost at Jul  8 10:13:54 ...
> > > > >  kernel:Stack:
> > > > > 
> > > > > Message from syslogd@...alhost at Jul  8 10:13:54 ...
> > > > >  kernel:Call Trace:
> > > > > 
> > > > > Message from syslogd@...alhost at Jul  8 10:13:54 ...
> > > > >  kernel:Code: 7b 08 00 89 45 b8 75 12 8d 43 04 89 43 04 89 43 08 8d 43
> > > > > 0c 89 43 0c 89 43 10 8b 43 14 64 8b 15 2c d1 a5 c0 83 e0 fc 39 c2 75 04
> > > > > <0f> 0b eb fe 8b 3a 81 ff 08 01 00 00 74 0a 83 ff 02 b8 04 00 00 
> > > > > 
> > > > > Message from syslogd@...alhost at Jul  8 10:13:54 ...
> > > > >  kernel:EIP: [<c0792c0f>] rt_spin_lock_slowlock+0x43/0x1bb SS:ESP
> > > > > 0068:f0205cbc
> > > > > --------
> > > > > 
> > > > > And that's it... nothing else in the logs. 
> > > > 
> > > > Hrm. Not too much to go on there, but thanks for the report.
> > > > 
> > > > 
> > > > > For now we are booting into the normal Fedora kernel (this is on Fedora
> > > > > 12) as this makes the rt kernel not usable in our setup. 
> > > > > 
> > > > > Let me know if there is anything else I can do to help debug this...
> > > > 
> > > > Had you done any testing with earlier 2.6.33-rt kernels where this
> > > > didn't occur? If so what version?
> > > 
> > > I have been working with the whole series but my main usage case does
> > > not use nfs/autofs (see next paragraphs). 
> > > 
> > > I have noticed that the problem does not appear to happen when I cd into
> > > an nfs automounted directory directly. It appears to happen only when
> > > listing the contents of a mount point (ie: when "/whatever/" is an
> > > autofs mount point where several directories are mounted, not
> > > necessarily from the same server). 
> > > 
> > > Before switching to Fedora 12 users were normally running 2.6.29 rt and
> > > I had been running 2.6.31.x and 2.6.33.x rt, but I don't think it ever
> > > happened to me personally (I'm always using the command line - this is
> > > completely reproducible with nautilus). After the switch it started
> > > happening almost immediately to regular users (using nautilus mostly). 
> > > 
> > > How could I try to get more debugging information?
> > 
> > Any chance you have a serial port on the machine in question? If so, it's
> > likely any oops messages could be collected over that.
> 
> No response from the network or the keyboard or
> mouse at this point, reset is the only way out. 

Not quite true: it does respond to the sysrq key (a sync command got an
immediate dump in the terminal). But the sysrq boot ('b') command does not
reboot the machine.

-- Fernando

