Date:	Tue, 7 Feb 2012 17:54:52 +0100
From:	Jan Kara <jack@...e.cz>
To:	Gerard Saraber <gsaraber@...il.com>
Cc:	Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org,
	xfs@....sgi.com
Subject: Re: Soft lockup problem

On Tue 07-02-12 10:35:37, Gerard Saraber wrote:
> On Mon, Feb 6, 2012 at 4:51 PM, Jan Kara <jack@...e.cz> wrote:
> > On Mon 06-02-12 09:40:45, Gerard Saraber wrote:
> >> Greetings everyone,
> >> I've been having a bit of a problem since upgrading to the Linux 3.x
> >> series. I have a machine that we're using as a NAS that runs various
> >> rsync processes (mostly at night). Lately, after a day or two, I will
> >> come in in the morning to a load average of 49, but the machine is not
> >> really doing anything. When I tried to run 'dstat', the command just
> >> hung with no output at all. There were no errors in the logs, or even
> >> anything that would vaguely point at something I could work with.
> >> Needing to get the machine back to work, I attempted to reboot it with
> >> "shutdown -r now" on the console. It gives a nice message saying it's
> >> going to reboot, but nothing ever happens. The only way to reboot it
> >> is by using ctrl + alt + sysrq + b, after which the machine reboots
> >> and the raid array comes back clean.
> >>
> >> I'm not sure how to troubleshoot this, any pointers would be appreciated.
> >>
> >> I'm compiling 3.2.4 at the moment and found a couple of possibly useful
> >> options in the kernel debugging section:
> >> detect hard/soft lockups and detect hung tasks. Maybe they'll give me
> >> something more to go on.
> >>
> >> Some details about the machine:
> >> Linux xenbox 3.2.2 #1 SMP Sun Jan 29 10:28:22 CST 2012 x86_64 Intel(R)
> >> Xeon(R) CPU 5140 @ 2.33GHz GenuineIntel GNU/Linux
> >> It has 3 software raid arrays (2 x 5 drives and 1 x 4 drives) LVM'ed
> >> together into a 23TB XFS filesystem.
> >> 6GB memory and a pair of Intel Gigabit ethernet controllers bonded together.
> >  Hmm, might be some deadlock in the filesystem. Adding XFS guys to CC.
> > Can you run 'echo w >/proc/sysrq-trigger' and post output of dmesg here?
> >
> >                                                                Honza
> > --
> > Jan Kara <jack@...e.cz>
> > SUSE Labs, CR
> 
> Thanks for the quick reply.
> The machine is running fine at the moment, so I'm not sure if the
> output helps, but here it is:
> [I'll also be sure to grab this log the next time it locks up]
  Yeah. Sorry, I was not clear but I meant you should grab the traces when
the machine locks up again...
								Honza
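
For the next lockup, the capture step could be sketched roughly as below.
This is only a minimal sketch, not a tested procedure for this machine: the
function name capture_blocked_tasks and the variables SYSRQ_TRIGGER, OUT_DIR
and DMESG_CMD are hypothetical names introduced here so the steps can be
dry-run without touching the real trigger file. It assumes root privileges
and that a shell is still responsive when the machine hangs.

```shell
#!/bin/sh
# Sketch: ask the kernel to log stack traces of blocked tasks (SysRq 'w'),
# then save the kernel log to a file that can be posted to the list.
# All names below are illustrative assumptions, not part of any tool.

capture_blocked_tasks() {
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"   # real path by default
    out_dir="${OUT_DIR:-/var/tmp}"                    # where to store the log
    dmesg_cmd="${DMESG_CMD:-dmesg}"                   # command that prints the kernel log
    out_file="$out_dir/blocked-tasks-$(date +%Y%m%d-%H%M%S).log"

    # 'w' dumps tasks that are in uninterruptible (blocked) state.
    echo w > "$trigger" || return 1

    # The traces land in the kernel ring buffer; save it to a file.
    $dmesg_cmd > "$out_file" || return 1

    # Print the path of the saved log so it can be attached to a reply.
    echo "$out_file"
}
```

Run as root once the load average climbs and commands start to hang, for
example by calling capture_blocked_tasks and attaching the file it names.
If the filesystem holding OUT_DIR is the one that is stuck, the write may
hang as well, so a directory on a different filesystem is probably safer.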

-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR