Message-ID: <20080205185517.GE26259@csclub.uwaterloo.ca>
Date:	Tue, 5 Feb 2008 13:55:17 -0500
From:	lsorense@...lub.uwaterloo.ca (Lennart Sorensen)
To:	Robin Lee Powell <rlpowell@...italkingdom.org>
Cc:	Nick Piggin <nickpiggin@...oo.com.au>, linux-kernel@...r.kernel.org
Subject: Re: Monthly md check == hung machine; how do I debug?

On Tue, Feb 05, 2008 at 09:10:05AM -0800, Robin Lee Powell wrote:
> On Mon, Feb 04, 2008 at 09:40:55PM +1100, Nick Piggin wrote:
> > On Monday 04 February 2008 08:21, Robin Lee Powell wrote:
> > > I've got a machine with a 4 disk SATA raid10 configuration using
> > > md. The entire disk is loop-AES encrypted, but that shouldn't
> > > matter here.
> > >
> > > Once a month, Debian runs:
> > >
> > >     /usr/share/mdadm/checkarray --cron --all --quiet
> > >
> > > and the machine hangs within 30 minutes of that starting.
> > >
> > > It seems that I can avoid the hang by not having "mdadm
> > > --monitor" running, but I'm not certain if that's the case or if
> > > I've just been lucky this go-round.
> > >
> > > I'm on kernel 2.6.23.1, my own compile thereof, x86_64, AMD
> > > Athlon(tm) 64 Processor 3700+.
> > >
> > > I've looked through all the 2.6.23 and 2.6.24 Changelogs, and I
> > > can't find anything that looks relevant.
> > >
> > > So, how can I (help you all) debug this?
> > 
> > Do you have a serial console? Does it respond to pings?
> 
> No and yes.
> 
> > Can you try to get sysrq+T traces, and sysrq+P traces, and post
> > them?
> 
> I played with those after you suggested it, but without serial
> console had no way to capture them.
> 
> I was able to solve the problem, however, like so:

I tend to lower the maximum speed md is allowed to use for a check or
resync, since the default of 200MB/s makes the system close to unusable
while one is running.  Could slow disk access be causing things to lock up?
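
A minimal sketch of that kind of adjustment, assuming the cap is exposed
at /proc/sys/dev/raid/speed_limit_max and takes a value in KB/s (so the
200MB/s default would read as 200000).  The helper names and the 10MB/s
figure are only illustrative, and writing the real file needs root:

```python
from pathlib import Path

# Assumption: md's resync/check throttle lives here and is in KB/s.
SPEED_LIMIT_MAX = Path("/proc/sys/dev/raid/speed_limit_max")


def mb_to_kb(mb_per_sec):
    """Convert MB/s to the KB/s figure the file is assumed to expect."""
    return int(mb_per_sec * 1000)


def set_speed_limit(mb_per_sec, path=SPEED_LIMIT_MAX):
    """Write a new cap to `path` and return the previous value in KB/s.

    `path` is a parameter so the helper can be tried against a scratch
    file before pointing it at the real /proc entry.
    """
    path = Path(path)
    previous = int(path.read_text().strip())
    path.write_text(f"{mb_to_kb(mb_per_sec)}\n")
    return previous


if __name__ == "__main__":
    # Hypothetical usage: cap check/resync at 10MB/s (example value only).
    old = set_speed_limit(10)
    print(f"speed_limit_max: {old} -> {mb_to_kb(10)} KB/s")
```

A value written this way does not survive a reboot, so it would have to
be reapplied (or set through sysctl configuration) to stick.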

Things made much more sense some years ago when the default was 10MB/s
or something along those lines.

Who has 200MB/s-capable hardware anyhow?

--
Len Sorensen