Message-ID: <1269290076.16688.1814.camel@atllxdanncox.cisco.com>
Date:	Mon, 22 Mar 2010 16:34:36 -0400
From:	Danny Cox <danncox@...co.com>
To:	linux-kernel@...r.kernel.org
Subject: 2.6.31 Momentary Hang

Kernel Gurus,

	A colleague of mine is experiencing a severe denial of service multiple
times a day.  When a disk-intensive process is started (in our case, a
Subversion checkout), we observe the load average spike from 6 to 10
while all CPUs sit idle, yet the I/O wait time is in the thousands of
milliseconds (1500-3000).  If we wait long enough, the load average
begins to drop, but it hovers around 5 for a couple of minutes.
Afterward, it quickly drops toward 0.

	The machine is fairly new, having been purchased in December.  It
uses an ASUS P7P55D-LE motherboard with an Intel Core i7 860 and 8 GB
of RAM.  It is running Ubuntu 9.10 with all patches applied.  The
kernel is 2.6.31-20.  It uses two WD 500 GB Caviar Green drives, with
software RAID1 on three of the partitions: /, /boot, and /home.

	During the hang time, almost nothing can be started.  We've been using
top, atop, vmstat, and the Gnome system monitor to see what's occurring.
Our only hints are the high load average and the I/O wait times.
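For reference, the I/O-wait share that top and vmstat show in their "wa" column is derived from the aggregate cpu line of /proc/stat. Below is a minimal Python sketch of that calculation, assuming the field layout documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, then guest fields). The function names are illustrative only and do not come from any existing tool.

```python
import os
import time


def parse_cpu_line(line: str) -> list[int]:
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [new - old for new, old in zip(a, b)]
    # Guest time is already included in user/nice, so only the first
    # eight fields (user .. steal) are summed to avoid double counting.
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    # Field index 4 is iowait (user=0, nice=1, system=2, idle=3, iowait=4).
    return 100.0 * deltas[4] / total


def sample_proc_stat() -> str:
    """Read the first line of /proc/stat, which is the aggregate cpu line."""
    with open("/proc/stat") as f:
        return f.readline()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = sample_proc_stat()
    time.sleep(1.0)  # one-second sampling window
    second = sample_proc_stat()
    print(f"iowait: {iowait_percent(first, second):.1f}%")
```

Run on a Linux host, it prints the iowait share over a one-second window. It measures the same quantity as the "wa" column, not per-request latency.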

	At this point, Google has been unable to provide answers, and my
colleague is ready to perform physical violence on the system.  I don't
even know what I can or should measure next.  Hints are welcome.  A
solution would be even better, if any of the above strikes a chord.

	One other data point: we have 5+ other identical systems, none of which
have this issue.  My colleague notes that his system was fine for a
couple of weeks.  It is possible that he installed some package that
causes this behavior.  That's merely speculation, of course.

	Please include me in the CC: header, as I'm not subscribed to
linux-kernel.  I'd like to be, but the volume is too much.

	Thanks for your time!

P.S.  Things we've tried:

* move the drives to another (identical) machine.  The issue persists.

* disable one of the drives in the RAID.  The issue persists.


-- 
Danny Cox
770-236-6148
Cisco
Service Provider Video Technology Group
