[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <27240.1188573002@turing-police.cc.vt.edu>
Date: Fri, 31 Aug 2007 11:10:02 -0400
From: Valdis.Kletnieks@...edu
To: Ian Kent <raven@...maw.net>
Cc: John Stoffel <john@...ffel.org>,
Peter Staubach <staubach@...hat.com>,
Robin Lee Powell <rlpowell@...italkingdom.org>,
linux-kernel@...r.kernel.org
Subject: Re: NFS hang + umount -f: better behaviour requested.
On Fri, 31 Aug 2007 16:06:36 +0800, Ian Kent said:
> So, there's a power outage and the UPS had a glitch.
Murphy can get a *lot* more creative than that.
So we'd outgrown the capacity on our UPS and diesel generator, and decided
to replace them. So we schedule downtime for a Saturday. Rather scary, we
had a Sun E10K that had been powered-up for several years, and just as expected,
a good fraction of the 400+ drives it had failed to re-spinup. While recovering
from that, we discovered that although the vast majority of the 400 drives were
either mirrors or raidsets, due to a config error, the boot volume wasn't
mirrored (fortunately, it spun up OK so we dodged the bullet), so we fixed that.
Literally the next Friday, not even a week later, a contractor relocating a
door into our machine room shorted out a sensor circuit in our fire suppression
system, triggering a Halon dump. Of course, no amount of UPS and diesel was
going to save us now, because there was a safety interlock that killed the
power feeds if the Halon dumped. This time, since they'd all been stressed
just a week before, only 2 of the 400+ disks on the E10K failed to spin up.
Guess which two. ;)
Content of type "application/pgp-signature" skipped
Powered by blists - more mailing lists