Message-ID: <20090828120854.GA8153@mit.edu>
Date:	Fri, 28 Aug 2009 08:08:54 -0400
From:	Theodore Tso <tytso@....edu>
To:	Pavel Machek <pavel@....cz>, NeilBrown <neilb@...e.de>
Cc:	Ric Wheeler <rwheeler@...hat.com>, Rob Landley <rob@...dley.net>,
	Florian Weimer <fweimer@....de>,
	Goswin von Brederlow <goswin-v-b@....de>,
	kernel list <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
	rdunlap@...otime.net, linux-doc@...r.kernel.org,
	linux-ext4@...r.kernel.org, corbet@....net
Subject: Re: raid is dangerous but that's secret (was Re: [patch] ext2/3:
	document conditions when reliable operation is possible)

On Fri, Aug 28, 2009 at 08:44:49AM +0200, Pavel Machek wrote:
> From: Theodore Tso <tytso@....edu>
> 
> Document that many devices are too broken for filesystems to protect
> data in case of powerfail.
> 
> Signed-of-by: Pavel Machek <pavel@....cz> 

NACK.  I didn't write this patch, and it's disingenuous for you to try
to claim that I authored it.

You took text I wrote from the *middle* of an e-mail discussion and
you ignored multiple corrections to typos that I made --- typos that
I would have corrected if I had ultimately decided to post this as a
patch, which I did NOT.

While Neil Brown's corrections are minimally necessary so the text is
at least technically *correct*, it's still not the right advice to
give system administrators.  It's better than the fear-mongering
patches you had proposed earlier, but what would be better *still* is
telling people why running with degraded RAID arrays is bad, and to
give them further tips about how to use RAID arrays safely.

To use your ABS brakes analogy, just because it's not safe to rely on
ABS brakes if the "check brakes" light is on, that doesn't justify
writing something alarmist which claims that ABS brakes don't work
100% of the time, don't use ABS brakes, they're broken!!!!

The first part of it is true, since ABS brakes can suffer mechanical
failure.  But what we should be telling drivers is, "if the 'check
brakes' light comes on, don't keep driving with it, go to a garage and
get it fixed!!!".  Similarly, if you get a notice that your RAID is
running in degraded mode, you've already suffered one failure; you
won't survive another failure, so fix that issue ASAP!
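
As an illustration of "paying attention to the warning light" on Linux
software RAID, here is a minimal sketch that scans the text of
/proc/mdstat for arrays with a missing member.  It assumes the usual md
status field (for example "[3/2] [UU_]", where "_" marks a missing
device); the function name degraded_arrays is hypothetical, and this is
not a substitute for a proper monitoring setup.

```python
import re

# Member-status field on an mdstat detail line, e.g. "[3/2] [UU_]":
# 3 devices expected, 2 active, "_" marks a missing member.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Array header line, e.g. "md1 : active raid5 sdc1[2] sdb2[1](F) sda2[0]".
ARRAY_RE = re.compile(r"^(md\S*)\s*:")


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays missing a member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded[current] = (expected, active)
            current = None  # only the first status line belongs to this array
    return degraded


SAMPLE = """\
Personalities : [raid1] [raid5]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid5 sdc1[2] sdb2[1](F) sda2[0]
      1000 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
"""
```

On a live system one would pass in the contents of /proc/mdstat instead
of SAMPLE and alert on any non-empty result.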

If you're really paranoid, you could decide to "pull over to the side
of the road"; that is, you could stop writing to the RAID array as
soon as possible, and then get the RAID array rebuilt before
proceeding.  That can reduce the chances of a second failure.  But in
the real world, there are costs associated with taking a production
server off-line, and the prudent system administrator has to do a
risk-reward tradeoff.  A better approach might be to have the array
configured with a hot spare, and to regularly scrub the array, and
configure the RAID array with either a battery backup or a UPS.  And
hot-swap drives might not be a bad idea, too.
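
For the "regularly scrub the array" part, a rough sketch for Linux md
arrays follows.  It assumes the md sysfs attribute
/sys/block/<array>/md/sync_action, where writing "check" requests a
read-only consistency check; the function name request_scrub and the
sysfs_root parameter (there only so the logic can be exercised against a
fake directory tree) are hypothetical.  Many distributions already ship
a periodic job for this, so treat it as an illustration only.

```python
import os


def request_scrub(array, sysfs_root="/sys/block"):
    """Ask md to start a consistency check on `array` if it is idle.

    Returns True if the check was requested, False if the array was
    already busy (resync, recovery, or another check in progress).
    """
    path = os.path.join(sysfs_root, array, "md", "sync_action")
    with open(path) as f:
        state = f.read().strip()
    if state != "idle":
        return False  # do not interrupt a rebuild or a running check
    with open(path, "w") as f:
        f.write("check\n")
    return True
```

Something like request_scrub("md0"), run as root from a monthly cron
job, would be one way to use it.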

But in any case, just because ABS brakes and RAID arrays can suffer
failures, that doesn't mean you should run around telling people not
to use RAID arrays or that RAID arrays are broken.  People are better
off using RAID than using single-disk storage solutions, just as
people are better off using ABS brakes than not.

Your argument basically boils down to, "if you drive like a maniac
when the roads are wet and slippery, ABS brakes might not save your
life.  Since ABS brakes might cause you to have a false sense of
security, it's better to tell users that ABS brakes are broken."

That's just silly.  What we should be telling people instead is (a)
pay attention to the check brakes light (just as you should pay
attention to the "RAID array is degraded" warning), and (b) while ABS
brakes will get you out of some situations with life and limb intact,
they do not repeal the laws of physics (do regular full and
incremental backups; practice disk scrubbing; use UPS's or battery
backups).

							- Ted