Message-ID: <20090826124058.GK32712@mit.edu>
Date:	Wed, 26 Aug 2009 08:40:58 -0400
From:	Theodore Tso <tytso@....edu>
To:	Ric Wheeler <rwheeler@...hat.com>
Cc:	Pavel Machek <pavel@....cz>, david@...g.hm,
	Florian Weimer <fweimer@....de>,
	Goswin von Brederlow <goswin-v-b@....de>,
	Rob Landley <rob@...dley.net>,
	kernel list <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
	rdunlap@...otime.net, linux-doc@...r.kernel.org,
	linux-ext4@...r.kernel.org, corbet@....net
Subject: Re: [patch] document flash/RAID dangers

On Wed, Aug 26, 2009 at 07:58:40AM -0400, Ric Wheeler wrote:
>> Drive in raid 5 failed; hot spare was available (no idea about
>> UPS). System apparently locked up trying to talk to the failed drive,
>> or maybe admin just was not patient enough, so he just powercycled the
>> array. He lost the array.
>>
>> So while most people will not aggressively powercycle the RAID array,
>> drive failure still provokes little-tested error paths, and getting
>> unclean shutdown is quite easy in such case.
>
> Then what we need to document is do not power cycle an array during a  
> rebuild, right?

Well, the software RAID layer could be improved so that it implements
scrubbing by default (i.e., have the md package install a cron job to
run a periodic scrub pass automatically).  The MD code could also
regularly check to make sure the hot spare is OK; the other
possibility is that the hot spare, which hadn't been used in a long
time, had silently failed.
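
As a rough illustration of what such a cron-driven scrub could look
like, here is a minimal sketch in Python.  It assumes the md sysfs
interface, where writing "check" to /sys/block/mdX/md/sync_action
requests a read-only scrub of an idle array.  The function name
start_scrubs and the sysfs_root parameter are hypothetical, chosen
for this example only; this is not what any md package actually
installs.

```python
import glob
import os


def start_scrubs(sysfs_root="/sys/block"):
    """Request a "check" pass on every idle md array under sysfs_root.

    Returns the names of the arrays on which a scrub was requested.
    sysfs_root is a parameter only so the function can be exercised
    against a scratch directory instead of the real sysfs tree.
    """
    started = []
    pattern = os.path.join(sysfs_root, "md*", "md", "sync_action")
    for action_path in sorted(glob.glob(pattern)):
        # .../md0/md/sync_action -> "md0"
        array = os.path.basename(os.path.dirname(os.path.dirname(action_path)))
        with open(action_path) as f:
            state = f.read().strip()
        if state != "idle":
            # A resync, recovery, or earlier check is still running;
            # leave that array alone.
            continue
        with open(action_path, "w") as f:
            f.write("check\n")
        started.append(array)
    return started
```

A weekly or monthly cron entry calling start_scrubs() as root would
be enough to exercise otherwise idle sectors.  Once a pass finishes,
/sys/block/mdX/md/mismatch_cnt can be read to see whether the scrub
found inconsistencies.  Note that this only scrubs active members; it
does not verify an unused hot spare.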

> In the end, there are cascading failures that will defeat any data  
> protection scheme, but that does not mean that the value of that scheme  
> is zero. We need to get more people to use RAID (including MD5) and  
> try to enhance it as we go. Just using a single disk is not a good 
> thing...

Yep; the solution is to improve the storage devices.  It is *not* to
encourage people to think RAID is not worth it, or that somehow ext2
is better than ext3 because it runs fsck's all the time at boot up.
That's just crazy talk.

						- Ted