linux-kernel - Re: [patch] document flash/RAID dangers


Message-ID: <alpine.DEB.2.00.0908260643550.28411@asgard.lang.hm>
Date:	Wed, 26 Aug 2009 06:44:42 -0700 (PDT)
From:	david@...g.hm
To:	Ric Wheeler <rwheeler@...hat.com>
cc:	Theodore Tso <tytso@....edu>, Pavel Machek <pavel@....cz>,
	Florian Weimer <fweimer@....de>,
	Goswin von Brederlow <goswin-v-b@....de>,
	Rob Landley <rob@...dley.net>,
	kernel list <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
	rdunlap@...otime.net, linux-doc@...r.kernel.org,
	linux-ext4@...r.kernel.org, corbet@....net
Subject: Re: [patch] document flash/RAID dangers

On Wed, 26 Aug 2009, Ric Wheeler wrote:

> On 08/26/2009 08:40 AM, Theodore Tso wrote:
>> On Wed, Aug 26, 2009 at 07:58:40AM -0400, Ric Wheeler wrote:
>>>> Drive in raid 5 failed; hot spare was available (no idea about
>>>> UPS). System apparently locked up trying to talk to the failed drive,
>>>> or maybe admin just was not patient enough, so he just powercycled the
>>>> array. He lost the array.
>>>> 
>>>> So while most people will not aggressively powercycle the RAID array,
>>>> drive failure still provokes little-tested error paths, and getting
>>>> an unclean shutdown is quite easy in such a case.
>>> 
>>> Then what we need to document is do not power cycle an array during a
>>> rebuild, right?
>> 
>> Well, the software raid layer could be improved so that it implements
>> scrubbing by default (i.e., have the md package install a cron job to
>> implement a periodic scrub pass automatically).  The MD code could
>> also regularly check to make sure the hot spare is OK; the other
>> possibility is that hot spare, which hadn't been used in a long time,
>> had silently failed.
>
> Actually, MD does this scan already (not automatically, but you can set up a 
> simple cron job to kick off a periodic "check"). It is a delicate balance to 
> get the frequency of the scrubbing correct.

Debian defaults to doing this once a month (on the first Sunday of each
month). On some of my systems this scrub takes almost a week to complete.
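
For anyone who wants to set up such a job by hand, here is a minimal sketch
of the idea. It is not the script Debian ships. It assumes the md sysfs
interface, where writing "check" to /sys/block/mdX/md/sync_action starts a
read-only scrub; the function names and the "only start when idle" policy
are my own choices for illustration.

```python
import datetime
from pathlib import Path


def is_first_sunday(day: datetime.date) -> bool:
    """Return True if `day` is the first Sunday of its month."""
    # weekday(): Monday is 0, Sunday is 6. The first Sunday is always day 1..7.
    return day.weekday() == 6 and day.day <= 7


def start_check(array: str, sysfs_root: Path = Path("/sys/block")) -> bool:
    """Ask md to start a "check" pass on `array` (e.g. "md0").

    Returns False and changes nothing if the array is busy with a
    resync, a recovery, or another check.
    """
    action = sysfs_root / array / "md" / "sync_action"
    if action.read_text().strip() != "idle":
        return False
    action.write_text("check\n")
    return True


if __name__ == "__main__":
    # Meant to be run from a weekly cron job as root; it only acts on the
    # first Sunday of the month.
    if is_first_sunday(datetime.date.today()):
        for md_dir in sorted(Path("/sys/block").glob("md*")):
            if (md_dir / "md" / "sync_action").exists():
                start_check(md_dir.name)
```

Skipping arrays that are not idle matters on systems like mine, where one
scrub can still be running when the next one would be scheduled.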

David Lang