linux-kernel - Re: Raid not shutting down when disks are lost?

Message-ID: <e9c3a7c20911211121l517837f9s17283dd37e0dd865@mail.gmail.com>
Date:	Sat, 21 Nov 2009 12:21:58 -0700
From:	Dan Williams <dan.j.williams@...el.com>
To:	Pierre Ossman <pierre-list@...man.eu>
Cc:	neilb@...e.de, LKML <linux-kernel@...r.kernel.org>
Subject: Re: Raid not shutting down when disks are lost?

On Sat, Nov 21, 2009 at 9:03 AM, Pierre Ossman <pierre-list@...man.eu> wrote:
> Neil?
>
> On Thu, 8 Oct 2009 16:39:52 +0200
> Pierre Ossman <pierre-list@...man.eu> wrote:
>
>> Today one RAID6 array I manage decided to lose four out of eight disks.
>> Oddly enough, the array did not shut down but instead I got
>> intermittent read and write errors from the filesystem.

This is expected.

The array can't shut down while there is a mounted filesystem on it.
Reads may still be serviced from the survivors; all writes should be
aborted with an error.
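
To make the terminology concrete, here is a minimal sketch of how the
number of lost members maps to an array state.  The function names are
made up for illustration and this is not how md tracks state
internally; it only encodes the well-known redundancy of each level.
RAID10 is left out because its tolerance depends on the layout.

```python
def max_lost_disks(level, total_disks):
    """How many members a level can lose and still serve all data."""
    if level == "raid0":
        return 0
    if level == "raid1":
        return total_disks - 1
    if level in ("raid4", "raid5"):
        return 1
    if level == "raid6":
        return 2
    raise ValueError(f"unsupported level: {level}")


def array_state(level, total_disks, lost_disks):
    """Classify an array as clean, degraded, or failed (illustrative only)."""
    if lost_disks == 0:
        return "clean"
    if lost_disks <= max_lost_disks(level, total_disks):
        return "degraded"
    return "failed"


# The case from this thread: RAID6, eight members, four lost.
print(array_state("raid6", 8, 4))  # failed
```

With four of eight members gone, a RAID6 array is past "degraded" and
is failed, which is why only some reads can still succeed.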

>>
>> It's been some time since I had a failure of this magnitude, but I seem
>> to recall that once the array lost too many disks, it would shut down
>> and refuse to write a single byte. The nice effect of this was that if
>> it was a temporary error, you could just reboot and the array would
>> start nicely (albeit in degraded mode).
>>
>> Has something changed? Is this perhaps an effect of using RAID6 (I used
>> to run RAID5 arrays)? Or was I simply lucky the previous instances I've
>> had?

It should not come back up nicely in this scenario.  You need
"--force" to attempt to reassemble a failed array.
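
As a rough sketch of what that looks like, the helper below only
builds the mdadm command line and prints it; it does not run anything.
The helper name and the device names (/dev/md0, /dev/sda1, ...) are
placeholders, not the devices from this report.  Keep in mind that
"--force" lets mdadm assemble from members whose metadata looks out of
date, so the data may not be consistent afterwards.

```python
import shlex


def forced_assemble_cmd(md_device, members):
    """Build the argv for a forced assemble; does not execute it."""
    if not members:
        raise ValueError("need at least one member device")
    return ["mdadm", "--assemble", "--force", md_device, *members]


# Hypothetical device names, for illustration only.
cmd = forced_assemble_cmd("/dev/md0", ["/dev/sda1", "/dev/sdb1"])
print(shlex.join(cmd))
# mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1
```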

>>
>> Related, it would be nice if you could control how it handles lost
>> disks. E.g. I'd like it to go read-only when it goes into fully
>> degraded mode. In case the last disk lost was only a temporary glitch,
>> the array could be made to recover without a lengthy resync.
>>

When you say "fully-degraded" do you mean "failed"?  In general, the
write-intent bitmap mechanism provides fast resync after temporary
disk outages.
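
The idea behind the bitmap can be shown with a toy model.  This is not
md's on-disk format or its code; the class and its names are invented
for illustration.  It tracks which fixed-size chunks were written while
a member was absent, so that only those chunks need to be resynced when
the member returns.

```python
class WriteIntentBitmap:
    """Toy model: remember which chunks were written during an outage."""

    def __init__(self, array_size, chunk_size):
        self.chunk_size = chunk_size
        self.nchunks = -(-array_size // chunk_size)  # ceiling division
        self.dirty = set()

    def mark_write(self, offset, length):
        """Mark every chunk touched by a write of `length` at `offset`."""
        if length <= 0:
            raise ValueError("length must be positive")
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        self.dirty.update(range(first, last + 1))

    def chunks_to_resync(self):
        """Chunks that must be copied to a member that was re-added."""
        return sorted(self.dirty)

    def clear(self):
        """Called once the returning member is back in sync."""
        self.dirty.clear()


bitmap = WriteIntentBitmap(array_size=1000, chunk_size=100)
bitmap.mark_write(offset=250, length=100)  # spans chunks 2 and 3
print(bitmap.chunks_to_resync(), "of", bitmap.nchunks, "chunks")
# [2, 3] of 10 chunks
```

Without such a record, every chunk would have to be resynced after the
outage; with it, only the chunks written in the meantime are.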

--
Dan