[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4A9485A6.1010803@redhat.com>
Date: Tue, 25 Aug 2009 20:45:26 -0400
From: Ric Wheeler <rwheeler@...hat.com>
To: Pavel Machek <pavel@....cz>
CC: david@...g.hm, Theodore Tso <tytso@....edu>,
Florian Weimer <fweimer@....de>,
Goswin von Brederlow <goswin-v-b@....de>,
Rob Landley <rob@...dley.net>,
kernel list <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
rdunlap@...otime.net, linux-doc@...r.kernel.org,
linux-ext4@...r.kernel.org, corbet@....net
Subject: Re: [patch] document flash/RAID dangers
On 08/25/2009 08:38 PM, Pavel Machek wrote:
>>>>> I'm not sure what's rare about power failures. Unlike single sector
>>>>> errors, my machine actually has a button that produces exactly that
>>>>> event. Running degraded raid5 arrays for extended periods may be
>>>>> slightly unusual configuration, but I suspect people should just do
>>>>> that for testing. (And from the discussion, people seem to think that
>>>>> degraded raid5 is equivalent to raid0).
>>>>
>>>> Power failures after a full drive failure with a split write during a rebuild?
>>>
>>> Look, I don't need full drive failure for this to happen. I can just
>>> remove one disk from array. I don't need power failure, I can just
>>> press the power button. I don't even need to rebuild anything, I can
>>> just write to degraded array.
>>>
>>> Given that all events are under my control, statistics make little
>>> sense here.
>>
>> You are deliberately causing a double failure - pressing the power button
>> after pulling a drive is exactly that scenario.
>
> Exactly. And now I'm trying to get that documented, so that people
> don't do it and still expect their fs to be consistent.
The problem I have is that the way you word it steers people away from RAID5 and
better data integrity. Your intentions are good, but your text is going to do
considerable harm.
Most people don't intentionally drop power (or have a power failure) during RAID
rebuilds....
>
>> Pull your single (non-MD5) disk out while writing (hot unplug from the
>> S-ATA side, leaving power on) and run some tests to verify your
>> assertions...
>
> I actually did that some time ago with pulling SATA disk (I actually
> pulled both SATA *and* power -- that was the way hotplug envelope
> worked; that's more harsh test than what you suggest, so that should
> be ok). Write test was fsync heavy, with logging to separate drive,
> checking that all the data where fsync succeeded are indeed
> accessible. I uncovered few bugs in ext* that jack fixed, I uncovered
> some libata weirdness that is not yet fixed AFAIK, but with all the
> patches applied I could not break that single SATA disk.
> Pavel
Fsync heavy workloads with working barriers will tend to keep the write cache
pretty empty (two barrier flushes per fsync) so this is not too surprising.
Drive behaviour depends on a lot of things though - how the firmware prioritizes
writes over reads, etc.
ric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists