Message-ID: <alpine.DEB.2.00.0908260416270.30426@asgard.lang.hm>
Date: Wed, 26 Aug 2009 04:28:00 -0700 (PDT)
From: david@...g.hm
To: Pavel Machek <pavel@....cz>
cc: Ric Wheeler <rwheeler@...hat.com>, Theodore Tso <tytso@....edu>,
Florian Weimer <fweimer@....de>,
Goswin von Brederlow <goswin-v-b@....de>,
Rob Landley <rob@...dley.net>,
kernel list <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
rdunlap@...otime.net, linux-doc@...r.kernel.org,
linux-ext4@...r.kernel.org, corbet@....net
Subject: Re: [patch] ext2/3: document conditions when reliable operation is
possible
On Wed, 26 Aug 2009, Pavel Machek wrote:
> On Wed 2009-08-26 06:39:14, Ric Wheeler wrote:
>> On 08/25/2009 10:58 PM, Theodore Tso wrote:
>>> On Tue, Aug 25, 2009 at 09:15:00PM -0400, Ric Wheeler wrote:
>>>
>>>> I agree with the whole write up outside of the above - degraded RAID
>>>> does meet this requirement unless you have a second (or third, counting
>>>> the split write) failure during the rebuild.
>>>>
>>> The argument is that if the degraded RAID array is running in this
>>> state for a long time, and the power fails while the software RAID is
>>> in the middle of writing out a stripe, such that the stripe isn't
>>> completely written out, we could lose all of the data in that stripe.
>>>
>>> In other words, a power failure in the middle of writing out a stripe
>>> in a degraded RAID array counts as a second failure.
>>> To me, this isn't a particularly interesting or newsworthy point,
>>> since a competent system administrator who cares about his data and/or
>>> his hardware will (a) have a UPS, and (b) be running with a hot spare
>>> and/or will immediately replace a failed drive in a RAID array.
>>
>> I agree that this is not an interesting (or likely) scenario, certainly
>> when compared to the much more frequent failures that RAID will protect
>> against which is why I object to the document as Pavel suggested. It
>> will steer people away from using RAID and directly increase their
>> chances of losing their data if they use just a single disk.
>
> So instead of fixing or at least documenting known software deficiency
> in Linux MD stack, you'll try to suppress that information so that
> people use more of raid5 setups?
>
> Perhaps the better documentation will push them to RAID1, or maybe
> make them buy an UPS?

People aren't objecting to better documentation, they are objecting to
misleading documentation.

For flash drives the danger is very straightforward (although even then
you have to note that it depends heavily on the firmware of the device;
some will lose lots of data, some won't lose any).

A good thing to do here would be for someone to devise a test that shows
this problem, and then gather the results from lots of people running the
test to see what the commonalities are.

You are generalizing that since you have lost data on flash drives, all
flash drives are dangerous.

What if it turns out that only one manufacturer is doing things wrong? You
will have discouraged people from using flash drives for no reason
(potentially causing them to lose data because they are scared away from
using flash drives and don't implement anything better).

To be safe, all that a flash drive needs to do is to not change the FTL
pointers until the data has been fully recorded in its new location. This
is probably a trivial firmware change.
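
To illustrate the ordering I mean, here is a rough sketch in Python. It is
a toy model and not how any real firmware works: the class ToyFTL and the
flags pointer_first and fail_midway are names I made up, and an
interrupted page is modelled as simply staying erased (None), where a real
device could return garbage instead.

```python
class ToyFTL:
    """Hypothetical toy flash translation layer: logical block -> physical page."""

    def __init__(self, pages):
        self.pages = [None] * pages   # physical pages, None means erased
        self.map = {}                 # logical block -> physical page index
        self.next_free = 0

    def write(self, lba, data, pointer_first=False, fail_midway=False):
        new_page = self.next_free     # always write to a fresh page
        self.next_free += 1
        if pointer_first:
            self.map[lba] = new_page      # unsafe: pointer moves before data lands
            if fail_midway:
                return                    # simulated power loss
            self.pages[new_page] = data
        else:
            self.pages[new_page] = data   # safe: data is recorded first
            if fail_midway:
                return                    # simulated power loss
            self.map[lba] = new_page      # pointer changes only afterwards

    def read(self, lba):
        return self.pages[self.map[lba]]


# Data-first ordering: the interrupted write is lost, the old data survives.
safe = ToyFTL(4)
safe.write(0, "old")
safe.write(0, "new", fail_midway=True)

# Pointer-first ordering: the map now points at a page that was never written.
unsafe = ToyFTL(4)
unsafe.write(0, "old", pointer_first=True)
unsafe.write(0, "new", pointer_first=True, fail_midway=True)
```

With the data-first ordering, a power cut costs you only the write that
was in flight; with the pointer-first ordering it also costs you data that
was already safely on the device.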

For RAID arrays, we are still learning the nuances of what can actually
happen. A few hours ago Ric pointed out that with RAID 5 you won't trash
the entire stripe (which is what I thought happened from prior comments),
but instead run the risk of losing two relatively well-defined chunks of
data:

1. the block you are writing (which you can lose anyway)
2. the block that would live on the disk that is missing.

That drastically lessens the impact of the problem.
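
A small worked example of that case, as I understand it, using plain XOR
parity. This is only a sketch under simplifying assumptions: one stripe,
three data chunks plus parity, the data chunk hitting the disk before the
parity update, and made-up variable names. It is not the MD code.

```python
def xor(*chunks):
    """XOR equal-length byte strings together."""
    out = bytes(len(chunks[0]))
    for c in chunks:
        out = bytes(a ^ b for a, b in zip(out, c))
    return out


# One healthy stripe: three data chunks and their parity.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor(d0, d1, d2)

# The disk holding d2 fails. While degraded, d2 exists only as
# "XOR of everything that is left".
d2_before = xor(d0, d1, parity)

# Interrupted write: the new d0 reaches its disk, but power fails
# before the matching parity update is written.
d0_on_disk = b"aaaa"
d2_after = xor(d0_on_disk, d1, parity)   # rebuilt from stale parity
```

Before the interrupted write the missing chunk rebuilds correctly;
afterwards it does not, because the parity no longer matches d0. The chunk
d1, which was neither being written nor on the missing disk, is never
touched.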

I would like to see someone explain what would happen on RAID 6. I also
think the approach Neil talked about, trying the various combinations and
seeing which ones agree with each other, would be a good thing to
implement if he can do so.

But the super-simplified statement you keep trying to make significantly
overstates and oversimplifies the problem.

David Lang