linux-kernel - Re: raid is dangerous but that's secret (was Re: [patch] ext2/3: document conditions when reliable operation is possible)



Message-ID: <4A9733C1.2070904@redhat.com>
Date:	Thu, 27 Aug 2009 21:32:49 -0400
From:	Ric Wheeler <rwheeler@...hat.com>
To:	Pavel Machek <pavel@....cz>
CC:	Rob Landley <rob@...dley.net>, Theodore Tso <tytso@....edu>,
	Florian Weimer <fweimer@....de>,
	Goswin von Brederlow <goswin-v-b@....de>,
	kernel list <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
	rdunlap@...otime.net, linux-doc@...r.kernel.org,
	linux-ext4@...r.kernel.org, corbet@....net
Subject: Re: raid is dangerous but that's secret (was Re: [patch] ext2/3:
 document conditions when reliable operation is possible)

On 08/27/2009 06:13 PM, Pavel Machek wrote:
>
>>>> Repeat experiment until you get up to something like google scale or the
>>>> other papers on failures in national labs in the US and then we can have an
>>>> informed discussion.
>>>>
>>> On google scale anvil lightning can fry your machine out of a clear sky.
>>>
>>> However, there are still a few non-enterprise users out there, and knowing
>>> that specific usage patterns don't behave like they expect might be useful to
>>> them.
>>
>> You are missing the broader point of both papers. They (and people like
>> me when back at EMC) look at large numbers of machines and try to fix
>> what actually breaks when run in the real world and causes data loss.
>> The motherboards, S-ATA controllers, disk types are the same class of
>> parts that I have in my desktop box today.
> ...
>> These errors happen extremely commonly and are what RAID deals with well.
>>
>> What does not happen commonly is that during the RAID rebuild (kicked
>> off only after a drive is kicked out), you push the power button or have
>> a second failure (power outage).
>>
>> We will have more users lose data if they decide to use ext2 instead of
>> ext3 and use only single disk storage.
>
> So your argument basically is
>
> 'our ABS brakes are broken, but let's not tell anyone; our car is still
> safer than a horse'.
>
> and
>
> 'while we know our ABS brakes are broken, they are not a major factor in
> accidents, so let's not tell anyone'.
>
> Sorry, but I'd expect slightly higher moral standards. If we can
> document it in a way that's non-scary, and does not push people to
> single disks (horses), please go ahead; but you have to mention that
> MD RAID breaks journalling assumptions (our ABS brakes really are
> broken).
> 								Pavel
>


You continue to ignore the technical facts that everyone (both MD and ext3
people) has put in front of you.

If you have a specific bug in MD code, please propose a patch.

Ric
