linux-ext4 - raid is dangerous but that's secret (was Re: [patch] ext2/3: document conditions when reliable operation is possible)


Message-ID: <20090827221319.GA1601@ucw.cz>
Date:	Fri, 28 Aug 2009 00:13:21 +0200
From:	Pavel Machek <pavel@....cz>
To:	Ric Wheeler <rwheeler@...hat.com>
Cc:	Rob Landley <rob@...dley.net>, Theodore Tso <tytso@....edu>,
	Florian Weimer <fweimer@....de>,
	Goswin von Brederlow <goswin-v-b@....de>,
	kernel list <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
	rdunlap@...otime.net, linux-doc@...r.kernel.org,
	linux-ext4@...r.kernel.org, corbet@....net
Subject: raid is dangerous but that's secret (was Re: [patch] ext2/3:
	document conditions when reliable operation is possible)


>>> Repeat experiment until you get up to something like google scale or the
>>> other papers on failures in national labs in the US and then we can have an
>>> informed discussion.
>>>      
>> On google scale anvil lightning can fry your machine out of a clear sky.
>>
>> However, there are still a few non-enterprise users out there, and knowing
>> that specific usage patterns don't behave like they expect might be useful to
>> them.
>
> You are missing the broader point of both papers. They (and people like  
> me when back at EMC) look at large numbers of machines and try to fix  
> what actually breaks when run in the real world and causes data loss.  
> The motherboards, S-ATA controllers, disk types are the same class of  
> parts that I have in my desktop box today.
...
> These errors happen extremely commonly and are what RAID deals with well.
>
> What does not happen commonly is that during the RAID rebuild (kicked  
> off only after a drive is kicked out), you push the power button or have  
> a second failure (power outage).
>
> We will have more users lose data if they decide to use ext2 instead of
> ext3 and use only single disk storage.

So your argument basically is

'our ABS brakes are broken, but let's not tell anyone; our car is still
safer than a horse'.

and

'while we know our ABS brakes are broken, they are not a major factor in
accidents, so let's not tell anyone'.

Sorry, but I'd expect slightly higher moral standards. If we can
document it in a way that's non-scary, and does not push people to
single disks (horses), please go ahead; but you have to mention that
MD RAID breaks journalling assumptions (our ABS brakes really are
broken).
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html