Date:	Sun, 30 Aug 2009 12:10:54 -0400
From:	Ric Wheeler <rwheeler@...hat.com>
To:	Michael Tokarev <mjt@....msk.ru>
CC:	david@...g.hm, Pavel Machek <pavel@....cz>,
	Theodore Tso <tytso@....edu>, NeilBrown <neilb@...e.de>,
	Rob Landley <rob@...dley.net>, Florian Weimer <fweimer@....de>,
	Goswin von Brederlow <goswin-v-b@....de>,
	kernel list <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
	rdunlap@...otime.net, linux-doc@...r.kernel.org,
	linux-ext4@...r.kernel.org, corbet@....net
Subject: Re: raid is dangerous but that's secret (was Re: [patch] ext2/3:
 document conditions when reliable operation is possible)

On 08/30/2009 10:44 AM, Michael Tokarev wrote:
> Ric Wheeler wrote:
> []
>> The easiest way to lose your data in Linux - with RAID, without RAID, 
>> S-ATA or SAS - is to run with the write cache enabled.
>>
>> If you compare the size of even a large RAID stripe it will be 
>> measured in KB and as this thread has mentioned already, you stand to 
>> have damage to just one stripe (or even just a disk sector or two).
>>
>> If you lose power with the write caches enabled on that same 5 drive 
>> RAID set, you could lose as much as 5 * 32MB of freshly written data 
>> on  a power loss (16-32MB write caches are common on s-ata disks 
>> these days).
>
> This is fundamentally wrong.  Many filesystems today use either barriers
> or flushes (if barriers are not supported), and the times when disk
> drives were lying to the OS that the cache got flushed are long gone.
Unfortunately not - if you mount a file system with the write cache 
enabled and see "barriers disabled" messages in /var/log/messages, that 
is exactly what happens.

File systems issue write barrier operations that in turn issue cache 
flush commands (ATA FLUSH CACHE EXT, or the SCSI equivalent, 
SYNCHRONIZE CACHE).

MD RAID5 and RAID6 do not currently pass these operations down, and 
there is no other file-system-level mechanism that somehow bypasses the 
IO stack to invalidate or flush the drive cache.

Note that some devices have non-volatile write caches (specifically 
arrays or battery-backed RAID cards) where this is not an issue.


>
>> For MD5 (and MD6), you really must run with the write cache disabled 
>> until we get barriers to work for those configurations.
>
> I highly doubt barriers will ever be supported on anything but simple
> raid1, because it's impossible to guarantee ordering across multiple
> drives.  Well, it *is* possible to have write barriers with journalled
> (and/or with battery-backed-cache) raid[456].
>
> Note that even if raid[456] does not support barriers, write cache
> flushes still work.
>
> /mjt

I think that you are confused - barriers are implemented using cache 
flushes.

Ric


