Date:	Mon, 30 Mar 2009 14:08:43 -0500
From:	Eric Sandeen <sandeen@...deen.net>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Ric Wheeler <rwheeler@...hat.com>,
	"Andreas T.Auer" <andreas.t.auer_lkml_73537@...us.ath.cx>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Theodore Tso <tytso@....edu>, Mark Lord <lkml@....ca>,
	Stefan Richter <stefanr@...6.in-berlin.de>,
	Jeff Garzik <jeff@...zik.org>,
	Matthew Garrett <mjg59@...f.ucam.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29

Linus Torvalds wrote:
> 
> On Mon, 30 Mar 2009, Ric Wheeler wrote:
>>> But turn that around, and say: if you don't have redundant disks, then
>>> pretty much by definition those drive flushes won't be guaranteeing your
>>> data _anyway_, so why pay the price?
>> They do in fact provide that promise for the extremely common case of power
>> outage and as such, can be used to build reliable storage if you need to.
> 
> No they really effectively don't. Not if the end result is "oops, the 
> whole track is now unreadable" (regardless of whether it happened due to a 
> write during power-out or during some entirely unrelated disk error). Your 
> "flush" didn't result in a stable filesystem at all, it just resulted in a 
> dead one.
> 
> That's my point. Disks simply aren't that reliable. Anything you do with 
> flushing and ordering won't make them magically not have errors any more.

But this is apples and oranges, isn't it?

All of the effort that goes into metadata journalling in ext3, ext4,
xfs, reiserfs, jfs ... is there to save us from fsck time on restart
and to ensure a consistent filesystem framework (the metadata, that is,
in general) after an unclean shutdown, whether from a system crash or a
power outage.  In my personal experience that is much more common than
a drive failure.
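
As a rough userspace sketch of the ordering that journalling depends on
(this is only schematic write-ahead logging with made-up file names and
offsets, not actual jbd code):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Schematic write-ahead logging: the commit record must be durable
 * before the in-place metadata write starts, otherwise a crash can
 * leave half-applied metadata with nothing valid to replay. */
static void write_at(int fd, const void *buf, size_t len, off_t off)
{
        if (pwrite(fd, buf, len, off) != (ssize_t)len) {
                perror("pwrite");
                exit(1);
        }
}

static void flush(int fd)
{
        /* On a drive with a volatile write cache this only helps if
         * the fsync actually reaches stable media (i.e. a cache flush
         * gets issued) - which is exactly the ordering in question. */
        if (fsync(fd) != 0) {
                perror("fsync");
                exit(1);
        }
}

int main(void)
{
        char block[4096];
        int journal = open("journal.img", O_WRONLY | O_CREAT, 0600);
        int fs = open("fs.img", O_WRONLY | O_CREAT, 0600);

        if (journal < 0 || fs < 0) {
                perror("open");
                return 1;
        }

        /* 1. Log the new copy of the metadata block. */
        memset(block, 0xaa, sizeof(block));
        write_at(journal, block, sizeof(block), 0);
        flush(journal);

        /* 2. Write and flush the commit record. */
        memset(block, 0xcc, sizeof(block));
        write_at(journal, block, sizeof(block), 4096);
        flush(journal);

        /* 3. Only now write the metadata in place; if power dies here,
         *    journal replay recovers it. */
        memset(block, 0xaa, sizeof(block));
        write_at(fs, block, sizeof(block), 8192);
        flush(fs);

        close(journal);
        close(fs);
        return 0;
}

The file names and offsets above are invented; the point is only the
flush between each step.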

That journalling requires ordering guarantees, and with large drive
write caches and no ordering enforced, it's not hard for it to go south
to the point where things *do* get corrupted when you lose power or the
drive resets in the middle of essentially random write cache destaging.
See Chris Mason's tests from a year or so ago, which showed that ext3
is quite vulnerable to this - it likely explains some of the random
htree corruption that occasionally gets reported to us.
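
(Not Chris Mason's actual harness - just a hedged sketch of the kind of
probe that exposes the problem: fsync a sequence-numbered record, note
the last sequence reported durable, pull the plug mid-run, then check
afterwards whether an already-"durable" record is missing or a later
record survived while an earlier one did not:)

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define REC_SIZE 4096

int main(int argc, char **argv)
{
        char buf[REC_SIZE];
        uint64_t seq;
        int fd;

        if (argc != 3) {
                fprintf(stderr, "usage: %s write|check <file>\n", argv[0]);
                return 1;
        }

        if (strcmp(argv[1], "write") == 0) {
                fd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0600);
                if (fd < 0) { perror("open"); return 1; }
                for (seq = 0; ; seq++) {
                        memset(buf, 0, sizeof(buf));
                        memcpy(buf, &seq, sizeof(seq));
                        if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
                                perror("write");
                                return 1;
                        }
                        if (fsync(fd) != 0) {
                                perror("fsync");
                                return 1;
                        }
                        /* Everything up to seq should now be stable. */
                        printf("durable through %llu\n",
                               (unsigned long long)seq);
                        fflush(stdout);
                }
        }

        if (strcmp(argv[1], "check") != 0) {
                fprintf(stderr, "usage: %s write|check <file>\n", argv[0]);
                return 1;
        }

        /* check mode: after the power cut, records must be contiguous
         * up to whatever the writer last reported as durable. */
        fd = open(argv[2], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }
        for (seq = 0; ; seq++) {
                uint64_t found;
                ssize_t n = read(fd, buf, sizeof(buf));

                if (n == 0)
                        break;
                if (n < 0) { perror("read"); return 1; }
                if (n != (ssize_t)sizeof(buf)) {
                        printf("torn record after %llu\n",
                               (unsigned long long)seq);
                        return 1;
                }
                memcpy(&found, buf, sizeof(found));
                if (found != seq) {
                        printf("gap: expected %llu, found %llu\n",
                               (unsigned long long)seq,
                               (unsigned long long)found);
                        return 1;
                }
        }
        printf("contiguous through %llu records\n",
               (unsigned long long)seq);
        return 0;
}

A gap below the last sequence the writer reported as durable means the
flush did not give the ordering the journal layer is counting on.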

And yes, sometimes drives die, and then you are really screwed, but
that's orthogonal to all of the above, I think.

-Eric