Message-ID: <alpine.LFD.2.00.0903300922231.3948@localhost.localdomain>
Date:	Mon, 30 Mar 2009 09:34:30 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Ric Wheeler <rwheeler@...hat.com>
cc:	"Andreas T.Auer" <andreas.t.auer_lkml_73537@...us.ath.cx>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Theodore Tso <tytso@....edu>, Mark Lord <lkml@....ca>,
	Stefan Richter <stefanr@...6.in-berlin.de>,
	Jeff Garzik <jeff@...zik.org>,
	Matthew Garrett <mjg59@...f.ucam.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29



On Mon, 30 Mar 2009, Ric Wheeler wrote:
> 
> I still disagree strongly with the don't force flush idea - we have an
> absolute and critical need to have ordered writes that will survive a power
> failure for any file system that is built on transactions (or data base).

Read that sentence of yours again.

In particular, read the "we" part, and ponder.

YOU have that absolute and critical need.

Others? Likely not so much. The reason people run "data=ordered" on their 
laptops is not just because it's the default - rather, it's the default 
_because_ it's the one that avoids the most obvious problems. And for 99% of 
all people, that's what they want. 
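
(For concreteness, a minimal sketch of what asking for that mode explicitly
would look like via mount(2); the device and mount point names below are
placeholders, and in practice the distro's fstab does this for you:)

/* Sketch: mounting ext3 with the data=ordered journalling mode spelled out.
 * /dev/sda2 and /mnt/data are placeholder names; normally the installer
 * writes the equivalent /etc/fstab line and nobody runs this by hand. */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
        if (mount("/dev/sda2", "/mnt/data", "ext3", 0, "data=ordered") != 0) {
                perror("mount");
                return 1;
        }
        return 0;
}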

And as mentioned, if you have absolute requirements like that, you 
absolutely MUST be using real RAID with real protection (not just RAID0). 

Not "should". MUST. If you don't do redundancy, your disk _will_ 
eventually eat your data. Not because the OS wrote in the wrong order, or 
the disk cached writes, but simply because bad things do happen.

But turn that around, and say: if you don't have redundant disks, then 
pretty much by definition those drive flushes won't be guaranteeing your 
data _anyway_, so why pay the price?

> The big issues are that for s-ata drives, our flush mechanism is really,
> really primitive and brutal. We could/should try to validate a better and less
> onerous mechanism (with ordering tags? experimental flush ranges? etc).

That's one of the issues. The cost of those flushes can be really quite 
high, and as mentioned, in the absence of redundancy you don't actually 
get the guarantees you seem to think you get.
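
(To give a rough idea of where that cost shows up from userspace, here is a
sketch that times a plain write() against the fsync() that follows it; the
file name and the 4KB payload are arbitrary, and on a drive with a volatile
write cache the bulk of the fsync() latency is the cache flush itself:)

/* Sketch: time write() (page cache only) vs. fsync() (journal commit plus,
 * on a filesystem with barriers enabled, a full cache flush to the drive).
 * "testfile" and the 4KB buffer are arbitrary choices. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static double now_ms(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

int main(void)
{
        char buf[4096];
        memset(buf, 'x', sizeof(buf));

        int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        double t0 = now_ms();
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
                perror("write");
        double t1 = now_ms();
        if (fsync(fd) != 0)                 /* this is the expensive part */
                perror("fsync");
        double t2 = now_ms();

        printf("write: %.3f ms, fsync: %.3f ms\n", t1 - t0, t2 - t1);
        close(fd);
        return 0;
}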

> I spent a very long time looking at huge numbers of installed systems
> (millions of file systems deployed in the field), including  taking part in
> weekly analysis of why things failed, whether the rates of failure went up or
> down with a given configuration, etc. so I can fully appreciate all of the
> ways drives (or SSD's!) can magically eat your data.

Well, I can go mainly by my own anecdotal evidence, and so far I've 
actually had more catastrophic data failure from failed drives than from 
anything else. OS crashes in the middle of a "yum update"? Yup, been 
there, done that, and it was really painful. But it was painful only in a 
"damn, I need to force a re-install of a couple of rpms" way.

Actual failed drives that got read errors? I seem to average almost one a 
year. It's been overheating laptops, and it's been power outages that 
apparently happened at really bad times. I have a UPS now.

> What you have to keep in mind is the order of magnitude of various buckets of
> failures - software crashes/code bugs tend to dominate, followed by drive
> failures, followed by power supplies, etc.

Sure. And those "write flushes" really only cover a rather small 
percentage of the failure cases. For many setups, the other corruption 
issues (drive failure) are not just more common, but generally more 
disastrous anyway. So why would a person like that worry about the (rare) 
power failure?

> I have personally seen a huge reduction in the "software" rate of failures
> when you get the write barriers (forced write cache flushing) working properly
> with a very large installed base, tested over many years :-)

The software failure rate should only depend on the software write 
barriers (i.e. the ones that order requests in the OS elevator - NOT the 
ones that actually tell the disk to flush itself).
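
(To make that distinction concrete from userspace, here is a sketch
contrasting the Linux-specific sync_file_range(), which only pushes dirty
pages out through the kernel's writeback path and is documented as giving no
durability guarantee on drives with volatile write caches, with fdatasync(),
which on a barrier-enabled filesystem also tells the drive to flush its
cache. The file name is a placeholder:)

/* Sketch: OS-level writeback vs. a real device cache flush.
 * sync_file_range() queues the dirty pages through the block layer but does
 * NOT flush the drive's volatile write cache; fdatasync() on a filesystem
 * with barriers enabled does.  "journal.dat" is a placeholder name. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        int fd = open("journal.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        const char rec[] = "transaction record\n";
        if (write(fd, rec, sizeof(rec) - 1) < 0)
                perror("write");

        /* "Software" ordering/writeback only: the pages reach the device
         * queue, but a power cut can still lose whatever sits in the
         * drive's cache. */
        if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) < 0)
                perror("sync_file_range");

        /* Durability: journal/metadata commit plus a flush to the drive -
         * the expensive step being argued about above. */
        if (fdatasync(fd) < 0)
                perror("fdatasync");

        close(fd);
        return 0;
}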

			Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
