Message-ID: <alpine.LFD.2.00.0903280916230.3994@localhost.localdomain>
Date: Sat, 28 Mar 2009 09:32:36 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Stefan Richter <stefanr@...6.in-berlin.de>
cc: Mark Lord <lkml@....ca>, Jeff Garzik <jeff@...zik.org>,
Matthew Garrett <mjg59@...f.ucam.org>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Theodore Tso <tytso@....edu>,
Andrew Morton <akpm@...ux-foundation.org>,
David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29
On Sat, 28 Mar 2009, Stefan Richter wrote:
>
> Sure. I forgot: Not only the frequency of I/O disruption (e.g. due to
> kernel crash) factors into system reliability; the particular impact of
> such disruption is a factor too. (How hard is recovery? Will at least
> old data remain available? ...)
I suspect (at least from my own anecdotal evidence) that a lot of system
crashes are basically X hanging. If you use the system as a desktop, at
that point it's basically dead - and the difference between an X hang and
a kernel crash is almost totally invisible to users.
Us kernel people may walk over to another machine and ping or ssh in to
see, but ask yourself how many normal users would do that - especially
since DOS and Windows have taught people that they need to power-cycle
(and, in all honesty, especially since there usually is very little else
you can do even under Linux if X gets confused).
And then part of the problem ends up being that while in theory the kernel
can continue to write out dirty stuff, in practice people press the power
button long before it can do so. The 30 second thing is really too long.
And don't tell me about sysrq. I know about sysrq. It's very convenient
for kernel people, but it's not like most people use it.
But I absolutely hear you - people seem to think that "correctness" trumps
all, but in reality, quite often users will be happier with a faster
system - even if they know that they may lose data. They may curse
themselves (or, more likely, the system) when they _do_ lose data, but
they'll make the same choice all over two months later.
Which is why I think that if the filesystem people think that the
"data=ordered" mode is too damn fundamentally hard to make fast in the
presence of "fsync", and all sane people (definition: me) think that the
30-second window for either "data=writeback" or the ext4 data writeout is
too fragile, then we should look into something in between.
Because, in the end, you do have to balance performance vs safety when it
comes to disk writes. You absolutely have to delay things for performance,
but it is always going to involve the risk of losing data that you do care
about, but that you aren't willing (or able - random apps and tons of
scripting come to mind) to do an fsync over.
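
For reference, this is roughly what "doing an fsync over" a file means for an application that does care. It is a minimal sketch, assuming a POSIX system; the helper name durable_write is hypothetical and does not come from this thread. It writes to a temporary file, fsyncs it, renames it over the target, and then fsyncs the directory so the rename itself reaches disk.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` (bytes), forcing it to disk before returning.

    Hypothetical helper for illustration; assumes POSIX semantics.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # below stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # push Python's buffer to the kernel
            os.fsync(tmp.fileno())   # ask the kernel to write the data out
        os.replace(tmp_path, path)   # rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename is durable too.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Every call like this waits for the disk, which is the performance cost being weighed above, and it is the step that random apps and shell scripts usually skip.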
Which is why I, personally, would probably be perfectly happy with a
"async ordered" mode, for example. At least START the data writeback when
writing back metadata, but don't necessarily wait for it (and don't
necessarily make it go first). Turn the "30 second window of death" into
something much harder to hit.
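
To make the comparison concrete, here is a toy model of the crash window under the three policies discussed above. It is not kernel code. The function name, the mode strings, and the 0.5 second I/O time are assumptions made up for illustration; only the 30 second figure comes from this message. The "window" is the time after a metadata commit during which a crash would leave metadata on disk without the matching data.

```python
def exposure_window(mode, writeback_delay_s=30.0, data_io_s=0.5):
    """Seconds after a metadata commit during which a crash loses data.

    Toy model with assumed parameters, not a measurement:
      writeback_delay_s - how long dirty data may sit before being flushed
      data_io_s         - how long the data I/O takes once it is submitted
    """
    if mode == "ordered":
        # Data is forced out before the metadata commit completes.
        return 0.0
    if mode == "writeback":
        # Data is flushed on its own schedule, long after the commit.
        return writeback_delay_s + data_io_s
    if mode == "async_ordered":
        # Data I/O is started at commit time but nobody waits for it.
        return data_io_s
    raise ValueError(f"unknown mode: {mode!r}")
```

In this model the "async ordered" window never drops to zero, but it shrinks from the flush delay to the time the data I/O spends in flight, which is the "much harder to hit" property asked for above.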
Linus