Message-ID: <49CECC7B.70100@garzik.org>
Date: Sat, 28 Mar 2009 21:18:51 -0400
From: Jeff Garzik <jeff@...zik.org>
To: Mark Lord <lkml@....ca>
CC: Stefan Richter <stefanr@...6.in-berlin.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Matthew Garrett <mjg59@...f.ucam.org>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Theodore Tso <tytso@....edu>,
Andrew Morton <akpm@...ux-foundation.org>,
David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29
Mark Lord wrote:
> The better solution seems to be the rather obvious one:
>
> the filesystem should commit data to disk before altering metadata.
>
> Much easier and more reliable to centralize it there, rather than
> rely (falsely) upon thousands of programs each performing numerous
> performance-killing fsync's.
Firstly, the FS data/metadata write-out order says nothing about when
the write-out is started by the OS. It only implies consistency in the
face of a crash during write-out. Hooray for BSD soft-updates.
If the write-out is started immediately during or after write(2),
congratulations, you are on your way to reinventing synchronous writes.
If the write-out does not start immediately, then you have a
many-seconds window for data loss. And it should be self-evident that
userland application writers will have some situations where design
requirements dictate minimizing or eliminating that window.
Secondly, this email sub-thread is not talking about thousands of
programs, it is talking about Firefox behavior. Firefox is a multi-OS
portable application that has a design requirement that user data must
be protected against crashes. (Same concept as your word processor's
auto-save feature.)
The author of such a portable application must ensure their app saves
data against Windows Vista kernel crashes, HPUX kernel crashes, OS X
window system crashes, X11 window system crashes, application crashes, etc.
Can a portable app really rely on what Linux kernel hackers think the
underlying filesystem _should_ do?
No, it either (a) does not care at all, or (b) uses fsync(2) or
FlushFileBuffers() because of the guarantees they provide across the OS
spectrum, in light of the myriad OS filesystem caching, flushing, and
ordering algorithms.
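
To make option (b) concrete, here is a minimal sketch of what such a portable flush might look like. The names os_flush() and save_buffer() are hypothetical and are not taken from Firefox or any real application; the only OS calls assumed are fsync(2) on POSIX systems and FlushFileBuffers() on Windows.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L   /* exposes fileno() and fsync() */
#endif
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>                   /* _get_osfhandle() */
#define FILENO _fileno
#else
#include <unistd.h>               /* fsync() */
#define FILENO fileno
#endif

/* Hypothetical helper: ask the OS to push a file's cached data to
 * stable storage. Returns 0 on success, -1 on failure. */
static int os_flush(int fd)
{
#ifdef _WIN32
    HANDLE h = (HANDLE)_get_osfhandle(fd);
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    return FlushFileBuffers(h) ? 0 : -1;
#else
    return fsync(fd);
#endif
}

/* Hypothetical helper: write a buffer to `path` and do not report
 * success until the OS has been asked to flush it.
 * Returns 0 on success, -1 on any failure. */
int save_buffer(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    int ok = fwrite(buf, 1, len, fp) == len   /* app -> stdio buffer */
          && fflush(fp) == 0                  /* stdio buffer -> OS cache */
          && os_flush(FILENO(fp)) == 0;       /* OS cache -> storage */

    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A call such as save_buffer("session.dat", data, data_len) then behaves the same way regardless of which filesystem, or which kernel's write-out policy, sits underneath.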
Was the BSD soft-updates idea of FS data-before-metadata a good one?
Yes. Obviously.
It is the cornerstone of every SANE journalling-esque database or
filesystem out there -- don't leave a window where your metadata is
inconsistent. "Duh" :)
But that says nothing about the case where a userland app's design
requirements include ordered writes+flushes of its own application data.
That is the common case when a userland app like Firefox uses a
transactional database such as sqlite or db4.
Thus it is the height of silliness to think that FS data/metadata
write-out order permits elimination of fsync(2) for the class of
application that must care about ordered writes/flushes of its own
application data.
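
As an illustration of application-level ordering, here is a rough sketch of a two-step commit. It is not how sqlite or db4 are actually implemented; commit_record(), the two-file layout, and the append-only format are all assumptions made for the example, and it assumes a system that provides fdatasync(2), such as Linux.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes fdatasync() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Append `record` to the file at `path` and flush it before returning. */
static int append_and_flush(const char *path, const char *record, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    int ok = write_all(fd, record, len) == 0 && fdatasync(fd) == 0;
    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical commit: the journal entry must be flushed BEFORE the
 * data file is touched. This ordering belongs to the application; the
 * filesystem's own data/metadata ordering does not supply it.
 * Simplification: a real implementation would also flush the parent
 * directory after creating a new file. */
int commit_record(const char *journal_path, const char *data_path,
                  const char *record)
{
    assert(journal_path != NULL && data_path != NULL && record != NULL);
    size_t len = strlen(record);

    /* Step 1: journal first, flushed. */
    if (append_and_flush(journal_path, record, len) != 0)
        return -1;

    /* Step 2: only now modify the data file, and flush that too. */
    return append_and_flush(data_path, record, len);
}
```

The flush between step 1 and step 2 is the point: without it, the OS is free to write the two files out in either order, whatever order the filesystem uses for its own data and metadata.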
That upstream sqlite replaced fsync(2) with fdatasync(2) makes it
obvious that FS data/metadata write-out order is irrelevant to Firefox.
The issue with transactional databases is more simply a design tradeoff
-- level of fsync punishment versus performance etc. Tweaking the OS
filesystem doesn't help at all with those design choices.
Jeff