[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090327032301.GN6239@mit.edu>
Date: Thu, 26 Mar 2009 23:23:01 -0400
From: Theodore Tso <tytso@....edu>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29
On Thu, Mar 26, 2009 at 06:03:15PM -0700, Linus Torvalds wrote:
>
> Everybody accepts that if you've written a 20MB file and then call
> "fsync()" on it, it's going to take a while. But when you've written a 2kB
> file, and "fsync()" takes 20 seconds, because somebody else is just
> writing normally, _that_ is a bug. And it is actually almost totally
> unrelated to the whole 'dirty_limit' thing.
Yeah, well, it's caused by data=ordered, which is an ext3 unique
thing; no other filesystem (or operating system) has such a feature.
I'm beginning to wish we hadn't implemented it. Yeah, it solved a
security problem (which delayed allocation also solves), but it
trained application programs to be careless about fsync(), and it's
caused us so many other problems, including the fsync() and unrelated
commit latency problems.
We are where we are, though, and people have been trained to think
they don't need fsync(), so we're going to have to deal with the
problem by having these implied fsync for cases like
replace-via-rename, and in addition to that, some kind of hueristic to
force out writes early to avoid these huge write latencies. It would
be good to make it be autotuning it so that filesystems that don't do
ext3 data=ordered don't have to pay the price of having to force out
writes so aggressively early (since in some cases if the file
subsequently is deleted, we might be able to optimize out the write
altogether --- and that's good for SSD endurance).
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists