[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090325220530.GR32307@mit.edu>
Date: Wed, 25 Mar 2009 18:05:30 -0400
From: Theodore Tso <tytso@....edu>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29
On Wed, Mar 25, 2009 at 11:40:28AM -0700, Linus Torvalds wrote:
> On Wed, 25 Mar 2009, Theodore Tso wrote:
> > I'm beginning to think that using a "ratio" may be the wrong way to
> > go. We probably need to add an optional dirty_max_megabytes field
> > where we start pushing dirty blocks out when the number of dirty
> > blocks exceeds either the dirty_ratio or the dirty_max_megabytes,
> > which ever comes first.
>
> We have that. Except it's called "dirty_bytes" and
> "dirty_background_bytes", and it defaults to zero (off).
>
> The problem being that unlike the ratio, there's no sane default value
> that you can at least argue is not _entirely_ pointless.
Well, if the maximum time that someone wants to wait for an fsync() to
return is one second, and the RAID array can write 100MB/sec, then
setting a value of 100MB makes a certain amount of sense. Yes, this
doesn't take seek overheads into account, and it may be that we're not
writing things out in an optimal order, as Alan as pointed out. But
100MB is much lower number than 5% of 32GB (1.6GB). It would be
better if these numbers were accounted on a per-filesystem instead of
a global threshold, but for people who are complaining about huge
latencies, it at least a partial workaround that they can use today.
I agree, it's not perfect, but this is a fundamentally hard problem.
We have multiple solutions, such as ext4 and XFS's delayed allocation,
which some people don't like because applications aren't calling
fsync(). We can boost the I/O priority of kjournald which definitely
helps, as Arjan has suggested, but Andrew has vetoed that. I have a
patch which hopefully is less controversial, that posts writes using
WRITE_SYNC instead of WRITE, but which only will help in some
circumstances, but not in the distcc/icecream/fast downloads
scnearios. We can use data=writeback, but folks don't like the
security implications of that.
People can call file system developers idiots if it makes them feel
better --- sure, OK, we all suck. If someone wants to try to create a
better file system, show us how to do better, or send us some patches.
But this is not a problem that's easy to solve in a way that's going
to make everyone happy; else it would have been solved already.
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists