[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20070427150703.293748d0.akpm@linux-foundation.org>
Date: Fri, 27 Apr 2007 15:07:03 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Zan Lynx <zlynx@....org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Mike Galbraith <efault@....de>,
LKML <linux-kernel@...r.kernel.org>,
Jens Axboe <jens.axboe@...cle.com>
Subject: Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS
is under heavy write load (massive starvation)
On Fri, 27 Apr 2007 13:09:06 -0600
Zan Lynx <zlynx@....org> wrote:
> On Fri, 2007-04-27 at 11:31 -0700, Andrew Morton wrote:
> [snip]
> > ext3's problem here is that a single fsync() requires that ext3 sync the
> > whole filesystem. Because
> >
> > - a journal commit can contain metadata from multiple files, and if we
> > want to journal one file's metadata via fsync(), we unavoidably journal
> > all the other file's metadata at the same time.
> >
> > - ordered mode requires that we write a file's data blocks prior to
> > journalling the metadata which refers to those blocks.
> >
> > net result: syncing anything syncs the whole world.
> >
> > There are a few areas in which this could conceivably be tuned up: if a
> > particular file doesn't currently have any metadata in the commit, we don't
> > actually need to sync its data blocks: we could just transfer them into
> > next commit. Hard, unlikely to be of benefit.
> [snip]
>
> How about mixing the ordered and data journal modes? If the data blocks
> would fit, have fsync write them into the journal as is done in
> data=journal mode. Then that file data is committed to disk as fsync
> requires, but it shouldn't require flushing all the previous metadata to
> get an ordered guarantee.
In some ways that would be quite neat: if a process does a small write then
fsyncs it, write it all into the journal. That avoids a seek out to the
file's data blocks.
However it'd be quite hard to do, I expect: we don't know until commit time
how much data has been written to this file (actually, we don't even know
at commit-time, but we could, with quite some work, find out).
But none of this will solve the problem, because even with your optimised
fsync(), we still need to write out bonnie's large file at commit time,
when we fsync() your small write to a different file.
(And when I say "this problem" I refer to the known-about problem which
we're discussing here. I suspect this in fact isn't Mike's problem - 20
minutes is crazy - it's not attributable to the fsync-syncs-everything
problem unless Mike's GUI is doing a huge numer of separate fsyncs)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists