Message-Id: <E1JFLEW-0002oE-G1@localhost.localdomain>
Date: Thu, 17 Jan 2008 11:16:00 +0800
From: Fengguang Wu <wfg@...l.ustc.edu.cn>
To: David Chinner <dgc@....com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Michael Rubin <mrubin@...gle.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch] Converting writeback linked lists to a tree based data structure
On Thu, Jan 17, 2008 at 09:35:10AM +1100, David Chinner wrote:
> On Wed, Jan 16, 2008 at 05:07:20PM +0800, Fengguang Wu wrote:
> > On Tue, Jan 15, 2008 at 09:51:49PM -0800, Andrew Morton wrote:
> > > > Then to do better ordering by adopting a radix tree (or an rbtree
> > > > if a radix tree is not enough),
> > >
> > > ordering of what?
> >
> > Switch from time to location.
>
> Note that data writeback may be adversely affected by location
> based writeback rather than time based writeback - think of
> the effect of location based data writeback on an app that
> creates lots of short term (<30s) temp files and then removes
> them before they are written back.
A small (e.g. 5s) time window can still be enforced, but...
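A minimal sketch of "location order plus a small time window", in C. Everything here is hypothetical: struct dirty_inode and pick_location_ordered() are illustrative names, not kernel APIs, and the inode number is assumed to be a usable location key, which, as David notes next, does not hold for file data on XFS. Inodes dirtied less than min_age seconds ago are held back, so a temp file deleted within that window never costs any I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified dirty-inode record; not the kernel's struct inode. */
struct dirty_inode {
    unsigned long ino;          /* assumed location key (see caveat above) */
    unsigned long dirtied_when; /* time the inode was first dirtied, in seconds */
};

/* qsort() comparator: ascending inode number, i.e. assumed disk location. */
static int cmp_by_location(const void *a, const void *b)
{
    const struct dirty_inode *x = a, *y = b;
    return (x->ino > y->ino) - (x->ino < y->ino);
}

/*
 * Copy every inode that has been dirty for at least min_age seconds into
 * out[] (capacity n), then sort the copies by location. Younger inodes are
 * skipped and left for a later pass. Returns the number of entries copied.
 */
static size_t pick_location_ordered(const struct dirty_inode *in, size_t n,
                                    unsigned long now, unsigned long min_age,
                                    struct dirty_inode *out)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++) {
        /* written as an addition to avoid unsigned wrap-around if dirtied_when > now */
        if (in[i].dirtied_when + min_age <= now)
            out[k++] = in[i];
    }
    assert(k <= n);
    qsort(out, k, sizeof(*out), cmp_by_location);
    return k;
}
```

For example, with now = 10 and min_age = 5, an inode dirtied at t = 8 is held back, while inodes dirtied at t = 0 and t = 2 are returned sorted by inode number.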
> Also, data writeback location cannot be easily derived from
> the inode number in pretty much all cases. "near" in terms
> of XFS means the same AG which means the data could be up to
> a TB away from the inode, and if you have >1TB filesystems
> using the default inode32 allocator, file data is *never*
> placed near the inode - the inodes are in the first TB of
> the filesystem, the data is rotored around the rest of the
> filesystem.
>
> And with delayed allocation, you don't know where the data is even
> going to be written ahead of the filesystem ->writepage call, so you
> can't do optimal location ordering for data in this case.
Agreed.
> Hmmmm - I'm wondering if we'd do better to split data writeback from
> inode writeback. i.e. we do two passes. The first pass writes all
> the data back in time order, the second pass writes all the inodes
> back in location order.
>
> Right now we interleave data and inode writeback (i.e. we do data,
> inode, data, inode, data, inode, ....). I'd much prefer to see all
> data written out first, then the inodes. ->writepage often dirties
> the inode and hence if we need to do multiple do_writepages() calls
> on an inode to flush all the data (e.g. congestion, large amounts of
> data to be written, etc), we really shouldn't be calling
> write_inode() after every do_writepages() call. The inode
> should not be written until all the data is written....
That may benefit XFS. Another case is documented as follows:
"the write_inode() function of a typical fs will perform no I/O, but
will mark buffers in the blockdev mapping as dirty."
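A rough sketch of the two-pass scheme David describes, assuming a simplified in-memory model: struct dirty_inode, struct wb_op and two_pass_writeback() are hypothetical names, and the recorded 'D'/'I' operations only stand in for do_writepages() and write_inode(). It ignores congestion and partial flushes and shows only the ordering: all data in dirtied_when order first, then each inode once, in (assumed) location order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified dirty-inode record; not the kernel's struct inode. */
struct dirty_inode {
    unsigned long ino;          /* assumed location key for the inode pass */
    unsigned long dirtied_when; /* time key for the data pass */
};

/* One entry per simulated I/O: 'D' = data pages, 'I' = inode metadata. */
struct wb_op {
    char kind;
    unsigned long ino;
};

static int cmp_time(const void *a, const void *b)
{
    const struct dirty_inode *x = a, *y = b;
    return (x->dirtied_when > y->dirtied_when) -
           (x->dirtied_when < y->dirtied_when);
}

static int cmp_location(const void *a, const void *b)
{
    const struct dirty_inode *x = a, *y = b;
    return (x->ino > y->ino) - (x->ino < y->ino);
}

/*
 * Pass 1 flushes every inode's data in the order it was dirtied; pass 2
 * writes the inodes themselves in location order. Since writing data may
 * redirty the inode, deferring all inode writes to pass 2 means each inode
 * is written once, after its data, rather than after every data flush.
 * ops[] must have room for 2 * n entries; returns the number recorded.
 */
static size_t two_pass_writeback(struct dirty_inode *inodes, size_t n,
                                 struct wb_op *ops)
{
    size_t k = 0;

    qsort(inodes, n, sizeof(*inodes), cmp_time);
    for (size_t i = 0; i < n; i++)
        ops[k++] = (struct wb_op){ 'D', inodes[i].ino }; /* data pass */

    qsort(inodes, n, sizeof(*inodes), cmp_location);
    for (size_t i = 0; i < n; i++)
        ops[k++] = (struct wb_op){ 'I', inodes[i].ino }; /* inode pass */

    assert(k == 2 * n);
    return k;
}
```

With inodes {ino 30, t=1}, {ino 10, t=3} and {ino 20, t=2}, the data pass visits 30, 20, 10 and the inode pass visits 10, 20, 30.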