Message-ID: <20091231050441.GB19627@localhost>
Date: Thu, 31 Dec 2009 13:04:41 +0800
From: Wu Fengguang <fengguang.wu@...el.com>
To: Trond Myklebust <Trond.Myklebust@...app.com>
Cc: Jan Kara <jack@...e.cz>, Steve Rago <sar@...-labs.com>,
Peter Zijlstra <peterz@...radead.org>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jens.axboe" <jens.axboe@...cle.com>,
Peter Staubach <staubach@...hat.com>,
Arjan van de Ven <arjan@...radead.org>,
Ingo Molnar <mingo@...e.hu>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] improve the performance of large sequential write NFS
workloads
Trond,
On Thu, Dec 31, 2009 at 12:22:48AM +0800, Trond Myklebust wrote:
> it ignores the commit request if the caller is just doing a
> WB_SYNC_NONE background flush, waiting instead for the ensuing
> WB_SYNC_ALL request...
I'm afraid this will block balance_dirty_pages() until an explicit
sync/fsync call. COMMITs are bad, but if we don't send them
regularly, NR_UNSTABLE_NFS will grow large and block
balance_dirty_pages() as well as throttle_vm_writeout().
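To make the accounting concern concrete, here is a rough userspace sketch
(not kernel code). It assumes the throttling check sums dirty, unstable and
writeback pages against a single threshold; the struct, the field names and
the helpers over_dirty_limit() and commit_done() are made up for
illustration.

```c
#include <stdbool.h>

/* Toy model of the per-system page counters; names are illustrative. */
struct page_counts {
	unsigned long nr_dirty;     /* dirty, not yet sent to the server */
	unsigned long nr_writeback; /* WRITE RPCs currently in flight */
	unsigned long nr_unstable;  /* written but not yet COMMITted */
};

/*
 * Assumed rule: a writer is throttled while the sum of dirty, unstable
 * and writeback pages exceeds the dirty threshold.
 */
static bool over_dirty_limit(const struct page_counts *c,
			     unsigned long dirty_thresh)
{
	return c->nr_dirty + c->nr_unstable + c->nr_writeback > dirty_thresh;
}

/* In this model, only a completed COMMIT releases the unstable pages. */
static void commit_done(struct page_counts *c)
{
	c->nr_unstable = 0;
}
```

In this model, if background flushes never issue a COMMIT, nr_unstable only
grows, and over_dirty_limit() stays true until somebody calls sync/fsync.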
> +int nfs_commit_unstable_pages(struct address_space *mapping,
> + struct writeback_control *wbc)
> +{
> + struct inode *inode = mapping->host;
> + int flags = FLUSH_SYNC;
> + int ret;
> +
==> > + /* Don't commit if this is just a non-blocking flush */
==> > + if (wbc->sync_mode != WB_SYNC_ALL) {
==> > + mark_inode_unstable_pages(inode);
==> > + return 0;
==> > + }
> + if (wbc->nonblocking)
> + flags = 0;
> + ret = nfs_commit_inode(inode, flags);
> + if (ret > 0)
> + return 0;
> + return ret;
> +}
The NFS protocol provides no painless way to reclaim unstable pages
other than COMMIT (or a sync write). This leaves us in a dilemma.
We may reasonably reduce the number of COMMITs, and possibly even
delay them for a while (hoping the server has already written back the
pages before the COMMIT arrives, which is somewhat fragile).
What we can obviously do is avoid sending a COMMIT
- if there is already an ongoing COMMIT for the same inode
- or when there are ongoing WRITEs for the inode
  (is there an easy way to detect this?)
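The two conditions above can be sketched as a small decision function.
This is a userspace illustration only: struct inode_state, its fields and
should_commit() are hypothetical names, not existing NFS client interfaces.

```c
#include <stdbool.h>

/* Hypothetical per-inode bookkeeping, for illustration only. */
struct inode_state {
	bool commit_in_flight;         /* a COMMIT RPC is already outstanding */
	unsigned int writes_in_flight; /* number of outstanding WRITE RPCs */
	unsigned long nr_unstable;     /* pages waiting for a COMMIT */
};

enum commit_decision {
	COMMIT_NOTHING_TO_DO, /* no unstable pages on this inode */
	COMMIT_RETRY_LATER,   /* skip for now, keep the inode dirty */
	COMMIT_SEND           /* issue the COMMIT */
};

static enum commit_decision should_commit(const struct inode_state *s)
{
	if (s->nr_unstable == 0)
		return COMMIT_NOTHING_TO_DO;
	/* Avoid piling up COMMITs or committing while WRITEs are pending. */
	if (s->commit_in_flight || s->writes_in_flight > 0)
		return COMMIT_RETRY_LATER;
	return COMMIT_SEND;
}
```

COMMIT_RETRY_LATER plays the role of the -EAGAIN return in the patch below:
the COMMIT is skipped this time and the inode stays dirty, so it will be
retried on a later writeback pass.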
What do you think?
Thanks,
Fengguang
---
fs/nfs/inode.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- linux.orig/fs/nfs/inode.c 2009-12-25 09:25:38.000000000 +0800
+++ linux/fs/nfs/inode.c 2009-12-25 10:13:06.000000000 +0800
@@ -105,8 +105,11 @@ int nfs_write_inode(struct inode *inode,
 		ret = filemap_fdatawait(inode->i_mapping);
 		if (ret == 0)
 			ret = nfs_commit_inode(inode, FLUSH_SYNC);
-	} else
+	} else if (!radix_tree_tagged(&NFS_I(inode)->nfs_page_tree,
+				      NFS_PAGE_TAG_LOCKED))
 		ret = nfs_commit_inode(inode, 0);
+	else
+		ret = -EAGAIN;
 	if (ret >= 0)
 		return 0;
 	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);