[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20091223180551.GD3159@quack.suse.cz>
Date: Wed, 23 Dec 2009 19:05:51 +0100
From: Jan Kara <jack@...e.cz>
To: Trond Myklebust <Trond.Myklebust@...app.com>
Cc: Wu Fengguang <fengguang.wu@...el.com>,
Steve Rago <sar@...-labs.com>,
Peter Zijlstra <peterz@...radead.org>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jens.axboe" <jens.axboe@...cle.com>,
Peter Staubach <staubach@...hat.com>, Jan Kara <jack@...e.cz>,
Arjan van de Ven <arjan@...radead.org>,
Ingo Molnar <mingo@...e.hu>, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] improve the performance of large sequential write NFS
workloads
On Wed 23-12-09 15:21:47, Trond Myklebust wrote:
> On Tue, 2009-12-22 at 09:59 +0800, Wu Fengguang wrote:
> > 1) normal fs sets I_DIRTY_DATASYNC when extending i_size, however NFS
> > will set the flag for any pages written -- why this trick? To
> > guarantee the call of nfs_commit_inode()? Which unfortunately turns
> > almost every server side NFS write into sync writes..
> >
> > writeback_single_inode:
> > do_writepages
> > nfs_writepages
> > nfs_writepage ----[short time later]---> nfs_writeback_release*
> > nfs_mark_request_commit
> > __mark_inode_dirty(I_DIRTY_DATASYNC);
> >
> > if (I_DIRTY_SYNC || I_DIRTY_DATASYNC) <---- so this will be true for most time
> > write_inode
> > nfs_write_inode
> > nfs_commit_inode
>
>
> I have been working on a fix for this. We basically do want to ensure
> that NFS calls commit (otherwise we're not finished cleaning the dirty
> pages), but we want to do it _after_ we've waited for all the writes to
> complete. See below...
>
> Trond
>
> ------------------------------------------------------------------------------------------------------
> VFS: Add a new inode state: I_UNSTABLE_PAGES
>
> From: Trond Myklebust <Trond.Myklebust@...app.com>
>
> Add a new inode state to enable the vfs to commit the nfs unstable pages to
> stable storage once the write back of dirty pages is done.
Hmm, does your patch really help?
> @@ -474,6 +482,18 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
> }
>
> spin_lock(&inode_lock);
> + /*
> + * Special state for cleaning NFS unstable pages
> + */
> + if (inode->i_state & I_UNSTABLE_PAGES) {
> + int err;
> + inode->i_state &= ~I_UNSTABLE_PAGES;
> + spin_unlock(&inode_lock);
> + err = commit_unstable_pages(inode, wait);
> + if (ret == 0)
> + ret = err;
> + spin_lock(&inode_lock);
> + }
I don't quite understand this chunk: We've called writeback_single_inode
because it had some dirty pages. Thus it has I_DIRTY_DATASYNC set and a few
lines above your chunk, we've called nfs_write_inode which sent commit to
the server. Now here you sometimes send the commit again? What's the
purpose?
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index faa0918..4f129b3 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -99,17 +99,14 @@ u64 nfs_compat_user_ino64(u64 fileid)
>
> int nfs_write_inode(struct inode *inode, int sync)
> {
> + int flags = 0;
> int ret;
>
> - if (sync) {
> - ret = filemap_fdatawait(inode->i_mapping);
> - if (ret == 0)
> - ret = nfs_commit_inode(inode, FLUSH_SYNC);
> - } else
> - ret = nfs_commit_inode(inode, 0);
> - if (ret >= 0)
> + if (sync)
> + flags = FLUSH_SYNC;
> + ret = nfs_commit_inode(inode, flags);
> + if (ret > 0)
> return 0;
> - __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> return ret;
> }
>
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists