linux-kernel - Re: [PATCH] improve the performance of large sequential write NFS workloads

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1261656281.3596.1.camel@localhost>
Date:	Thu, 24 Dec 2009 13:04:41 +0100
From:	Trond Myklebust <Trond.Myklebust@...app.com>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Jan Kara <jack@...e.cz>, Steve Rago <sar@...-labs.com>,
	Peter Zijlstra <peterz@...radead.org>,
	"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"jens.axboe" <jens.axboe@...cle.com>,
	Peter Staubach <staubach@...hat.com>,
	Arjan van de Ven <arjan@...radead.org>,
	Ingo Molnar <mingo@...e.hu>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] improve the performance of large sequential write NFS
 workloads

On Thu, 2009-12-24 at 10:52 +0800, Wu Fengguang wrote: 
> Trond,
> 
> On Thu, Dec 24, 2009 at 03:12:54AM +0800, Trond Myklebust wrote:
> > On Wed, 2009-12-23 at 19:05 +0100, Jan Kara wrote: 
> > > On Wed 23-12-09 15:21:47, Trond Myklebust wrote:
> > > > @@ -474,6 +482,18 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
> > > >  	}
> > > >  
> > > >  	spin_lock(&inode_lock);
> > > > +	/*
> > > > +	 * Special state for cleaning NFS unstable pages
> > > > +	 */
> > > > +	if (inode->i_state & I_UNSTABLE_PAGES) {
> > > > +		int err;
> > > > +		inode->i_state &= ~I_UNSTABLE_PAGES;
> > > > +		spin_unlock(&inode_lock);
> > > > +		err = commit_unstable_pages(inode, wait);
> > > > +		if (ret == 0)
> > > > +			ret = err;
> > > > +		spin_lock(&inode_lock);
> > > > +	}
> > >   I don't quite understand this chunk: We've called writeback_single_inode
> > > because it had some dirty pages. Thus it has I_DIRTY_DATASYNC set and a few
> > > lines above your chunk, we've called nfs_write_inode which sent commit to
> > > the server. Now here you sometimes send the commit again? What's the
> > > purpose?
> > 
> > We no longer set I_DIRTY_DATASYNC. We only set I_DIRTY_PAGES (and later
> > I_UNSTABLE_PAGES).
> > 
> > The point is that we now do the commit only _after_ we've sent all the
> > dirty pages, and waited for writeback to complete, whereas previously we
> > did it in the wrong order.
> 
> Sorry I still don't get it. The timing used to be:
> 
> write 4MB   ==> WRITE block 0 (ie. first 512KB)
>                 WRITE block 1
>                 WRITE block 2
>                 WRITE block 3         ack from server for WRITE block 0 => mark 0 as unstable (inode marked need-commit)
>                 WRITE block 4         ack from server for WRITE block 1 => mark 1 as unstable
>                 WRITE block 5         ack from server for WRITE block 2 => mark 2 as unstable
>                 WRITE block 6         ack from server for WRITE block 3 => mark 3 as unstable
>                 WRITE block 7         ack from server for WRITE block 4 => mark 4 as unstable
>                                       ack from server for WRITE block 5 => mark 5 as unstable
> write_inode ==> COMMIT blocks 0-5
>                                       ack from server for WRITE block 6 => mark 6 as unstable (inode marked need-commit)
>                                       ack from server for WRITE block 7 => mark 7 as unstable 
> 
>                                       ack from server for COMMIT blocks 0-5 => mark 0-5 as clean
> 
> write_inode ==> COMMIT blocks 6-7
> 
>                                       ack from server for COMMIT blocks 6-7 => mark 6-7 as clean
> 
> Note that the first COMMIT is submitted before receiving all ACKs for
> the previous writes, hence the second COMMIT is necessary. It seems
> that your patch does not improve the timing at all.

That would indicate that we're cycling through writeback_single_inode()
more than once. Why?

Trond

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/