linux-kernel - Re: [PATCH] improve the performance of large sequential write NFS workloads

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20091224025228.GA12477@localhost>
Date:	Thu, 24 Dec 2009 10:52:28 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Trond Myklebust <Trond.Myklebust@...app.com>
Cc:	Jan Kara <jack@...e.cz>, Steve Rago <sar@...-labs.com>,
	Peter Zijlstra <peterz@...radead.org>,
	"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"jens.axboe" <jens.axboe@...cle.com>,
	Peter Staubach <staubach@...hat.com>,
	Arjan van de Ven <arjan@...radead.org>,
	Ingo Molnar <mingo@...e.hu>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] improve the performance of large sequential write NFS
	workloads

Trond,

On Thu, Dec 24, 2009 at 03:12:54AM +0800, Trond Myklebust wrote:
> On Wed, 2009-12-23 at 19:05 +0100, Jan Kara wrote: 
> > On Wed 23-12-09 15:21:47, Trond Myklebust wrote:
> > > @@ -474,6 +482,18 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
> > >  	}
> > >  
> > >  	spin_lock(&inode_lock);
> > > +	/*
> > > +	 * Special state for cleaning NFS unstable pages
> > > +	 */
> > > +	if (inode->i_state & I_UNSTABLE_PAGES) {
> > > +		int err;
> > > +		inode->i_state &= ~I_UNSTABLE_PAGES;
> > > +		spin_unlock(&inode_lock);
> > > +		err = commit_unstable_pages(inode, wait);
> > > +		if (ret == 0)
> > > +			ret = err;
> > > +		spin_lock(&inode_lock);
> > > +	}
> >   I don't quite understand this chunk: We've called writeback_single_inode
> > because it had some dirty pages. Thus it has I_DIRTY_DATASYNC set and a few
> > lines above your chunk, we've called nfs_write_inode which sent commit to
> > the server. Now here you sometimes send the commit again? What's the
> > purpose?
> 
> We no longer set I_DIRTY_DATASYNC. We only set I_DIRTY_PAGES (and later
> I_UNSTABLE_PAGES).
> 
> The point is that we now do the commit only _after_ we've sent all the
> dirty pages, and waited for writeback to complete, whereas previously we
> did it in the wrong order.

Sorry I still don't get it. The timing used to be:

write 4MB   ==> WRITE block 0 (ie. first 512KB)
                WRITE block 1
                WRITE block 2
                WRITE block 3         ack from server for WRITE block 0 => mark 0 as unstable (inode marked need-commit)
                WRITE block 4         ack from server for WRITE block 1 => mark 1 as unstable
                WRITE block 5         ack from server for WRITE block 2 => mark 2 as unstable
                WRITE block 6         ack from server for WRITE block 3 => mark 3 as unstable
                WRITE block 7         ack from server for WRITE block 4 => mark 4 as unstable
                                      ack from server for WRITE block 5 => mark 5 as unstable
write_inode ==> COMMIT blocks 0-5
                                      ack from server for WRITE block 6 => mark 6 as unstable (inode marked need-commit)
                                      ack from server for WRITE block 7 => mark 7 as unstable 

                                      ack from server for COMMIT blocks 0-5 => mark 0-5 as clean

write_inode ==> COMMIT blocks 6-7

                                      ack from server for COMMIT blocks 6-7 => mark 6-7 as clean

Note that the first COMMIT is submitted before receiving all ACKs for
the previous writes, hence the second COMMIT is necessary. It seems
that your patch does not improve the timing at all.

Thanks,
Fengguang

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/