linux-kernel - Re: [PATCH 29/35] nfs: in-commit pages accounting and wait queue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1292342245.2976.13.camel@heimdal.trondhjem.org>
Date:	Tue, 14 Dec 2010 10:57:25 -0500
From:	Trond Myklebust <Trond.Myklebust@...app.com>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>, Jan Kara <jack@...e.cz>,
	Christoph Hellwig <hch@....de>,
	Dave Chinner <david@...morbit.com>,
	"Theodore Ts'o" <tytso@....edu>,
	Chris Mason <chris.mason@...cle.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mel Gorman <mel@....ul.ie>, Rik van Riel <riel@...hat.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Greg Thelen <gthelen@...gle.com>,
	Minchan Kim <minchan.kim@...il.com>,
	linux-mm <linux-mm@...ck.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 29/35] nfs: in-commit pages accounting and wait queue

On Tue, 2010-12-14 at 23:40 +0800, Wu Fengguang wrote:
> On Tue, Dec 14, 2010 at 05:15:51AM +0800, Trond Myklebust wrote:
> > On Mon, 2010-12-13 at 22:47 +0800, Wu Fengguang wrote:
> > > plain text document attachment (writeback-nfs-in-commit.patch)
> > > When doing 10+ concurrent dd's, I observed very bumpy commits submission
> > > (partly because the dd's are started at the same time, and hence reached
> > > 4MB to-commit pages at the same time). Basically we rely on the server
> > > to complete and return write/commit requests, and want both to progress
> > > smoothly and not consume too many pages. The write request wait queue is
> > > not enough as it's mainly network bounded. So add another commit request
> > > wait queue. Only async writes need to sleep on this queue.
> > > 
> > 
> > I'm not understanding the above reasoning. Why should we serialise
> > commits at the per-filesystem level (and only for non-blocking flushes
> > at that)?
> 
> I did the commit wait queue after seeing this graph, where there is
> very bursty pattern of commit submission and hence completion:
> 
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/nfs-100dd-1M-8p-2953M-2.6.37-rc3+-2010-12-03-01/nfs-commit-1000.png
> 
> leading to big fluctuations, eg. the almost straight up/straight down
> lines below
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/nfs-100dd-1M-8p-2953M-2.6.37-rc3+-2010-12-03-01/vmstat-dirty-300.png
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/nfs-100dd-1M-8p-2953M-2.6.37-rc3+-2010-12-03-01/dirty-pages.png
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/nfs-100dd-1M-8p-2953M-2.6.37-rc3+-2010-12-03-01/dirty-pages-200.png
> 
> A commit wait queue will help wipe out the "peaks". The "fixed" graph
> is
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/nfs-100dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-03-23/vmstat-dirty-300.png
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/nfs-100dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-03-23/dirty-pages.png
> 
> Blocking flushes don't need to wait on this queue because they already
> throttle themselves by waiting on the inode commit lock before/after
> the commit.  They actually should not wait on this queue, to prevent
> sync requests being unnecessarily blocked by async ones.

OK, but isn't it better then to just abort the commit, and have the
relevant async process retry it later?

This is a code path which is followed by kswapd, for instance. It seems
dangerous to be throttling that instead of allowing it to proceed (and
perhaps being able to free up memory on some other partition in the mean
time).

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@...app.com
www.netapp.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/