linux-kernel - Re: [PATCH 04/35] writeback: reduce per-bdi dirty threshold ramp up time


Message-ID: <20101214151507.GA25896@localhost>
Date:	Tue, 14 Dec 2010 23:15:07 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Richard Kennedy <richard@....demon.co.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jan Kara <jack@...e.cz>, Christoph Hellwig <hch@....de>,
	Trond Myklebust <Trond.Myklebust@...app.com>,
	Dave Chinner <david@...morbit.com>,
	Theodore Ts'o <tytso@....edu>,
	Chris Mason <chris.mason@...cle.com>,
	Mel Gorman <mel@....ul.ie>, Rik van Riel <riel@...hat.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Greg Thelen <gthelen@...gle.com>,
	Minchan Kim <minchan.kim@...il.com>,
	linux-mm <linux-mm@...ck.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 04/35] writeback: reduce per-bdi dirty threshold ramp
 up time

On Tue, Dec 14, 2010 at 10:50:55PM +0800, Peter Zijlstra wrote:
> On Tue, 2010-12-14 at 22:39 +0800, Wu Fengguang wrote:
> > On Tue, Dec 14, 2010 at 10:33:25PM +0800, Wu Fengguang wrote:
> > > On Tue, Dec 14, 2010 at 09:59:10PM +0800, Wu Fengguang wrote:
> > > > On Tue, Dec 14, 2010 at 09:37:34PM +0800, Richard Kennedy wrote:
> > > 
> > > > > As to the ramp up time, when writing to 2 disks at the same time I see
> > > > > the per_bdi_threshold taking up to 20 seconds to converge on a steady
> > > > > value after one of the writes stops. So I think this could be sped up
> > > > > even more, at least on my setup.
> > > > 
> > > > I have roughly the same ramp up time on the 1-disk 3GB mem test:
> > > > 
> > > > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/ext4-1dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-00-37/dirty-pages.png
> > > >  
> > > 
> > > Interestingly, the above graph shows that after about 10s of fast ramp
> > > up, there is another 20s of slow ramp down. It's obviously due to the
> > > decline of the global limit:
> > > 
> > > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/ext4-1dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-00-37/vmstat-dirty.png
> > > 
> > > But why is the global limit declining?  The following log shows that
> > > nr_file_pages keeps growing and only stabilizes after 75 seconds (such a
> > > long time!). In the same period nr_free_pages slowly goes down to its
> > > stable value. Given that the global limit is mainly derived from
> > > nr_free_pages+nr_file_pages (I disabled swap), something must be
> > > slowly eating memory until 75 seconds. Maybe the tracing ring buffers?
> > > 
> > >          free     file      reclaimable pages
> > > 50s      369324 + 318760 => 688084
> > > 60s      235989 + 448096 => 684085
> > > 
> > > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/ext4-1dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-00-37/vmstat
> > 
> > The log shows that ~64MB of reclaimable memory is stolen. But the trace
> > data only takes 1.8MB. Hmm..
> 
> Also, trace buffers are fully pre-allocated.
> 
> Inodes perhaps?

Just figured out that it's the buffer heads :)

The other interesting question is why it takes up to 50s to consume
all the nr_free_pages pages. I would imagine the free pages would be
quickly allocated to the page cache..
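
For anyone who wants to redo the arithmetic on the quoted vmstat samples,
here is a minimal sketch. It only sums the two counters from the table
above and converts the drop between the 50s and 60s samples into MiB. The
4 KiB page size and the helper names are assumptions for illustration, not
something taken from the test setup, and the result covers only that 10s
window, not the whole run.

```python
# Sketch only: recompute "reclaimable pages" from the two quoted samples.
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def reclaimable_pages(nr_free_pages, nr_file_pages):
    """With swap disabled, reclaimable ~= free pages + file pages."""
    return nr_free_pages + nr_file_pages


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / (1024 * 1024)


# time (s) -> (nr_free_pages, nr_file_pages), copied from the quoted table
samples = {50: (369324, 318760), 60: (235989, 448096)}

totals = {t: reclaimable_pages(free, file) for t, (free, file) in samples.items()}
lost_pages = totals[50] - totals[60]

for t in sorted(totals):
    print(f"{t}s: {totals[t]} reclaimable pages")
print(f"lost between 50s and 60s: {lost_pages} pages "
      f"(~{pages_to_mib(lost_pages):.1f} MiB)")
```

Running the same sum over every line of the vmstat log would show when
the total stops shrinking.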

Attached is the graph for ext2-1dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-01-36

Thanks,
Fengguang

Download attachment "vmstat-reclaimable-500.png" of type "image/png" (66540 bytes)