Date:	Tue, 14 Dec 2010 23:26:33 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Richard Kennedy <richard@....demon.co.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jan Kara <jack@...e.cz>, Christoph Hellwig <hch@....de>,
	Trond Myklebust <Trond.Myklebust@...app.com>,
	Dave Chinner <david@...morbit.com>,
	Theodore Ts'o <tytso@....edu>,
	Chris Mason <chris.mason@...cle.com>,
	Mel Gorman <mel@....ul.ie>, Rik van Riel <riel@...hat.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Greg Thelen <gthelen@...gle.com>,
	Minchan Kim <minchan.kim@...il.com>,
	linux-mm <linux-mm@...ck.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 04/35] writeback: reduce per-bdi dirty threshold ramp up time

On Tue, Dec 14, 2010 at 11:15:07PM +0800, Wu Fengguang wrote:
> On Tue, Dec 14, 2010 at 10:50:55PM +0800, Peter Zijlstra wrote:
> > On Tue, 2010-12-14 at 22:39 +0800, Wu Fengguang wrote:
> > > On Tue, Dec 14, 2010 at 10:33:25PM +0800, Wu Fengguang wrote:
> > > > On Tue, Dec 14, 2010 at 09:59:10PM +0800, Wu Fengguang wrote:
> > > > > On Tue, Dec 14, 2010 at 09:37:34PM +0800, Richard Kennedy wrote:
> > > > 
> > > > > > As to the ramp up time, when writing to 2 disks at the same time I see
> > > > > > the per_bdi_threshold taking up to 20 seconds to converge on a steady
> > > > > > value after one of the writes stops. So I think this could be sped up
> > > > > > even more, at least on my setup.
> > > > > 
> > > > > I have roughly the same ramp up time on the 1-disk 3GB mem test:
> > > > > 
> > > > > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/ext4-1dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-00-37/dirty-pages.png
> > > > >  
> > > > 
> > > > Interestingly, the above graph shows that after about 10s fast ramp
> > > > up, there is another 20s slow ramp down. It's obviously due to the
> > > > decline of global limit:
> > > > 
> > > > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/ext4-1dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-00-37/vmstat-dirty.png
> > > > 
> > > > But why is the global limit declining?  The following log shows that
> > > > nr_file_pages keeps growing and goes stable after 75 seconds (so long
> > > > time!). In the same period nr_free_pages goes slowly down to its
> > > > stable value. Given that the global limit is mainly derived from
> > > > nr_free_pages+nr_file_pages (I disabled swap), something must be
> > > > slowly eating memory until 75s. Maybe the tracing ring buffers?
> > > > 
> > > >          free     file      reclaimable pages
> > > > 50s      369324 + 318760 => 688084
> > > > 60s      235989 + 448096 => 684085
> > > > 
> > > > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/ext4-1dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-00-37/vmstat
> > > 
> > > The log shows that ~64MB reclaimable memory is stolen. But the trace
> > > data only takes 1.8MB. Hmm..
> > 
> > Also, trace buffers are fully pre-allocated.
> > 
> > Inodes perhaps?
> 
> Just figured out that it's the buffer heads :)
> 
> The other interesting question is why it takes up to 50s to consume
> all the nr_free_pages pages. I would imagine the free pages would be
> quickly allocated to the page cache..
> 
> Attached is the graph for ext2-1dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-01-36

Ah, it's embarrassing.. we are writing data, so the rate of free memory
consumption is simply bounded by the disk write speed.

So it's FS independent.
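
For anyone who wants to check this against their own logs, here is a rough
sketch that recomputes the numbers from the two vmstat samples quoted above
(50s and 60s). It sums free + file pages as in the quoted table, then
reports how fast free pages drain and how fast the page cache grows. The
4 KiB page size and all names in it (Sample, free_drain_mib_per_s, ...) are
assumptions made for illustration; they are not taken from the test scripts.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4 KiB pages; the thread does not state it


@dataclass
class Sample:
    """One vmstat snapshot (hypothetical helper type, for illustration)."""
    t: int               # seconds since the test started
    nr_free_pages: int
    nr_file_pages: int

    @property
    def reclaimable(self) -> int:
        # free + file pages, as in the quoted table (swap disabled)
        return self.nr_free_pages + self.nr_file_pages


def pages_to_mib(pages: float) -> float:
    """Convert a page count to MiB under the PAGE_SIZE assumption."""
    return pages * PAGE_SIZE / (1 << 20)


def free_drain_mib_per_s(a: Sample, b: Sample) -> float:
    """Average rate at which free pages are consumed between a and b."""
    return pages_to_mib(a.nr_free_pages - b.nr_free_pages) / (b.t - a.t)


def file_growth_mib_per_s(a: Sample, b: Sample) -> float:
    """Average rate at which file (page cache) pages grow between a and b."""
    return pages_to_mib(b.nr_file_pages - a.nr_file_pages) / (b.t - a.t)


# The two samples quoted earlier in this thread.
s50 = Sample(50, 369324, 318760)
s60 = Sample(60, 235989, 448096)

if __name__ == "__main__":
    print("reclaimable @50s:", s50.reclaimable)
    print("reclaimable @60s:", s60.reclaimable)
    print("free drain  MiB/s: %.2f" % free_drain_mib_per_s(s50, s60))
    print("file growth MiB/s: %.2f" % file_growth_mib_per_s(s50, s60))
```

If the drain rate stays close to what the disk can sustain, that would be
consistent with the explanation above. The thread does not give the disk
throughput, so that comparison has to be made against your own hardware.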

Here is the graph for ext3 on vanilla kernel, generated from 

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/ext3-1dd-1M-8p-2952M-2.6.37-rc5-2010-12-10-19-57/vmstat

And btrfs on vanilla kernel

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/btrfs-1dd-1M-8p-2952M-2.6.37-rc5-2010-12-10-21-23/vmstat

Thanks,
Fengguang

Download attachment "vmstat-reclaimable-500.png" of type "image/png" (68089 bytes)

Download attachment "vmstat-dirty-500.png" of type "image/png" (57116 bytes)
