linux-kernel - Re: regression in page writeback


Message-ID: <20091002172620.GB8161@mit.edu>
Date:	Fri, 2 Oct 2009 13:26:20 -0400
From:	Theodore Tso <tytso@....edu>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Christoph Hellwig <hch@...radead.org>,
	Dave Chinner <david@...morbit.com>,
	Chris Mason <chris.mason@...cle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	"Li, Shaohua" <shaohua.li@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"richard@....demon.co.uk" <richard@....demon.co.uk>,
	"jens.axboe@...cle.com" <jens.axboe@...cle.com>
Subject: Re: regression in page writeback

On Fri, Oct 02, 2009 at 04:19:53PM +0800, Wu Fengguang wrote:
> > > The big writes, if they are contiguous, could take 1-2 seconds
> > > on a very slow, ancient laptop disk, and that will hold up any kind of
> > > small synchronous activity --- such as either a disk read or a
> > > firefox-triggered fsync().
> > 
> > Yes, that's a problem. The SYNC/ASYNC elevator queues can help here.

The SYNC/ASYNC queues will only partially help: a synchronous request
can still be delayed by up to the largest I/O that can be issued as a
single chunk, times the queue depth for those disks that support NCQ.
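
To put a rough number on that bound, here is a back-of-the-envelope
sketch.  The function name and the figures in the usage note are
illustrative assumptions, not measurements, and it assumes the queued
writes are contiguous and the disk sustains the given throughput:

```c
#include <stdint.h>

/*
 * Rough upper bound, in milliseconds, on how long a sync request can
 * wait behind async writes already dispatched to the device:
 * (largest single I/O) * (queue depth) / (sustained throughput).
 * All three inputs are assumptions supplied by the caller.
 */
static uint64_t worst_case_wait_ms(uint64_t max_io_bytes,
                                   unsigned int queue_depth,
                                   uint64_t bytes_per_sec)
{
	if (bytes_per_sec == 0)
		return UINT64_MAX;	/* no progress at all: unbounded */
	return (max_io_bytes * queue_depth * 1000) / bytes_per_sec;
}
```

For example, with a hypothetical 512 KiB maximum request, a queue
depth of 32 and a disk sustaining 16 MiB/s, this gives 1000 ms, which
is in the same ballpark as the 1-2 seconds mentioned above.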

> > There's still the problem of IO submission time != IO completion time,
> > due to fluctuations of randomness and more. However that's a general
> > and unavoidable problem.  Both the wbc.timeout scheme and the
> > "wbc.nr_to_write based on estimated throughput" scheme are based on
> > _past_ requests and it's simply impossible to have a 100% accurate
> > scheme. In principle, wbc.timeout will only be inferior at IO startup
> > time. In the steady state of 100% full queue, it is actually estimating
> > the IO throughput implicitly :)
> 
> Another difference between wbc.timeout and adaptive wbc.nr_to_write
> is, when many _read_ requests or fsyncs arrive, these SYNC rw
> requests will significantly lower the ASYNC writeback throughput, if
> it's not completely stalled. So with timeout, the inode will be
> aborted with few pages written; with nr_to_write, the inode will be
> written a good number of pages, at the cost of taking a long time.
> 
> IMHO the nr_to_write behavior seems more efficient. What do you think?

I agree, adaptively changing nr_to_write seems like the right thing to
do.  For bonus points, we could also monitor how often synchronous I/O
operations are happening, and allow nr_to_write to go up by some amount
if there aren't many synchronous operations happening at the moment.
So that might be another opportunity to do auto-tuning, although this
might be a heuristic that needs to be configurable for certain
specialized workloads.  For many other workloads, it should be
possible to detect a regular pattern of reads and/or synchronous
writes, and if so, use a lower nr_to_write than when there aren't many
synchronous I/O operations happening on that particular block device.
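
A minimal userspace sketch of that heuristic, to make the idea
concrete.  This is not kernel code: struct bdev_stats, the constants
and the doubling factor are all hypothetical, and a real
implementation would need per-device accounting that is not shown
here:

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES	4096u
#define MIN_NR_TO_WRITE	1024u	/* hypothetical floor, in pages */
#define MAX_NR_TO_WRITE	32768u	/* hypothetical cap, in pages */

/* Hypothetical per-block-device statistics, estimated from past I/O. */
struct bdev_stats {
	uint64_t write_bytes_per_sec;	/* estimated writeback throughput */
	uint32_t sync_ops_per_sec;	/* recent reads + synchronous writes */
};

/*
 * Size nr_to_write so that one writeback chunk takes roughly target_ms
 * at the estimated throughput, and allow a larger chunk when the device
 * is seeing fewer than sync_busy_threshold synchronous ops per second.
 */
static uint32_t adaptive_nr_to_write(const struct bdev_stats *s,
                                     uint32_t target_ms,
                                     uint32_t sync_busy_threshold)
{
	uint64_t pages = s->write_bytes_per_sec * target_ms / 1000
			 / PAGE_SIZE_BYTES;

	if (s->sync_ops_per_sec < sync_busy_threshold)
		pages *= 2;	/* little sync traffic: go up by some amount */

	if (pages < MIN_NR_TO_WRITE)
		pages = MIN_NR_TO_WRITE;
	if (pages > MAX_NR_TO_WRITE)
		pages = MAX_NR_TO_WRITE;
	return (uint32_t)pages;
}
```

The threshold is passed in rather than hard-coded because, as noted
above, it is the part most likely to need tuning for specialized
workloads.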

	    		   	     	     - Ted