linux-kernel - Re: [PATCH 11/12] vmscan: Write out dirty pages in batch

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20100614162143.04783749.akpm@linux-foundation.org>
Date:	Mon, 14 Jun 2010 16:21:43 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Dave Chinner <david@...morbit.com>
Cc:	Mel Gorman <mel@....ul.ie>, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
	Chris Mason <chris.mason@...cle.com>,
	Nick Piggin <npiggin@...e.de>, Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Christoph Hellwig <hch@...radead.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [PATCH 11/12] vmscan: Write out dirty pages in batch

On Tue, 15 Jun 2010 09:11:44 +1000
Dave Chinner <david@...morbit.com> wrote:

> On Mon, Jun 14, 2010 at 12:17:52PM +0100, Mel Gorman wrote:
> > Page reclaim cleans individual pages using a_ops->writepage() because from
> > the VM perspective, it is known that pages in a particular zone must be freed
> > soon, it considers the target page to be the oldest and it does not want
> > to wait while background flushers cleans other pages. From a filesystem
> > perspective this is extremely inefficient as it generates a very seeky
> > IO pattern leading to the perverse situation where it can take longer to
> > clean all dirty pages than it would have otherwise.
> > 
> > This patch queues all dirty pages at once to maximise the chances that
> > the write requests get merged efficiently. It also makes the next patch
> > that avoids writeout from direct reclaim more straight-forward.
> 
> Seeing as you have a list of pages for IO, perhaps they could be sorted
> before issuing ->writepage on them.
> 
> That is, while this patch issues all the IO in one hit, it doesn't
> change the order in which the IO is issued - it is still issued in
> LRU order. Given that they are issued in a short period of time now,
> rather than across a longer scan period, it is likely that it will
> not be any faster as:
> 
> 	a) IO will not be started as soon, and
> 	b) the IO scheduler still only has a small re-ordering
> 	   window and will choke just as much on random IO patterns.
> 
> However, there is a list_sort() function that could be used to sort
> the list; sorting the list of pages by mapping and page->index
> within the mapping would result in all the pages on each mapping
> being sent down in ascending offset order at once - exactly how the
> filesystems want IO to be sent to it.  Perhaps this is a simple
> improvement that can be made to this code that will make a big
> difference to worst case performance.
> 
> FWIW, I did this for delayed metadata buffer writeback in XFS
> recently (i.e. sort the queue of (potentially tens of thousands of)
> buffers in ascending block order before dispatch) and that showed a
> 10-15% reduction in seeks on simple kernel compile workloads. This
> shows that if we optimise IO patterns at higher layers where the
> sort window is much, much larger than in the IO scheduler, then
> overall system performance improves....

Yup.

But then, this all really should be done at the block layer so other
io-submitting-paths can benefit from it.

IOW, maybe "the sort queue is the submission queue" wasn't a good idea.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/