Message-Id: <1245839904.3210.85.camel@localhost.localdomain>
Date:	Wed, 24 Jun 2009 11:38:24 +0100
From:	Richard Kennedy <richard@....demon.co.uk>
To:	Andrew Morton <akpm@...ux-foundation.org>,
	"a.p.zijlstra" <a.p.zijlstra@...llo.nl>
Cc:	linux-mm <linux-mm@...ck.org>, lkml <linux-kernel@...r.kernel.org>
Subject: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work

When writing to 2 (or more) devices at the same time, stop
balance_dirty_pages from moving dirty pages to writeback once it has
reached the bdi threshold. This prevents balance_dirty_pages from
overshooting its limits and moving all dirty pages to writeback.

    
Signed-off-by: Richard Kennedy <richard@....demon.co.uk>
---
balance_dirty_pages can overreact and move all of the dirty pages to
writeback unnecessarily.

balance_dirty_pages makes its decision to throttle based on the number
of dirty plus writeback pages that are over the calculated limit, so it
will continue to move pages even when there are plenty of pages already
in writeback and fewer than the threshold still dirty.

This allows it to overshoot its limits and move all the dirty pages to
writeback while waiting for the drives to catch up and empty the
writeback list. 
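
To make that concrete, here is a purely illustrative userspace sketch of
the check in question. The variable names mirror balance_dirty_pages(),
but start_more_writeback() is a made-up stand-in for the
writeback_inodes() call, not a real kernel interface.

#include <stdio.h>

static void start_more_writeback(const char *when)
{
	printf("%s: queue another chunk of dirty pages for writeback\n", when);
}

static void throttle_decision(unsigned long bdi_nr_reclaimable,
			      unsigned long bdi_nr_writeback,
			      unsigned long bdi_thresh)
{
	/* The throttle only engages once dirty + writeback exceed the limit. */
	if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
		return;

	/*
	 * Current behaviour: keep queuing dirty pages whenever any remain,
	 * even if most of the excess is already sitting in writeback.
	 */
	if (bdi_nr_reclaimable)
		start_more_writeback("before patch");

	/*
	 * Patched behaviour: only queue more while the dirty (reclaimable)
	 * pages alone exceed the bdi threshold; otherwise wait for the
	 * disks to drain the writeback list.
	 */
	if (bdi_nr_reclaimable > bdi_thresh)
		start_more_writeback("after patch");
}

int main(void)
{
	/* Plenty already in writeback, dirty pages below the threshold. */
	throttle_decision(100, 900, 400);
	return 0;
}

Running it with those numbers shows the current check still queuing more
writeback while the patched check waits for the disks.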

A simple fio test easily demonstrates this problem.  

fio --name=f1 --directory=/disk1 --size=2G --rw=write \
	--name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10

The attached graph before.png shows how all pages are moved to writeback
as the second write starts and the throttling kicks in.

after.png shows the same test with the patch applied, and clearly shows
that it keeps dirty_background_ratio worth of dirty pages in the buffer.
The values and timings in the graphs are only approximate, but they are
good enough to show the behaviour.

This is the simplest fix I could find, though I'm not entirely sure
that it alone will be enough for all cases. It is certainly an
improvement on my desktop machine when writing to 2 disks.

Do we need something more for machines with large arrays, where
bdi_threshold * number_of_drives is greater than the dirty_ratio?
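
To spell that worry out with made-up numbers (a back-of-the-envelope
sketch only; it assumes the per-bdi thresholds really can add up to more
than the global limit, which may not hold in practice):

#include <stdio.h>

int main(void)
{
	/* All numbers are hypothetical, in pages. */
	unsigned long dirty_thresh = 100000;	/* global dirty limit */
	unsigned long bdi_thresh   = 20000;	/* per-device threshold */
	unsigned long nr_drives    = 8;		/* drives in the array */

	/*
	 * With the patch, each bdi may hold up to bdi_thresh dirty pages
	 * without being pushed to writeback, so across the array:
	 */
	unsigned long worst_case = nr_drives * bdi_thresh;

	printf("worst-case unflushed dirty pages: %lu (global limit %lu)\n",
	       worst_case, dirty_thresh);
	return 0;
}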

regards
Richard


This patch is against the latest 2.6.30 git; built and tested on x86_64.

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7b0dcea..7687879 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -541,8 +541,11 @@ static void balance_dirty_pages(struct address_space *mapping)
 		 * filesystems (i.e. NFS) in which data may have been
 		 * written to the server's write cache, but has not yet
 		 * been flushed to permanent storage.
+		 * Only move pages to writeback if this bdi is over its
+		 * threshold otherwise wait until the disk writes catch
+		 * up.
 		 */
-		if (bdi_nr_reclaimable) {
+		if (bdi_nr_reclaimable > bdi_thresh) {
 			writeback_inodes(&wbc);
 			pages_written += write_chunk - wbc.nr_to_write;
 			get_dirty_limits(&background_thresh, &dirty_thresh,


[Attachments: before.png (image/png, 2613 bytes), after.png (image/png, 2247 bytes)]
