linux-kernel - Re: [PATCH 01/45] writeback: reduce calls to global_page_state in balance_dirty

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20091011022813.GA21315@localhost>
Date:	Sun, 11 Oct 2009 10:28:13 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Theodore Tso <tytso@....edu>,
	Christoph Hellwig <hch@...radead.org>,
	Dave Chinner <david@...morbit.com>,
	Chris Mason <chris.mason@...cle.com>,
	"Li, Shaohua" <shaohua.li@...el.com>,
	Myklebust Trond <Trond.Myklebust@...app.com>,
	"jens.axboe@...cle.com" <jens.axboe@...cle.com>,
	Nick Piggin <npiggin@...e.de>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	Richard Kennedy <richard@....demon.co.uk>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 01/45] writeback: reduce calls to global_page_state in
	balance_dirty_pages()

On Fri, Oct 09, 2009 at 11:47:59PM +0800, Jan Kara wrote:
> On Fri 09-10-09 17:18:32, Peter Zijlstra wrote:
> > On Fri, 2009-10-09 at 17:12 +0200, Jan Kara wrote:
> > >   Ugh, but this is not equivalent! We would block the writer on some BDI
> > > without any dirty data if we are over global dirty limit. That didn't
> > > happen before.
> > 
> > It should have, we should throttle everything calling
> > balance_dirty_pages() when we're over the total limit.
>   OK :) I agree it's reasonable. But Wu, please note this in the
> changelog because it might be a substantial change for some loads.

Thanks, I added the note by Peter :)

Note that the total limit check itself may not be sufficient. For
example, there are no nr_writeback limit for NFS (and maybe btrfs)
after removing the congestion waits.  Therefore it is very possible

        nr_writeback => dirty_thresh
        nr_dirty     => 0

which is obviously undesirable: everything newly dirtied are soon put
to writeback. It violates the 30s expire time and the background
threshold rules, and will hurt write-and-truncate operations (ie. temp
files).

So the better solution would be to impose a nr_writeback limit for
every filesystem that didn't already have one (the block io queue).
NFS used to have that limit with congestion_wait, but now we need
to do a wait queue for it.

With the nr_writeback wait queue, it can be guaranteed that once
balance_dirty_pages() asks for writing 1500 pages, it will be done
with necessary sleeping in the bdi flush thread. So we can safely
remove the loop and double checking of global dirty limit in
balance_dirty_pages().

However, there is still one problem - there are no general
coordinations between max nr_writeback and the background/dirty
limits.

It is possible (and very likely for some small memory systems) that

	nr_writeback > dirty_thresh - background_thresh
	10,000         20,000         15,000

In this case, it is possible that an application to be throttled because
of
	nr_reclaimable + nr_writeback > dirty_thresh
	12,000           10,000         20,000

starts a background writeback work to do job for it, however that work
quits immediately because

	nr_reclaimable < background_thresh
	12,000           15,000

In the end, the application did not get throttled at all at dirty_thresh.
Instead, it will be throttled at (background_thresh + max_nr_writeback).

One solution (aka. the old behavior) is to respect the dirty_thresh, by
not quiting background writeback when there are throttled tasks (this
patch). It has the drawback of background writeback not doing its job
_actively_. Instead, it will frequently be started and quit at times
when applications enter and leave balanced_dirty_pages().

In the above scheme, the background_thresh is disregarded. The other
ways would be to disregard dirty_thresh (may be undesirable) or to
limit max_nr_writeback (not as easy).

It is still very possible to hit nr_dirty all the way down to 0 if
max_nr_writeback > background_thresh.

This is a bit twisting. Any ideas?

Signed-off-by: Wu Fengguang <fengguang.wu@...el.com>
---
 fs/fs-writeback.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- linux.orig/fs/fs-writeback.c	2009-10-11 09:19:49.000000000 +0800
+++ linux/fs/fs-writeback.c	2009-10-11 09:21:50.000000000 +0800
@@ -781,7 +781,8 @@ static long wb_writeback(struct bdi_writ
 		 * For background writeout, stop when we are below the
 		 * background dirty threshold
 		 */
-		if (args->for_background && !over_bground_thresh())
+		if (args->for_background && !over_bground_thresh() &&
+		    !list_empty(&wb->bdi->throttle_list))
 			break;

 		wbc.more_io = 0;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/