linux-kernel - Re: [PATCH 01/45] writeback: reduce calls to global_page_state in balance_dirty

Date:	Sun, 11 Oct 2009 18:50:12 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Jan Kara <jack@...e.cz>, Andrew Morton <akpm@...ux-foundation.org>,
	Theodore Tso <tytso@....edu>,
	Christoph Hellwig <hch@...radead.org>,
	Dave Chinner <david@...morbit.com>,
	Chris Mason <chris.mason@...cle.com>,
	"Li, Shaohua" <shaohua.li@...el.com>,
	Myklebust Trond <Trond.Myklebust@...app.com>,
	"jens.axboe@...cle.com" <jens.axboe@...cle.com>,
	Nick Piggin <npiggin@...e.de>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	Richard Kennedy <richard@....demon.co.uk>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 01/45] writeback: reduce calls to global_page_state in
	balance_dirty_pages()

On Sun, Oct 11, 2009 at 03:44:40PM +0800, Peter Zijlstra wrote:
> On Sun, 2009-10-11 at 10:28 +0800, Wu Fengguang wrote:
> > 
> > Note that the total limit check itself may not be sufficient. For
> > example, there is no nr_writeback limit for NFS (and maybe btrfs)
> > after removing the congestion waits. Therefore it is very possible to
> > end up with
> > 
> >         nr_writeback => dirty_thresh
> >         nr_dirty     => 0
> > 
> > which is obviously undesirable: everything newly dirtied is soon put
> > under writeback. This violates the 30s expire time and the background
> > threshold rules, and will hurt write-and-truncate workloads (e.g. temp
> > files).
> > 
> > So the better solution would be to impose an nr_writeback limit on
> > every filesystem that doesn't already have one (block-based
> > filesystems get one from the block io queue). NFS used to have that
> > limit via congestion_wait, but now we need to implement a wait queue
> > for it.
> > 
> > With the nr_writeback wait queue, it can be guaranteed that once
> > balance_dirty_pages() asks for 1500 pages to be written, they will be
> > written, with any necessary sleeping done in the bdi flush thread. So
> > we can safely remove the loop and the double checking of the global
> > dirty limit in balance_dirty_pages().
> 
> nr_reclaim = nr_dirty + nr_writeback + nr_unstable, so anything calling
> into balance_dirty_pages() would still block on seeing such large
> amounts of nr_writeback.

Our terms are a bit different. In my previous mail,
        nr_reclaim = nr_dirty + nr_unstable
and nr_writeback is added separately when comparing against dirty_thresh,
just as the code in balance_dirty_pages() does.
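The two conventions differ only in where nr_writeback is summed; the check against dirty_thresh is the same. A minimal sketch in C, assuming a hypothetical `struct dirty_counts` snapshot (not a kernel structure) and a strict `>` comparison:

```c
#include <stdbool.h>

/*
 * Hypothetical snapshot of the global page counters discussed in this
 * thread. Field names mirror the thread's terms; this is not the
 * kernel's actual data structure.
 */
struct dirty_counts {
	unsigned long nr_dirty;
	unsigned long nr_writeback;
	unsigned long nr_unstable;
};

/* Peter's convention: nr_reclaim already includes nr_writeback. */
static bool over_limit_incl_writeback(const struct dirty_counts *c,
				      unsigned long dirty_thresh)
{
	unsigned long nr_reclaim = c->nr_dirty + c->nr_writeback +
				   c->nr_unstable;

	return nr_reclaim > dirty_thresh;
}

/*
 * Fengguang's convention: nr_reclaim excludes nr_writeback, which is
 * added separately at comparison time.
 */
static bool over_limit_sep_writeback(const struct dirty_counts *c,
				     unsigned long dirty_thresh)
{
	unsigned long nr_reclaim = c->nr_dirty + c->nr_unstable;

	return nr_reclaim + c->nr_writeback > dirty_thresh;
}
```

Both functions compare the same total, so they always agree; the naming difference only matters when reading each other's mails.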

But that's fine. You are right that the application will be blocked
and the dirty limit guaranteed if we do
        while (over dirty limit) {
                bdi_writeback_wait(pages to write);
        }

But it has a problem: as long as the bdi-flush thread for NFS doesn't
limit nr_writeback, NFS's nr_writeback will grow to nearly
(dirty_thresh - nr_unstable) and its nr_dirty will approach 0.
That's not desirable.

So I did this:
-       while (over dirty limit) {
+       if (over dirty limit) {
                bdi_writeback_wait(pages to write);
        }
_after_ adding the NFS nr_writeback wait queue ([PATCH 20/45] NFS:
introduce writeback wait queue). With that it's safe to remove the
loop.
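A per-filesystem nr_writeback limit can be sketched as a counter guarded by a condition variable: the flusher sleeps while the count is at the limit, and each completion wakes it. This is a user-space analogue under stated assumptions; `struct wb_queue`, `wb_start()`, `wb_end()`, `wb_count()` and `limit` are hypothetical names, not the code from [PATCH 20/45].

```c
#include <pthread.h>

/* Hypothetical user-space analogue of an nr_writeback wait queue. */
struct wb_queue {
	pthread_mutex_t lock;
	pthread_cond_t below_limit;	/* signalled when nr_writeback drops */
	unsigned long nr_writeback;	/* pages currently under writeback */
	unsigned long limit;		/* max pages allowed under writeback */
};

static void wb_queue_init(struct wb_queue *q, unsigned long limit)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->below_limit, NULL);
	q->nr_writeback = 0;
	q->limit = limit;
}

/* Flusher side: sleep until there is room, then account one page. */
static void wb_start(struct wb_queue *q)
{
	pthread_mutex_lock(&q->lock);
	while (q->nr_writeback >= q->limit)	/* re-check after wakeup */
		pthread_cond_wait(&q->below_limit, &q->lock);
	q->nr_writeback++;
	pthread_mutex_unlock(&q->lock);
}

/* Completion side: one page finished writeback; wake one waiter. */
static void wb_end(struct wb_queue *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->nr_writeback > 0)
		q->nr_writeback--;
	pthread_cond_signal(&q->below_limit);
	pthread_mutex_unlock(&q->lock);
}

/* Read the current count under the lock. */
static unsigned long wb_count(struct wb_queue *q)
{
	unsigned long n;

	pthread_mutex_lock(&q->lock);
	n = q->nr_writeback;
	pthread_mutex_unlock(&q->lock);
	return n;
}
```

The `while` around `pthread_cond_wait()` only guards against spurious wakeups; it is unrelated to the throttling loop in balance_dirty_pages(). Because the flusher itself sleeps here, nr_writeback cannot exceed `limit`.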

> Having the constraint nr_dirty + nr_writeback + nr_unstable <
> dirty_thresh should ensure we never have nr_writeback > dirty_thresh,
> simply because you cannot dirty more, which then cannot be converted to
> more writeback.
> 
> Or am I missing something?

You are right, under the assumption that the loop is still there.

Sorry for the confusion. What I mean is that filesystems have to limit
nr_writeback (directly, or indirectly via the block io queue);
otherwise either nr_dirty drops to 0 (with the loop), or
nr_writeback grows out of control (without the loop).
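The two failure modes can be reproduced with a toy tick-based model: each tick the writer dirties `dirty_rate` pages (held at dirty_thresh when throttled), the flusher moves dirty pages to writeback (optionally capped at `wb_limit`), and the server completes `complete_rate` pages. Everything here, including the name `run_sim()`, the rates, and using an unthrottled writer as a crude stand-in for the loop-free case, is an illustrative assumption rather than kernel behavior.

```c
#include <stdbool.h>

/* Hypothetical model state; nr_unstable is held constant. */
struct sim {
	unsigned long nr_dirty;
	unsigned long nr_writeback;
	unsigned long nr_unstable;
};

/*
 * throttle: true holds the writer at dirty_thresh (the loop);
 *           false lets it dirty freely (crude loop-free stand-in).
 * wb_limit: cap on nr_writeback for the flusher; 0 means no cap.
 */
static struct sim run_sim(unsigned long ticks, unsigned long dirty_thresh,
			  unsigned long nr_unstable, unsigned long dirty_rate,
			  unsigned long complete_rate, bool throttle,
			  unsigned long wb_limit)
{
	struct sim s = { 0, 0, nr_unstable };

	for (unsigned long t = 0; t < ticks; t++) {
		/* 1. Writer dirties pages, clipped to the room left. */
		unsigned long add = dirty_rate;
		if (throttle) {
			unsigned long used = s.nr_dirty + s.nr_writeback +
					     s.nr_unstable;
			unsigned long room = used < dirty_thresh ?
					     dirty_thresh - used : 0;
			if (add > room)
				add = room;
		}
		s.nr_dirty += add;

		/* 2. Flusher moves dirty pages to writeback. */
		unsigned long move = s.nr_dirty;
		if (wb_limit) {
			unsigned long room = s.nr_writeback < wb_limit ?
					     wb_limit - s.nr_writeback : 0;
			if (move > room)
				move = room;
		}
		s.nr_dirty -= move;
		s.nr_writeback += move;

		/* 3. Server completes up to complete_rate pages. */
		s.nr_writeback -= s.nr_writeback < complete_rate ?
				  s.nr_writeback : complete_rate;
	}
	return s;
}
```

For example, with dirty_thresh = 1000, nr_unstable = 100, dirty_rate = 50 and complete_rate = 20 over 100 ticks: the throttled, uncapped run ends with nr_dirty = 0 and nr_writeback = 880; the unthrottled, uncapped run ends with nr_writeback = 3000, far past dirty_thresh; and the throttled run with wb_limit = 200 ends with nr_writeback = 180 while 700 pages stay dirty.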

Thanks,
Fengguang