linux-kernel - Re: RFC - how to balance Dirty+Writeback in the face of slow writeback.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Fri, 25 Aug 2006 15:24:47 +1000
From:	Neil Brown <neilb@...e.de>
To:	David Chinner <dgc@....com>
Cc:	Andi Kleen <ak@...e.de>, Jens Axboe <axboe@...e.de>,
	linux-kernel@...r.kernel.org, akpm@...l.org
Subject: Re: RFC - how to balance Dirty+Writeback in the face of slow  writeback.

On Tuesday August 22, dgc@....com wrote:
> On Mon, Aug 21, 2006 at 05:24:14PM +1000, Neil Brown wrote:
> > So my feeling (at the moment) is that balance_dirty_pages should look
> > like:
> > 
> >  if below threshold 
> >        return
> >  writeback_inodes({.bdi = mapping->backing_dev_info)} )
> > 
> >  while (above threshold + 10%)
> >         writeback_inodes(.bdi = NULL)
> >         blk_congestion_wait
> > 
> > and all bdis should impose a queue limit.
> 
> I don't really like the "+ 10%" in there - it's too rubbery given
> the range of memory sizes Linux supports (think of an Altix with
> several TBs of RAM in it ;). With bdis imposing a queue limit, the
> number of writeback pages should be bound  and so we shouldn't need
> headroom like this.

I had that there precisely because some BDIs are not bounded - nfs in
particular (which is what started this whole thread).
I think I'm now convinced that nfs really need to limit its writeout
queue. 

> 
> Hmmm - the above could put the writer to sleep on the request queue
> of the slow device that holds all dirty+writeback. This could
> effectively slow all writers down to the rate of the slowest device
> in the system as they all attempt to do blocking writeback on the
> only dirty bdi (the really slow one).


> 
> AFAICT, all we need to do is prevent interactions between bdis and
> the current problem is that we loop on clean bdis waiting for slow
> dirty ones to drain.
> 
> My thoughts are along the lines of a decay in nr_to_write between
> loop iterations when we don't write out enough pages (i.e. clean
> bdi) so we break out of the loop sooner rather than later.

I don't understand the purpose of the decay.  Once you are sure the
bdi is clean, why not break out of the loop straight away?

Also, your code is a little confusing.  The 
  pages_written += write_chunk - wbc.nr_to_write
in the loop assumes that wbc.nr_to_write equalled write_chunk just
before the call to writeback_inodes, however as you have moved the
initialisation of wbc out of the loop, this is no longer true.

So I would like us to break out of the loop as soon as there is good
reason to believe the bdi is clean.

So maybe something like this..
Note that we *must* have bounded queue on all bdis or this patch can
cause substantial badness.

NeilBrown

Signed-off-by: Neil Brown <neilb@...e.de>

### Diffstat output
 ./mm/page-writeback.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff .prev/mm/page-writeback.c ./mm/page-writeback.c
--- .prev/mm/page-writeback.c	2006-08-25 15:18:37.000000000 +1000
+++ ./mm/page-writeback.c	2006-08-25 15:22:39.000000000 +1000
@@ -187,7 +187,7 @@ static void balance_dirty_pages(struct a
 			.bdi		= bdi,
 			.sync_mode	= WB_SYNC_NONE,
 			.older_than_this = NULL,
-			.nr_to_write	= write_chunk,
+			.nr_to_write	= write_chunk - pages_written,
 			.range_cyclic	= 1,
 		};
 
@@ -217,10 +217,13 @@ static void balance_dirty_pages(struct a
 				global_page_state(NR_WRITEBACK)
 					<= dirty_thresh)
 						break;
-			pages_written += write_chunk - wbc.nr_to_write;
+			if (pages_written == write_chunk - wbc.nr_to_write)
+				break;		/* couldn't write - must be clean */
+			pages_written = write_chunk - wbc.nr_to_write;
 			if (pages_written >= write_chunk)
 				break;		/* We've done our duty */
-		}
+		} else
+			break;
 		blk_congestion_wait(WRITE, HZ/10);
 	}
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/