linux-kernel - Re: [PATCH 01/45] writeback: reduce calls to global_page_state in balance_dirty

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 13 Oct 2009 10:41:17 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Jan Kara <jack@...e.cz>, Andrew Morton <akpm@...ux-foundation.org>,
	Theodore Tso <tytso@....edu>,
	Christoph Hellwig <hch@...radead.org>,
	Dave Chinner <david@...morbit.com>,
	Chris Mason <chris.mason@...cle.com>,
	"Li, Shaohua" <shaohua.li@...el.com>,
	Myklebust Trond <Trond.Myklebust@...app.com>,
	"jens.axboe@...cle.com" <jens.axboe@...cle.com>,
	Nick Piggin <npiggin@...e.de>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	Richard Kennedy <richard@....demon.co.uk>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 01/45] writeback: reduce calls to global_page_state in
 balance_dirty_pages()

On Tue, 2009-10-13 at 11:24 +0800, Wu Fengguang wrote:

> You are right too :) I followed you and Peter's advice to do the loop
> and the recheck of stats as follows:

> 	This patch slightly changes behavior by replacing clip_bdi_dirty_limit()
>         with the explicit check (nr_reclaimable + nr_writeback >= dirty_thresh)
>         to avoid exceeding the dirty limit. Since the bdi dirty limit is mostly
>         accurate we don't need to do routinely clip. A simple dirty limit check
>         would be enough.
> 
>         The check is necessary because, in principle we should throttle
>         everything calling balance_dirty_pages() when we're over the total
>         limit, as said by Peter.
> 
>         We now set and clear dirty_exceeded not only based on bdi dirty limits,
>         but also on the global dirty limits. This is a bit counterintuitive, but
>         the global limits are the ultimate goal and shall be always imposed.
> 
>         We may now start background writeback work based on outdated conditions.
>         That's safe because the bdi flush thread will (and have to) double check
>         the states. It reduces overall overheads because the test based on old
>         states still have good chance to be right.

> static void balance_dirty_pages(struct address_space *mapping,
> 				unsigned long write_chunk)
> {
> 	long nr_reclaimable, bdi_nr_reclaimable;
> 	long nr_writeback, bdi_nr_writeback;
> 	unsigned long background_thresh;
> 	unsigned long dirty_thresh;
> 	unsigned long bdi_thresh;
> 	int dirty_exceeded;
> 	struct backing_dev_info *bdi = mapping->backing_dev_info;
> 
> 	/*
> 	 * If sync() is in progress, curb the to-be-synced inodes regardless
> 	 * of dirty limits, so that a fast dirtier won't livelock the sync.
> 	 */
> 	if (unlikely(bdi->sync_time &&
> 		     S_ISREG(mapping->host->i_mode) &&
> 		     time_after_eq(bdi->sync_time,
> 				   mapping->host->dirtied_when))) {
> 		write_chunk *= 2;
> 		bdi_writeback_wait(bdi, write_chunk);
> 	}
> 
> 	for (;;) {
> 		nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> 				 global_page_state(NR_UNSTABLE_NFS);
> 		nr_writeback = global_page_state(NR_WRITEBACK) +
> 			       global_page_state(NR_WRITEBACK_TEMP);
> 
> 		global_dirty_thresh(&background_thresh, &dirty_thresh);
> 
> 		/*
> 		 * Throttle it only when the background writeback cannot
> 		 * catch-up. This avoids (excessively) small writeouts
> 		 * when the bdi limits are ramping up.
> 		 */
> 		if (nr_reclaimable + nr_writeback <
> 		    (background_thresh + dirty_thresh) / 2)
> 			break;
> 
> 		bdi_thresh = bdi_dirty_thresh(bdi, dirty_thresh);
> 
> 		/*
> 		 * In order to avoid the stacked BDI deadlock we need
> 		 * to ensure we accurately count the 'dirty' pages when
> 		 * the threshold is low.
> 		 *
> 		 * Otherwise it would be possible to get thresh+n pages
> 		 * reported dirty, even though there are thresh-m pages
> 		 * actually dirty; with m+n sitting in the percpu
> 		 * deltas.
> 		 */
> 		if (bdi_thresh < 2*bdi_stat_error(bdi)) {
> 			bdi_nr_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> 			bdi_nr_writeback = bdi_stat_sum(bdi, BDI_WRITEBACK);
> 		} else {
> 			bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> 			bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
> 		}
> 
> 		/*
> 		 * The bdi thresh is somehow "soft" limit derived from the
> 		 * global "hard" limit. The former helps to prevent heavy IO
> 		 * bdi or process from holding back light ones; The latter is
> 		 * the last resort safeguard.
> 		 */
> 		dirty_exceeded =
> 			(bdi_nr_reclaimable + bdi_nr_writeback >= bdi_thresh)
> 			|| (nr_reclaimable + nr_writeback >= dirty_thresh);
> 
> 		if (!dirty_exceeded)
> 			break;
> 
> 		bdi->dirty_exceed_time = jiffies;
> 
> 		bdi_writeback_wait(bdi, write_chunk);
> 	}
> 
> 	/*
> 	 * In laptop mode, we wait until hitting the higher threshold before
> 	 * starting background writeout, and then write out all the way down
> 	 * to the lower threshold.  So slow writers cause minimal disk activity.
> 	 *
> 	 * In normal mode, we start background writeout at the lower
> 	 * background_thresh, to keep the amount of dirty memory low.
> 	 */
> 	if (!laptop_mode && (nr_reclaimable > background_thresh) &&
> 	    can_submit_background_writeback(bdi))
> 		bdi_start_writeback(bdi, NULL, WB_FOR_BACKGROUND);
> }

Looks good, Thanks Wu!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/