Message-Id: <1250863494.7538.49.camel@twins>
Date: Fri, 21 Aug 2009 16:04:54 +0200
From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
To: Richard Kennedy <richard@....demon.co.uk>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"chris.mason" <chris.mason@...cle.com>,
lkml <linux-kernel@...r.kernel.org>,
Jens Axboe <jens.axboe@...cle.com>, miklos <miklos@...redi.hu>
Subject: Re: [RFC PATCH] mm: balance_dirty_pages. reduce calls to
global_page_state to reduce cache references
(removed linux-mm because it seems to be ill atm)
On Fri, 2009-08-21 at 12:59 +0100, Richard Kennedy wrote:
> Reducing the number of times balance_dirty_pages calls global_page_state
> reduces the cache references and so improves write performance on a
> variety of workloads.
>
> 'perf stat' of simple fio write tests shows the reduction in cache
> references.
> The test is fio 'write,mmap,600Mb,pre_read' on an AMD Athlon X2 with
> 3 GB of memory (dirty_threshold approx. 600 MB),
> running each test 10 times and taking the average & standard deviation.
>
> cache references, average (s.d.), in millions (10^6)
> 2.6.31-rc6   661 (9.88)
> +patch       604 (4.19)
Nice.
> This reduction is achieved by dropping clip_bdi_dirty_limit, as it
> rereads the counters to apply the dirty_threshold, and moving this check
> up into balance_dirty_pages, where the counters have already been read.
OK, so what you did is first check the total dirty limit, and only if
that is ok, check the per-BDI limit. Now why didn't I think of that ;-)
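
A minimal sketch of that ordering, written as a standalone C function
rather than kernel code. The struct and function names are hypothetical;
only the counter and threshold names are taken from the patch, and the
assert on the two thresholds is an assumption added for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters read once per loop iteration (hypothetical struct). */
struct dirty_snapshot {
	unsigned long nr_reclaimable;     /* NR_FILE_DIRTY + NR_UNSTABLE_NFS */
	unsigned long nr_writeback;       /* NR_WRITEBACK + NR_WRITEBACK_TEMP */
	unsigned long bdi_nr_reclaimable; /* per-BDI reclaimable pages */
	unsigned long bdi_nr_writeback;   /* per-BDI writeback pages */
};

/*
 * Returns true when the throttling loop may exit ("break" in the patch).
 * The global limit is tested first, using the local copies of the
 * counters; the per-BDI limit is only consulted when the global test
 * passes.
 */
static bool may_stop_throttling(const struct dirty_snapshot *s,
				unsigned long background_thresh,
				unsigned long dirty_thresh,
				unsigned long bdi_thresh,
				unsigned long pages_written,
				unsigned long write_chunk)
{
	unsigned long global = s->nr_reclaimable + s->nr_writeback;
	unsigned long bdi = s->bdi_nr_reclaimable + s->bdi_nr_writeback;

	/* Assumption for this sketch: background limit is the lower one. */
	assert(background_thresh <= dirty_thresh);

	/* Always throttle if over the global threshold. */
	if (global >= dirty_thresh)
		return false;

	/* Under the global limit and this BDI is within its share. */
	if (bdi <= bdi_thresh)
		return true;

	/* BDI is over, but background writeback can still catch up. */
	if (global < (background_thresh + dirty_thresh) / 2)
		return true;

	/* Otherwise stop only once enough pages have been written. */
	return pages_written >= write_chunk;
}
```

The point of the ordering is that the global comparison reuses values
that were already read, so the per-BDI limit no longer needs its own
pass over the global counters.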
> Also, rearranging the for loop to contain only one copy of the limit
> tests allows the pdflush test after the loop to use the local copies of
> the counters rather than rereading them.
>
> In the common case with no throttling it now calls global_page_state 5
> fewer times and bdi_stat 2 fewer.
>
> I have tried to retain the existing behavior as much as possible, but
> have added NR_WRITEBACK_TEMP to nr_writeback. This counter was used in
> clip_bdi_dirty_limit but not in balance_dirty_pages. grep suggests it
> is only used by FUSE, but I haven't done any testing on that. It does
> seem logical to count all the WRITEBACK pages when making the throttling
> decisions, so this change should be more correct ;)
Right, the NR_WRITEBACK_TEMP thing is a FUSE feature; it's used in
writable mmap() support for FUSE.
I must admit to having forgotten its exact semantics; maybe
Miklos can remind us.
> Signed-off-by: Richard Kennedy <richard@....demon.co.uk>
Looks good here
Acked-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> ----
> page-writeback.c | 116 ++++++++++++++++++++-----------------------------------
> 1 file changed, 43 insertions(+), 73 deletions(-)
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 81627eb..6f18e40 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -512,45 +485,12 @@ static void balance_dirty_pages(struct address_space *mapping)
> };
>
> get_dirty_limits(&background_thresh, &dirty_thresh,
> + &bdi_thresh, bdi);
>
> nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> + global_page_state(NR_UNSTABLE_NFS);
> + nr_writeback = global_page_state(NR_WRITEBACK) +
> + global_page_state(NR_WRITEBACK_TEMP);
>
> /*
> * In order to avoid the stacked BDI deadlock we need
> @@ -570,16 +510,48 @@ static void balance_dirty_pages(struct address_space *mapping)
> bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
> }
>
> + /* always throttle if over threshold */
> + if (nr_reclaimable + nr_writeback < dirty_thresh) {
> +
> + if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> + break;
> +
> + /*
> + * Throttle it only when the background writeback cannot
> + * catch-up. This avoids (excessively) small writeouts
> + * when the bdi limits are ramping up.
> + */
> + if (nr_reclaimable + nr_writeback <
> + (background_thresh + dirty_thresh) / 2)
> + break;
> +
> + /* done enough? */
> + if (pages_written >= write_chunk)
> + break;
> + }
> + if (!bdi->dirty_exceeded)
> + bdi->dirty_exceeded = 1;
>
> + /* Note: nr_reclaimable denotes nr_dirty + nr_unstable.
> + * Unstable writes are a feature of certain networked
> + * filesystems (i.e. NFS) in which data may have been
> + * written to the server's write cache, but has not yet
> + * been flushed to permanent storage.
> + * Only move pages to writeback if this bdi is over its
> + * threshold otherwise wait until the disk writes catch
> + * up.
> + */
> + if (bdi_nr_reclaimable > bdi_thresh) {
> + writeback_inodes(&wbc);
> + pages_written += write_chunk - wbc.nr_to_write;
> + if (wbc.nr_to_write == 0)
> + continue;
> + }
> congestion_wait(BLK_RW_ASYNC, HZ/10);
> }
>
> if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
> + bdi->dirty_exceeded)
> bdi->dirty_exceeded = 0;
>
> if (writeback_in_progress(bdi))
> @@ -593,10 +565,8 @@ static void balance_dirty_pages(struct address_space *mapping)
> * In normal mode, we start background writeout at the lower
> * background_thresh, to keep the amount of dirty memory low.
> */
> + if ((laptop_mode && pages_written) || (!laptop_mode &&
> + (nr_reclaimable > background_thresh)))
> pdflush_operation(background_writeout, 0);
> }
>
>
>
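
For reference, the pdflush test in the last hunk reduces to the small
predicate below. This is only a sketch: the function name is
hypothetical and laptop_mode is passed in as a parameter here rather
than read from the kernel's global setting.

```c
#include <stdbool.h>

/*
 * Decide whether to kick background writeout after the throttle loop.
 * In laptop mode, writeout starts only if this call wrote pages itself;
 * in normal mode, it starts once reclaimable pages exceed the lower
 * background_thresh. nr_reclaimable is the local copy taken in the loop.
 */
static bool should_start_background_writeout(bool laptop_mode,
					     unsigned long pages_written,
					     unsigned long nr_reclaimable,
					     unsigned long background_thresh)
{
	if (laptop_mode)
		return pages_written != 0;

	return nr_reclaimable > background_thresh;
}
```

Because nr_reclaimable is taken from the loop's local copy, this test
does not need to call global_page_state again.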