Message-ID: <20091010213339.GA8644@localhost>
Date: Sun, 11 Oct 2009 05:33:39 +0800
From: Wu Fengguang <fengguang.wu@...el.com>
To: Jan Kara <jack@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Theodore Tso <tytso@....edu>,
Christoph Hellwig <hch@...radead.org>,
Dave Chinner <david@...morbit.com>,
Chris Mason <chris.mason@...cle.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
"Li, Shaohua" <shaohua.li@...el.com>,
Myklebust Trond <Trond.Myklebust@...app.com>,
"jens.axboe@...cle.com" <jens.axboe@...cle.com>,
Nick Piggin <npiggin@...e.de>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
Richard Kennedy <richard@....demon.co.uk>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 01/45] writeback: reduce calls to global_page_state in
balance_dirty_pages()
On Fri, Oct 09, 2009 at 11:12:31PM +0800, Jan Kara wrote:
> Hi,
>
> On Wed 07-10-09 15:38:19, Wu Fengguang wrote:
> > From: Richard Kennedy <richard@....demon.co.uk>
> >
> > Reducing the number of times balance_dirty_pages calls global_page_state
> > reduces the cache references and so improves write performance on a
> > variety of workloads.
> >
> > 'perf stats' of simple fio write tests shows the reduction in cache
> > access.
> > Where the test is fio 'write,mmap,600Mb,pre_read' on AMD AthlonX2 with
> > 3Gb memory (dirty_threshold approx 600 Mb),
> > running each test 10 times, dropping the fastest & slowest values, then
> > taking the average & standard deviation
> >
> > average (s.d.) in millions (10^6)
> > 2.6.31-rc8 648.6 (14.6)
> > +patch 620.1 (16.5)
> >
> > This reduction is achieved by dropping clip_bdi_dirty_limit, which
> > rereads the counters to apply the dirty_threshold, and moving this check
> > up into balance_dirty_pages, where the counters have already been read.
> >
> > Also, rearranging the for loop to contain only one copy of the limit
> > tests allows the pdflush test after the loop to use the local copies of
> > the counters rather than rereading them.
> >
> > In the common case with no throttling it now calls global_page_state 5
> > fewer times and bdi_stat 2 fewer.
> Hmm, but the patch changes the behavior of balance_dirty_pages() in
> several ways:
Yes, unfortunately the changelog failed to make that clear ..
> > -/*
> > - * Clip the earned share of dirty pages to that which is actually available.
> > - * This avoids exceeding the total dirty_limit when the floating averages
> > - * fluctuate too quickly.
> > - */
> > -static void clip_bdi_dirty_limit(struct backing_dev_info *bdi,
> > - unsigned long dirty, unsigned long *pbdi_dirty)
> > -{
> > - unsigned long avail_dirty;
> > -
> > - avail_dirty = global_page_state(NR_FILE_DIRTY) +
> > - global_page_state(NR_WRITEBACK) +
> > - global_page_state(NR_UNSTABLE_NFS) +
> > - global_page_state(NR_WRITEBACK_TEMP);
> > -
> > - if (avail_dirty < dirty)
> > - avail_dirty = dirty - avail_dirty;
> > - else
> > - avail_dirty = 0;
> > -
> > - avail_dirty += bdi_stat(bdi, BDI_RECLAIMABLE) +
> > - bdi_stat(bdi, BDI_WRITEBACK);
> > -
> > - *pbdi_dirty = min(*pbdi_dirty, avail_dirty);
> > -}
> > -
> > static inline void task_dirties_fraction(struct task_struct *tsk,
> > long *numerator, long *denominator)
> > {
> > @@ -468,7 +442,6 @@ get_dirty_limits(unsigned long *pbackgro
> > bdi_dirty = dirty * bdi->max_ratio / 100;
> >
> > *pbdi_dirty = bdi_dirty;
> > - clip_bdi_dirty_limit(bdi, dirty, pbdi_dirty);
> I don't see, what test in balance_dirty_limits() should replace this
> clipping... OTOH clipping does not seem to have too much effect on the
> behavior of balance_dirty_pages - the limit we clip to (at least
> BDI_WRITEBACK + BDI_RECLAIMABLE) is large enough so that we break from the
> loop immediately. So just getting rid of the function is fine but
> I'd update the changelog accordingly.
>
It essentially replaces clip_bdi_dirty_limit() with the explicit check
(nr_reclaimable + nr_writeback >= dirty_thresh) to avoid exceeding the
dirty limit. Since the bdi dirty limit is mostly accurate, we don't need
to clip routinely. A simple dirty limit check is enough.
I added the above text to the changelog :)
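A minimal userspace sketch of the combined check described above, for illustration only. The struct dirty_snapshot and the helper over_dirty_limits() are hypothetical names, not kernel code; the fields simply stand in for the local counters that balance_dirty_pages() has already read.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters read once per loop iteration. */
struct dirty_snapshot {
	unsigned long nr_reclaimable;      /* global reclaimable dirty pages */
	unsigned long nr_writeback;        /* global pages under writeback */
	unsigned long bdi_nr_reclaimable;  /* same, for this bdi only */
	unsigned long bdi_nr_writeback;
};

/*
 * Returns true when either the per-bdi threshold or the global
 * threshold has been reached, mirroring the dirty_exceeded expression
 * in the quoted hunk. No counter is reread here.
 */
static bool over_dirty_limits(const struct dirty_snapshot *s,
			      unsigned long bdi_thresh,
			      unsigned long dirty_thresh)
{
	bool over_bdi = s->bdi_nr_reclaimable + s->bdi_nr_writeback >= bdi_thresh;
	bool over_global = s->nr_reclaimable + s->nr_writeback >= dirty_thresh;

	return over_bdi || over_global;
}
```

With this shape, a writer on a bdi with no dirty data of its own is still reported as over the limit once the global sum reaches dirty_thresh, which is the behavior change Jan points out below.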
> > + dirty_exceeded =
> > + (bdi_nr_reclaimable + bdi_nr_writeback >= bdi_thresh)
> > + || (nr_reclaimable + nr_writeback >= dirty_thresh);
> >
> > - if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> > + if (!dirty_exceeded)
> > break;
> Ugh, but this is not equivalent! We would block the writer on some BDI
> without any dirty data if we are over global dirty limit. That didn't
> happen before.
This restores the (right) behavior of 2.6.18. And Peter had the same goal
in mind with clip_bdi_dirty_limit() ;)
> > + /* don't wait if we've done enough */
> > + if (pages_written >= write_chunk)
> > + break;
> > }
> > -
> > - if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> > - break;
> > - if (pages_written >= write_chunk)
> > - break; /* We've done our duty */
> > -
> Here, we had an opportunity to break from the loop even if we didn't
> manage to write everything (for example because per-bdi thread managed to
> write enough or because enough IO has completed while we were trying to
> write). After the patch, we will sleep. IMHO that's not good...
Note that the pages_written check is moved several lines up in the patch :)
> I'd think that if we did all that work in writeback_inodes_wbc we could
> spend the effort on regetting and rechecking the stats...
Yes, maybe. I didn't bother with it because the later throttle queue patch
removes the loop entirely, and with it the need to reget the stats :)
> > schedule_timeout_interruptible(pause);
> >
> > /*
> > @@ -577,8 +547,7 @@ static void balance_dirty_pages(struct a
> > pause = HZ / 10;
> > }
> >
> > - if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
> > - bdi->dirty_exceeded)
> > + if (!dirty_exceeded && bdi->dirty_exceeded)
> > bdi->dirty_exceeded = 0;
> Here we fail to clear dirty_exceeded if we are over global dirty limit
> but not over per-bdi dirty limit...
You must be mistaken: dirty_exceeded = (over bdi limit || over global limit),
so !dirty_exceeded = (!over bdi limit && !over global limit).
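To make the boolean identity concrete, here is a small sketch. The function should_clear_flag() and its parameters are hypothetical and only model the condition in the quoted hunk; flag_set stands in for bdi->dirty_exceeded.

```c
#include <stdbool.h>

/*
 * Models "if (!dirty_exceeded && bdi->dirty_exceeded)".
 * By De Morgan, !(over_bdi || over_global) is the same as
 * (!over_bdi && !over_global), so the flag is cleared only when
 * neither limit is exceeded.
 */
static bool should_clear_flag(bool over_bdi, bool over_global, bool flag_set)
{
	bool dirty_exceeded = over_bdi || over_global;

	return !dirty_exceeded && flag_set;
}
```

Under this model the flag stays set while the global limit is exceeded, even when the per-bdi limit is not.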
> > @@ -593,9 +562,7 @@ static void balance_dirty_pages(struct a
> > * background_thresh, to keep the amount of dirty memory low.
> > */
> > if ((laptop_mode && pages_written) ||
> > - (!laptop_mode && ((global_page_state(NR_FILE_DIRTY)
> > - + global_page_state(NR_UNSTABLE_NFS))
> > - > background_thresh)))
> > + (!laptop_mode && (nr_reclaimable > background_thresh)))
> > bdi_start_writeback(bdi, NULL, 0);
> > }
> This might be based on rather old values in case we break from the loop
> after calling writeback_inodes_wbc.
Yes, that's possible. It's safe because the bdi worker will double check
background_thresh. We can call bdi_start_writeback() as long as there is a
good chance it is needed: nr_reclaimable is not likely to drop suddenly
during our writeout.
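As a rough sketch of the decision in that last hunk, using the cached counter instead of rereading global state. The helper should_kick_background() is a hypothetical name; it only models the condition and does not start any writeback.

```c
#include <stdbool.h>

/*
 * Models the condition guarding bdi_start_writeback() in the patch.
 * nr_reclaimable is the possibly stale local copy taken inside the
 * loop; the bdi worker is expected to recheck background_thresh itself.
 */
static bool should_kick_background(bool laptop_mode,
				   unsigned long pages_written,
				   unsigned long nr_reclaimable,
				   unsigned long background_thresh)
{
	return (laptop_mode && pages_written) ||
	       (!laptop_mode && nr_reclaimable > background_thresh);
}
```

A stale nr_reclaimable can at worst cause a spurious wakeup here, since the actual writeback decision is rechecked by the worker.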
Thanks,
Fengguang