Message-Id: <1197367861.6985.14.camel@twins>
Date: Tue, 11 Dec 2007 11:11:01 +0100
From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
To: zhejiang <zhe.jiang@...el.com>
Cc: linux-kernel@...r.kernel.org
Subject: Re: Reducing the bdi proporion calculation period to speed up disk write

On Tue, 2007-12-11 at 14:25 +0800, zhejiang wrote:
> The patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f implemented bdi per
> device dirty threshold. It works well.
> However, the period for proportion calculation may be too large.
> For 8G memory, the calc_period_shift() will return 19 as the shift.
>
> When we switch writing operation between different disks, there may be
> potential performance issue.
>
> For example, we first write to disk A, then write to disk B.
> The proportion for disk B will increase slowly because the denominator
> is too large (It's 2^18 + (global_count & counter_mask)).
> The disk B will get small dirty page quota for a long time,
> it will get blocked frequently though the total dirty page is under the
> dirty page limit.
>
> Peter provided a patch to avoid this issue, this patch allow violation
> of bdi limits if there is a lot of room on the system.
> It looks like:
>
> +if (nr_reclaimable + nr_writeback < (background_thresh + dirty_thresh) / 2)
> + break;
>
> This patch really help to avoid congestion, but if the dirty pages
> exceed about 3/4 of the dirty_thresh, congestion still happens if we
> write to another disk.
>
> I think that we can reduce the period to speed up the proportion
> adjustment.
>
> diff -Nur a/page-writeback.c b/page-writeback.c
> --- a/page-writeback.c 2007-12-11 13:46:30.000000000 +0800
> +++ b/page-writeback.c 2007-12-11 13:47:11.000000000 +0800
> @@ -128,10 +128,7 @@
> */
> static int calc_period_shift(void)
> {
> - unsigned long dirty_total;
> -
> - dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) / 100;
> - return 2 + ilog2(dirty_total - 1);
> + return 12;
> }

It's a heuristic; it might need some tuning, but a static value is wrong.
I think it's generally true that the larger the machine's memory, the
faster the storage subsystem, and the more likely it is to have more disks.

One of the reasons this value isn't static is that with your fixed 12 it
becomes very hard to balance over more than 4096 active devices. Of
course, it takes quite a special set-up to get into that situation.
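
A rough sketch of where the 4096 comes from, assuming a period is simply
2^shift events and a device needs at least one event per period to hold a
non-zero share. The helper names are hypothetical and not kernel functions:

```c
/*
 * Hypothetical helper, not a kernel function: number of events that
 * make up one proportion period for a given shift, i.e. 2^shift.
 */
static unsigned long period_events(int shift)
{
	return 1UL << shift;
}

/*
 * Smallest non-zero share a device can hold within one period, in
 * parts per million (one event out of period_events(shift)), rounded
 * down by integer division.
 */
static unsigned long min_share_ppm(int shift)
{
	return 1000000UL / period_events(shift);
}
```

Under that assumption a fixed shift of 12 gives 4096 events per period, so
beyond 4096 active devices some of them cannot register even a single
event, while the shift of 19 quoted for the 8G case leaves far more room.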

As it is, it now takes about 2 * dirty limit to switch over; you could
start by making that just a single, or maybe even half a, dirty limit.
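
As a sketch of what such tuning could look like, the calculation from the
quoted calc_period_shift() can be written with the constant term as a
parameter. This is only one possible reading of the suggestion: the `extra`
parameter, the function names, and passing the inputs as arguments are
assumptions for illustration, and ilog2_ul() merely stands in for the
kernel's ilog2():

```c
/* Floor of log2(n) for n > 0; a stand-in for the kernel's ilog2(). */
static int ilog2_ul(unsigned long n)
{
	int log = -1;

	while (n) {
		n >>= 1;
		log++;
	}
	return log;
}

/*
 * Same arithmetic as the quoted calc_period_shift(), but with the
 * inputs passed in and the constant "2" replaced by `extra`
 * (hypothetical parameter). Assumes the dirty limit is at least
 * 2 pages so that dirty_total - 1 is non-zero.
 */
static int period_shift_sketch(unsigned long dirty_ratio,
			       unsigned long dirtyable_pages,
			       int extra)
{
	unsigned long dirty_total = (dirty_ratio * dirtyable_pages) / 100;

	return extra + ilog2_ul(dirty_total - 1);
}
```

Assuming 4 KiB pages, a dirty ratio of 10 and all 8G counted as dirtyable
(2097152 pages), extra = 2 reproduces the shift of 19 mentioned above;
extra = 1 or extra = 0 would halve or quarter the period while still
letting it scale with memory size.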

Also, I'm not quite convinced your benchmark is all that useful. Do you
really think it matches an actual frequently occurring usage pattern?