Message-Id: <1197367861.6985.14.camel@twins>
Date:	Tue, 11 Dec 2007 11:11:01 +0100
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	zhejiang <zhe.jiang@...el.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Reducing the bdi proportion calculation period to speed up disk write


On Tue, 2007-12-11 at 14:25 +0800, zhejiang wrote:
> The patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f implemented bdi per
> device dirty threshold. It works well.
> However, the period for proportion calculation may be too large.
> For 8G memory, the calc_period_shift() will return 19 as the shift.
> 
> When we switch write operations between different disks, there may be
> a performance issue.
> 
> For example, we first write to disk A, then write to disk B.
> The proportion for disk B will increase slowly because the denominator
> is too large (it is 2^18 + (global_count & counter_mask)).
> Disk B will get a small dirty page quota for a long time, so
> it will get blocked frequently even though the total number of dirty
> pages is under the dirty page limit.
> 
> Peter provided a patch to avoid this issue; it allows violation
> of bdi limits if there is a lot of room on the system.
> It looks like:
> 
> +if (nr_reclaimable + nr_writeback < (background_thresh + dirty_thresh) / 2)
> +                     break;
> 
> This patch really helps to avoid congestion, but if the dirty pages
> exceed about 3/4 of the dirty_thresh, congestion still happens if we
> write to another disk.
> 
> I think that we can reduce the period to speed up the proportion
> adjustment. 
> 
> diff -Nur a/page-writeback.c b/page-writeback.c
> --- a/page-writeback.c  2007-12-11 13:46:30.000000000 +0800
> +++ b/page-writeback.c  2007-12-11 13:47:11.000000000 +0800
> @@ -128,10 +128,7 @@
>   */
>  static int calc_period_shift(void)
>  {
> -       unsigned long dirty_total;
> -
> -       dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) / 100;
> -       return 2 + ilog2(dirty_total - 1);
> +       return 12;
>  }

It's a heuristic; it might need some tuning, but a static value is wrong.
I think it's generally true that the larger the machine's memory, the
faster the storage subsystem, and the more likely it is to have more disks.

One of the reasons this value isn't static is that with your fixed 12 it
becomes very hard to balance over more than 4096 active devices. Of
course, it takes quite a special set-up to get into that situation.
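
The numbers in this thread can be reproduced in user space. What follows
is a minimal sketch, not the kernel code: ilog2_ul() is a stand-in for the
kernel's ilog2(), period_shift() and period_events() are hypothetical
helpers, and the inputs in the note below (a dirty ratio of 10 and 2097152
dirtyable 4K pages, i.e. roughly an 8G machine) are assumptions.

```c
#include <assert.h>

/* User-space stand-in for the kernel's ilog2(): floor(log2(n)), n > 0. */
static int ilog2_ul(unsigned long n)
{
	int log = -1;

	assert(n > 0);
	while (n) {
		n >>= 1;
		log++;
	}
	return log;
}

/*
 * Same formula as the calc_period_shift() lines removed by the quoted
 * diff, with the dirty ratio and the dirtyable page count passed in as
 * parameters (hypothetical signature, for illustration only).
 */
static int period_shift(unsigned long dirty_ratio,
			unsigned long dirtyable_pages)
{
	unsigned long dirty_total = (dirty_ratio * dirtyable_pages) / 100;

	return 2 + ilog2_ul(dirty_total - 1);
}

/* Number of events in one period for a given shift. */
static unsigned long period_events(int shift)
{
	return 1UL << shift;
}
```

With those assumed inputs, period_shift(10, 2097152) evaluates to 19, the
shift quoted above for 8G, and period_events(12) is 4096, which is
presumably where the device count for a fixed shift of 12 comes from.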

As it is, it now takes about 2 * dirty limit to switch over; you could
start by making that just a single, or maybe even half a, dirty limit.
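
One way that tuning could look is sketched below. This is not code from
this thread: tuned_period_shift() and its offset parameter are
hypothetical, floor_log2() is a user-space stand-in for the kernel's
ilog2(), and the sketch assumes that lowering the shift by one halves the
switch-over time.

```c
#include <assert.h>

/* floor(log2(n)) for n > 0; user-space stand-in for the kernel's ilog2(). */
static int floor_log2(unsigned long n)
{
	int log = 0;

	assert(n > 0);
	while (n > 1) {
		n >>= 1;
		log++;
	}
	return log;
}

/*
 * Hypothetical variant of calc_period_shift() with the constant term
 * turned into a parameter. An offset of 2 gives the existing formula;
 * each step down (1, then 0) halves the period.
 */
static int tuned_period_shift(unsigned long dirty_total, int offset)
{
	return offset + floor_log2(dirty_total - 1);
}
```

Under that assumption, an offset of 1 would correspond to roughly a single
dirty limit and an offset of 0 to roughly half of one. The shift still
scales with dirty_total, so it keeps growing with memory size instead of
being pinned to a constant.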


Also, I'm not quite convinced your benchmark is all that useful. Do you
really think it matches an actual frequently occurring usage pattern?

