Message-Id: <1197614726.6362.137.camel@ymzhang>
Date:	Fri, 14 Dec 2007 14:45:26 +0800
From:	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	zhejiang <zhe.jiang@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: Reducing the bdi proporion calculation period to speed up disk
	write

On Tue, 2007-12-11 at 11:11 +0100, Peter Zijlstra wrote:
> On Tue, 2007-12-11 at 14:25 +0800, zhejiang wrote:
> > The patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f implemented bdi per
> > device dirty threshold. It works well.
> > However, the period for proportion calculation may be too large.
> > For 8G memory, the calc_period_shift() will return 19 as the shift.
> > 
> > When we switch write operations between different disks, there may be
> > a potential performance issue.
> > 
> > For example, we first write to disk A, then write to disk B.
> > The proportion for disk B will increase slowly because the denominator
> > is too large (It's 2^18 + (global_count & counter_mask)).
> > The disk B will get small dirty page quota for a long time,
> > it will get blocked frequently though the total dirty page is under the
> > dirty page limit.
> > 
> > Peter provided a patch to avoid this issue; it allows violation
> > of bdi limits if there is a lot of room on the system.
> > It looks like:
> > 
> > +if (nr_reclaimable + nr_writeback < (background_thresh + dirty_thresh) / 2)
> > +                     break; 
> > 
> > This patch really helps to avoid congestion, but if the dirty pages
> > exceed about 3/4 of the dirty_thresh, congestion still happens if we
> > write to another disk. 
> > 
> > I think that we can reduce the period to speed up the proportion
> > adjustment. 
> > 
> > diff -Nur a/page-writeback.c b/page-writeback.c
> > --- a/page-writeback.c  2007-12-11 13:46:30.000000000 +0800
> > +++ b/page-writeback.c  2007-12-11 13:47:11.000000000 +0800
> > @@ -128,10 +128,7 @@
> >   */
> >  static int calc_period_shift(void)
> >  {
> > -       unsigned long dirty_total;
> > -
> > -       dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) / 100;
> > -       return 2 + ilog2(dirty_total - 1);
> > +       return 12;
> >  }
> 
> It's a heuristic, it might need some tuning, but a static value is wrong.
> I think it's generally true that the larger the machine memory size, the
> faster the storage subsystem. And the more likely it has more disks.
> 
> One of the reasons this value isn't static is that with your fixed 12 it
> becomes very hard to balance over more than 4096 active devices. Of
> course, it takes quite a special set-up to get into that situation.
I strongly agree with you that a static value is not a good idea.

> 
> As it is, it now takes about 2 * dirty limit to switch over, you could
> start by making that just a single, or maybe even half a, dirty limit.
We will do more testing to choose a better formula based on dirty_ratio
and total memory.
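
To make the numbers in this thread easier to check, here is a rough
user-space sketch of the shift calculation. It is not kernel code:
ilog2_ul() and model_period_shift() are hypothetical stand-ins for
ilog2() and calc_period_shift(), and the offset parameter is added only
for illustration. The quoted code uses an offset of 2, and each step
down halves the period, which is the direction Peter suggests.

```c
#include <assert.h>

/* Floor of log2(v) for v > 0; user-space stand-in for the kernel's ilog2(). */
static int ilog2_ul(unsigned long long v)
{
	int log = -1;

	assert(v > 0);
	while (v) {
		v >>= 1;
		log++;
	}
	return log;
}

/*
 * Hypothetical model of calc_period_shift().
 *   dirtyable_pages: stand-in for determine_dirtyable_memory()
 *   dirty_ratio:     stand-in for vm_dirty_ratio, in percent
 *   offset:          2 in the quoted code; smaller values shorten the period
 */
static int model_period_shift(unsigned long long dirtyable_pages,
			      unsigned int dirty_ratio, int offset)
{
	unsigned long long dirty_total = (dirty_ratio * dirtyable_pages) / 100;

	return offset + ilog2_ul(dirty_total - 1);
}
```

Assuming 4KB pages, all 8GB counted as dirtyable (2^21 pages) and a
dirty ratio of 10, model_period_shift(1ULL << 21, 10, 2) gives 19, which
matches the shift reported above. With an offset of 1 it gives 18. For
comparison, the fixed shift of 12 corresponds to a period of 4096, the
device count Peter mentions.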

> 
> 
> Also, I'm not quite convinced your benchmark is all that useful. Do you
> really think it matches an actual frequently occurring usage pattern?
We used iozone to test a 1.2GB sequential write/rewrite. It's hard to match
an actual usage pattern exactly, but here is one example: an administrator
might periodically back up big files to other free disks, although he/she
might not need it to be fast.

-yanmin

