linux-kernel - Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20110418145929.GH5557@quack.suse.cz>
Date:	Mon, 18 Apr 2011 16:59:29 +0200
From:	Jan Kara <jack@...e.cz>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Jan Kara <jack@...e.cz>, Wu Fengguang <fengguang.wu@...el.com>,
	Dave Chinner <david@...morbit.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Richard Kennedy <richard@....demon.co.uk>,
	Hugh Dickins <hughd@...gle.com>,
	Rik van Riel <riel@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Linux Memory Management List <linux-mm@...ck.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up
 time

On Sat 16-04-11 10:33:29, Peter Zijlstra wrote:
> On Sat, 2011-04-16 at 00:13 +0200, Jan Kara wrote:
> > 
> > So what is a takeaway from this for me is that scaling the period
> > with the dirty limit is not the right thing. If you'd have 4-times more
> > memory, your choice of "dirty limit" as the period would be as bad as
> > current 4*"dirty limit". What would seem like a better choice of period
> > to me would be to have the period in an order of a few seconds worth of
> > writeback. That would allow the bdi limit to scale up reasonably fast when
> > new bdi starts to be used and still not make it fluctuate that much
> > (hopefully).
> 
> No best would be to scale the period with the writeout bandwidth, but
> lacking that the dirty limit had to do. Since we're counting pages, and
> bandwidth is pages/second we'll end up with a time measure, exactly the
> thing you wanted.
  Yes, I was thinking about this as well. We could measure the throughput
but essentially it's a changing entity (dependent on the type of load and
possibly other things like network load for NFS, or other machines
accessing your NAS). So I'm not sure one constant value will work (esp.
because you have to measure it and you never know at which state you did
the measurement). And when you have changing values, you have to solve the
same problem as with time based periods - that's how I came to them.

> > Looking at math in lib/proportions.c, nothing really fundamental requires
> > that each period has the same length. So it shouldn't be hard to actually
> > create proportions calculator that would have timer triggered periods -
> > simply whenever the timer fires, we would declare a new period. The only
> > things which would be broken by this are (t represents global counter of
> > events):
> > a) counting of periods as t/period_len - we would have to maintain global
> > period counter but that's trivial
> > b) trick that we don't do t=t/2 for each new period but rather use
> > period_len/2+(t % (period_len/2)) when calculating fractions - again we
> > would have to bite the bullet and divide the global counter when we declare
> > new period but again it's not a big deal in our case.
> > 
> > Peter what do you think about this? Do you (or anyone else) think it makes
> > sense? 
> 
> But if you don't have a fixed sized period, then how do you catch up on
> fractions that haven't been updated for several periods? You cannot go
> remember all the individual period lengths.
  OK, I wrote the expressions down and the way I want to do it would get
different fractions than your original formula:

  Your formula is:
p(j)=\sum_i x_i(j)/(t_i*2^{i+1})
  where $i$ sums from 0 to \infty, x_i(j) is the number of events of type
$j$ in period $i$, $t_i$ is the total number of events in period $i$.

  I want to compute
l(j)=\sum_i x_i(j)/2^{i+1}
g=\sum_i t_i/2^{i+1}
  and
p(j)=l(j)/g

  Clearly, all these values can be computed in O(1). Now for t_i = t for every
i, the results of both formulas are the same (which is what made me make my
mistake). But when t_i differ, the results are different. I'd say that the
new formula also provides a meaningful notion of writeback share although
it's hard to quantify how far the computations will be in practice...
  
> The whole trick to the proportion stuff is that its all O(1) regardless
> of the number of contestants. There isn't a single loop that iterates
> over all BDIs or tasks to update their cycle, that wouldn't have scaled.
  Sure, I understand.

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/