Message-Id: <200712061604.41490.phillips@phunq.net>
Date: Thu, 6 Dec 2007 16:04:41 -0800
From: Daniel Phillips <phillips@...nq.net>
To: Bill Davidsen <davidsen@....com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC] [PATCH] A clean approach to writeout throttling
On Thursday 06 December 2007 13:53, Bill Davidsen wrote:
> Daniel Phillips wrote:
> The problem is that you (a) may or may not know just how bad a worst
> case can be, and (b) may block unnecessarily by being pessimistic.
True, but after a quick introspection I realized that the issue you
raise (it's really a single issue) is no worse than the way I planned
to wave my hands at the issue of programmers constructing their metrics
wrongly and thereby breaking the throttling assumptions.
Which is to say that I am now entirely convinced by Andrew's argument
and am prepared to reroll the patch along the lines he suggests. The
result will be somewhat bigger. Only a minor change is required to the
main mechanism: we will now account things entirely in units of pages
instead of abstract units, eliminating a whole class of things that can
go wrong. I like that. Accounting variables get shifted to a new home,
maybe. Must try a few ideas and see what works.
Anyway, the key idea is that the task struct will gain a field pointing
at a handle for the "block device stack", whatever that is (this is
sure to evolve over time), and alloc_pages will know how to account
pages to that object. The submit_bio and bio->endio bits change hardly
at all.
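
To make that concrete, here is a toy userspace model of the accounting
idea, nothing more. None of these names exist in the posted patch;
block_stack, charge_pages and friends are invented purely for
illustration, and the real thing would live in the allocator and bio
completion paths rather than in a little demo program:

#include <stdio.h>

/* Toy model of the proposed per-stack page accounting; all names are
 * invented for this sketch and do not appear in the posted patch. */
struct block_stack {
        long reserve;           /* pages reserved for writeout on this stack */
        long in_flight;         /* pages currently charged to in-flight bios */
};

/* Stand-in for the field the task struct would gain. */
struct task {
        struct block_stack *block_stack;
};

/*
 * Charge pages to the stack the current task is writing through.  Returns
 * 0 on success, -1 if the charge would overrun the reserve.  The real
 * mechanism would block here until completions make room, not fail.
 */
static int charge_pages(struct task *tsk, long pages)
{
        struct block_stack *bs = tsk->block_stack;

        if (!bs)
                return 0;       /* task is not bound to any block stack */
        if (bs->in_flight + pages > bs->reserve)
                return -1;      /* would overrun the reservation */
        bs->in_flight += pages;
        return 0;
}

/* Credit the pages back when the corresponding bio completes. */
static void uncharge_pages(struct task *tsk, long pages)
{
        if (tsk->block_stack)
                tsk->block_stack->in_flight -= pages;
}

int main(void)
{
        struct block_stack stack = { .reserve = 1000 };
        struct task tsk = { .block_stack = &stack };

        if (charge_pages(&tsk, 16) == 0)
                printf("charged 16 pages, %ld in flight\n", stack.in_flight);
        uncharge_pages(&tsk, 16);
        return 0;
}

The only point is the shape: one reservation per block device stack,
pages charged where they are allocated and credited back at bio->endio
time.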
The runner-up key idea is that we will gain a notion of a "block device
stack" (or block stack for short, so that we may implement block
stackers), which for the time being will simply be Device Mapper's
notion of a device stack, however many warts that may have. It's there
now and we use it for ddsnap.
The other player in this is Peterz's swap over network use case, which
does not involve a device mapper device. Maybe it should? Otherwise
we will need a variant notion of block device stack, and the two
threads of work should merge eventually. There is little harm in
starting this effort in two different places, quite the contrary.
In the meantime we do have a strategy that works, posted at the head of
this thread, for anybody who needs it now.
> The dummy transaction would be nice, but it would be perfect if you
> could send the real transaction down with a max memory limit and a
> flag, have each level check and decrement the max by what's actually
> needed, and then return some pass/fail status for that particular
> transaction. Clearly every level in the stack would have to know how
> to do that. It would seem that once excess memory use was detected
> the transaction could be failed without deadlock.
The function of the dummy transaction will be to establish roughly what
kind of footprint a single transaction has on that block IO path. Then
we will make the reservation _hugely_ greater than that, to accommodate
1000 or so of those. A transaction blocks if it actually tries to use
more than the reservation. We close the "many partial submissions all
deadlock together" hole by ... <insert handwaving here>. Can't go
wrong, right?
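
Again only a sketch of the arithmetic, with made-up names and numbers:
probe the single-transaction footprint once with the dummy transaction,
size the reserve at roughly 1000 of those, and test new submissions
against it. The real code would sleep on that test instead of printing:

#include <stdio.h>

#define TRANSACTIONS_IN_RESERVE 1000    /* the "1000 or so" above */

/*
 * Pages the dummy transaction reported for one transaction on this block
 * IO path.  The number is invented; the real probe would come from
 * sending the dummy transaction down the stack and seeing what it uses.
 */
static long probe_transaction_footprint(void)
{
        return 32;
}

int main(void)
{
        long per_tx = probe_transaction_footprint();
        long reserve = per_tx * TRANSACTIONS_IN_RESERVE;
        long in_flight = 31900;         /* pages already charged (made up) */
        long want = 48;                 /* pages the next transaction wants */

        /*
         * The real mechanism sleeps here until completions bring in_flight
         * back under the reserve; this sketch only shows the test.
         */
        if (in_flight + want > reserve)
                printf("would block: %ld + %ld > %ld\n",
                       in_flight, want, reserve);
        else
                printf("admitted: %ld of %ld reserve pages now in flight\n",
                       in_flight + want, reserve);
        return 0;
}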
Agreed that drivers should pay special attention to our dummy
transaction and try to use the maximum possible resources when they see
one. More handwaving, but this is progress.
Regards,
Daniel