[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20070620015826.03f1d71a.akpm@linux-foundation.org>
Date: Wed, 20 Jun 2007 01:58:26 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: davej@...hat.com, tim.c.chen@...ux.intel.com,
linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org
Subject: Re: Change in default vm_dirty_ratio
> On Wed, 20 Jun 2007 10:35:36 +0200 Peter Zijlstra <peterz@...radead.org> wrote:
> On Tue, 2007-06-19 at 21:44 -0700, Andrew Morton wrote:
>
> > Anyway, this is all arse-about. What is the design? What algorithms
> > do we need to implement to do this successfully? Answer me that, then
> > we can decide upon these implementation details.
>
> Building on the per BDI patches, how about integrating feedback from the
> full-ness of device queues. That is, when we are happily doing IO and we
> cannot possibly saturate the active devices (as measured by their queue
> never reaching 75%?) then we can safely increase the total dirty limit.
>
> OTOH, when even with the per BDI dirty limit the device queue is
> constantly saturated (contended) we ought to lower the total dirty
> limit.
>
> Lots of detail here to work out, but does this sound workable?
It's pretty easy to fill the queues - I'd expect that there are a lot of
not-very-heavy workloads which cause the kernel to shove a lot of little
writes into the queue when it visits the blockdev mapping: a shower of
inodes, directory entries, indirect blocks, etc. With very little dirty
memory associated with it.
But back away further.
What do we actually want the kernel to *do*? Stated in terms of "when the
dirty memory state is A, do B" and "when userspace does C, the kernel should
do D".
Top-level statement: "when userspace does anything, the kernel should not
suck" ;) Some refinement is needed there.
I _think_ the problem is basically one of latency: a) writes starving reads
and b) dirty memory causing page reclaim to stall and c) inter-device
contention on the global memory limits.
Hard. If the device isn't doing anything else then we can shove data at it
freely. If reads (or synchronous writes) come in then perhaps the VM
should back off and permit dirty memory to go higher.
The anticipatory scheduler(s) are supposed to fix this.
Perhaps our queues are too long - if the VFS _does_ back off, it'll take
some time for that to have an effect.
Perhaps the fact that the queue size knows nothing about the _size_ of the
requests in the queue is a problem.
Back away even further here.
What user-visible problem(s) are we attemping to fix?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists