linux-kernel - Re: Change in default vm_dirty

Message-Id: <1182331167.21117.38.camel@twins>
Date:	Wed, 20 Jun 2007 11:19:27 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	davej@...hat.com, tim.c.chen@...ux.intel.com,
	linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
	David Miller <davem@...emloft.net>
Subject: Re: Change in default vm_dirty_ratio

On Wed, 2007-06-20 at 01:58 -0700, Andrew Morton wrote:
> > On Wed, 20 Jun 2007 10:35:36 +0200 Peter Zijlstra <peterz@...radead.org> wrote:
> > On Tue, 2007-06-19 at 21:44 -0700, Andrew Morton wrote:
> > 
> > > Anyway, this is all arse-about.  What is the design?  What algorithms
> > > do we need to implement to do this successfully?  Answer me that, then
> > > we can decide upon these implementation details.
> > 
> > Building on the per BDI patches, how about integrating feedback from the
> > full-ness of device queues. That is, when we are happily doing IO and we
> > cannot possibly saturate the active devices (as measured by their queue
> > never reaching 75%?) then we can safely increase the total dirty limit.
> > 
> > OTOH, when even with the per BDI dirty limit the device queue is
> > constantly saturated (contended) we ought to lower the total dirty
> > limit.
> > 
> > Lots of detail here to work out, but does this sound workable?
> 
> It's pretty easy to fill the queues - I'd expect that there are a lot of
> not-very-heavy workloads which cause the kernel to shove a lot of little
> writes into the queue when it visits the blockdev mapping: a shower of
> inodes, directory entries, indirect blocks, etc.  With very little dirty
> memory associated with it.
> 
> But back away further.
> 
> What do we actually want the kernel to *do*?  Stated in terms of "when the
> dirty memory state is A, do B" and "when userspace does C, the kernel should
> do D".
> 
> Top-level statement: "when userspace does anything, the kernel should not
> suck" ;)  Some refinement is needed there.
> 
> I _think_ the problem is basically one of latency: a) writes starving reads
> and b) dirty memory causing page reclaim to stall and c) inter-device
> contention on the global memory limits.

Well, I hope to have solved c)... :-)

> Hard.  If the device isn't doing anything else then we can shove data at it
> freely.  If reads (or synchronous writes) come in then perhaps the VM
> should back off and permit dirty memory to go higher.
> 
> The anticipatory scheduler(s) are supposed to fix this.

I must plead ignorance here, I'll try to fill this hole in my
knowledge :-/

> Perhaps our queues are too long - if the VFS _does_ back off, it'll take
> some time for that to have an effect.
> 
> Perhaps the fact that the queue size knows nothing about the _size_ of the
> requests in the queue is a problem.

Yes this would be an issue. By not knowing that, the queue limit is
basically useless. It doesn't limit very much.
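
To illustrate the point about request sizes, here is a minimal sketch of
a queue limited only by request count. The RequestQueue class, the limit
of 128 requests and the two request sizes are all hypothetical, picked
for illustration; this is not how the block layer is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class RequestQueue:
    """Toy queue limited by request count only (hypothetical model)."""
    max_requests: int                      # assumed count-based limit
    sizes: list = field(default_factory=list)

    def try_add(self, nbytes: int) -> bool:
        # The limit looks only at how many requests are queued,
        # never at how many bytes they carry.
        if len(self.sizes) >= self.max_requests:
            return False
        self.sizes.append(nbytes)
        return True

    def fill_ratio(self) -> float:
        return len(self.sizes) / self.max_requests

    def queued_bytes(self) -> int:
        return sum(self.sizes)


def fill(queue: RequestQueue, nbytes: int) -> RequestQueue:
    """Add same-sized requests until the queue refuses more."""
    while queue.try_add(nbytes):
        pass
    return queue


small = fill(RequestQueue(max_requests=128), 4 * 1024)     # 4 KiB requests
large = fill(RequestQueue(max_requests=128), 512 * 1024)   # 512 KiB requests
```

Both queues report the same fill ratio, yet the amount of data behind
them differs by a factor of 128, so the count by itself bounds very
little.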

> Back away even further here.
> 
> What user-visible problem(s) are we attempting to fix?

Good point, the only report so far is DaveM saying git sucked on his
sparc64 box...

Dave, do you have any idea what caused that? Could it be this global
fsync ext3 suffers from?

But the basic problem is balancing under-utilisation of disks vs.
keeping too much in memory.
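
For concreteness, the queue-fullness feedback floated earlier in this
thread could be sketched roughly as below. Only the 75% threshold comes
from the proposal; the function name, the 10% step, the page bounds and
the reading of "constantly saturated" as "every sample at or above the
threshold" are assumptions made for the sketch, not a worked-out design.

```python
def adjust_dirty_limit(limit_pages, fill_samples, threshold=0.75,
                       step_percent=10, min_pages=1024, max_pages=1 << 20):
    """Return a new total dirty limit from recent queue fill ratios.

    fill_samples holds queue fill ratios in [0.0, 1.0] observed over
    some recent window (how they are sampled is left open here).
    """
    if not fill_samples:
        return limit_pages              # no information, leave it alone

    if max(fill_samples) < threshold:
        # The queue never got close to full: the device is not
        # saturated, so allow more dirty memory.
        new_limit = limit_pages * (100 + step_percent) // 100
    elif min(fill_samples) >= threshold:
        # The queue stayed at or above the threshold the whole time:
        # treat it as saturated and pull the limit down.
        new_limit = limit_pages * (100 - step_percent) // 100
    else:
        new_limit = limit_pages         # mixed signal, no change

    # Clamp to the assumed bounds.
    return max(min_pages, min(max_pages, new_limit))
```

With these assumed defaults, a limit of 10000 pages becomes 11000 when
the samples are [0.2, 0.5, 0.6], 9000 when they are [0.8, 0.9, 1.0], and
stays at 10000 for a mixed window such as [0.3, 0.9]. Note that this
inherits the objection raised above: a shower of small writes can fill
the queue with very little dirty memory behind it, so a count-based fill
ratio may be the wrong input.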
