linux-kernel - Re: Linux 2.6.29


Message-Id: <20090326180314.ccce1be2.akpm@linux-foundation.org>
Date:	Thu, 26 Mar 2009 18:03:14 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Theodore Tso <tytso@....edu>, David Rees <drees76@...il.com>,
	Jesper Krogh <jesper@...gh.cc>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29

On Thu, 26 Mar 2009 17:51:44 -0700 (PDT) Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> 
> 
> On Thu, 26 Mar 2009, Linus Torvalds wrote:
> > 
> > The only times tunables have worked for us is when they auto-tune. 
> > 
> > IOW, we don't have "use 35% of memory for buffer cache" tunables, we just 
> > dynamically auto-tune memory use. And no, we don't expect user space to 
> > run some "tuning program for their load" either.
> 
> IOW, what we could reasonably do is something along the lines of:
> 
>  - start off with some reasonable value for max background dirty (per 
>    block device) that defaults to something sane (quite possibly based on 
>    simply memory size).
> 
>  - assume that "foreground dirty" is just always 2* background dirty.
> 
>  - if we hit the "max foreground dirty" during memory allocation, then we 
>    shrink the background dirty value (logic: we never want to have to wait 
>    synchronously)
> 
>  - if we hit some maximum latency on writeback, shrink dirty aggressively 
>    and based on how long the latency was (because at that point we have a 
>    real _measure_ of how costly it is with that load).
> 
>  - if we start doing background dirtying, but never hit the foreground 
>    dirty even in dirty balancing (ie when a writer is actually _writing_, 
>    as opposed to hitting it when allocating memory by a non-writer), then 
>    slowly open up the window - we may be limiting too early.
> 
> .. add heuristics to taste. The point being, that if we do this based on 
> real loads, and based on hitting the real problems, then we might actually 
> be getting somewhere. In particular, if the filesystem sucks at writeout 
> (ie the limiter is not the _disk_, but the filesystem serialization), then 
> it should automatically also shrink the max dirty state. 
> 
> The tunable then could become the maximum latency we accept or something 
> like that. Or the hysteresis limits/rules for the soft "grow" or "shrink" 
> events. At that point, maybe we could even find something that works for 
> most people.
> 
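
For illustration only, the rules quoted above could be sketched in userspace C roughly as follows. Every name and constant here (struct dirty_tuner, the 5% starting point, the 1/8 shrink and 1/32 grow steps) is an assumption made for the sketch; none of it is existing kernel code or anything proposed in the thread.

```c
#include <assert.h>

/* Hypothetical names and constants throughout; nothing here is kernel API. */
struct dirty_tuner {
	unsigned long bg_limit;       /* max background dirty, in pages */
	unsigned long bg_min;         /* floor for bg_limit */
	unsigned long bg_max;         /* ceiling for bg_limit */
	unsigned long max_latency_ms; /* the one remaining tunable */
};

static unsigned long clamp_limit(const struct dirty_tuner *t, unsigned long v)
{
	if (v < t->bg_min)
		return t->bg_min;
	if (v > t->bg_max)
		return t->bg_max;
	return v;
}

/* Start from a default based on memory size (5% is an arbitrary guess). */
static void tuner_init(struct dirty_tuner *t, unsigned long mem_pages,
		       unsigned long max_latency_ms)
{
	assert(mem_pages >= 64 && max_latency_ms > 0);
	t->bg_min = 16;
	t->bg_max = mem_pages / 4;
	t->max_latency_ms = max_latency_ms;
	t->bg_limit = clamp_limit(t, mem_pages / 20);
}

/* "Foreground dirty" is always 2 * background dirty. */
static unsigned long fg_limit(const struct dirty_tuner *t)
{
	return 2 * t->bg_limit;
}

/* A memory allocation ran into the foreground limit: shrink. */
static void tuner_alloc_hit_foreground(struct dirty_tuner *t)
{
	t->bg_limit = clamp_limit(t, t->bg_limit - t->bg_limit / 8);
}

/* Writeback took latency_ms: shrink in proportion to the overshoot. */
static void tuner_writeback_latency(struct dirty_tuner *t,
				    unsigned long latency_ms)
{
	unsigned long long scaled;

	if (latency_ms <= t->max_latency_ms)
		return;
	scaled = (unsigned long long)t->bg_limit * t->max_latency_ms;
	t->bg_limit = clamp_limit(t, (unsigned long)(scaled / latency_ms));
}

/* Background writeback ran but no writer hit the foreground limit:
 * open the window slowly. */
static void tuner_background_only(struct dirty_tuner *t)
{
	t->bg_limit = clamp_limit(t, t->bg_limit + t->bg_limit / 32 + 1);
}
```

The asymmetry is deliberate in this sketch: shrinking on measured latency is proportional and immediate, while growing is a small fixed fraction, which is one way to read "shrink dirty aggressively" versus "slowly open up the window".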

hm.

It may not be too hard to account for seekiness.  Simplest case: if we
dirty a page and that page is file-contiguous to another already dirty
page then don't increment the dirty page count by "1": increment it by
0.01.
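
A minimal userspace sketch of that weighting, assuming a fixed-point counter kept in hundredths of a page and a toy per-file dirty map. struct toy_file, toy_set_page_dirty() and both cost constants are made up for the example; the real dirty accounting looks nothing like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names are hypothetical; this is a userspace toy, not kernel code. */
#define TOY_MAX_PAGES    1024 /* pages per toy file */
#define COST_SEEKY        100 /* isolated dirty page: 1.00 page */
#define COST_CONTIGUOUS     1 /* next to a dirty page: 0.01 page */

struct toy_file {
	bool dirty[TOY_MAX_PAGES];
	unsigned long weighted_dirty; /* in hundredths of a page */
};

/* Mark page `index` dirty and return the cost that was charged. */
static unsigned long toy_set_page_dirty(struct toy_file *f, size_t index)
{
	unsigned long cost;
	bool neighbour;

	assert(index < TOY_MAX_PAGES);
	if (f->dirty[index])
		return 0; /* already dirty, already counted */

	/* File-contiguous with an already dirty page on either side? */
	neighbour = (index > 0 && f->dirty[index - 1]) ||
		    (index + 1 < TOY_MAX_PAGES && f->dirty[index + 1]);
	cost = neighbour ? COST_CONTIGUOUS : COST_SEEKY;

	f->dirty[index] = true;
	f->weighted_dirty += cost;
	return cost;
}
```

With this, dirtying pages 10 and 11 of a file costs 1.01 pages in total, while dirtying pages 10 and 500 costs 2.00.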

Another simple case would be to keep track of the _number_ of dirty
inodes rather than simply lumping all dirty pages together.
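
Sketched in the same toy style, counting dirty inodes only needs the 0 -> 1 and 1 -> 0 transitions of each inode's dirty page count. The struct and function names below are assumptions for the example and do not correspond to real kernel structures.

```c
#include <assert.h>

/* Hypothetical toy types; not the kernel's struct inode or vm counters. */
struct toy_inode {
	unsigned long nr_dirty_pages;
};

struct toy_dirty_stats {
	unsigned long nr_dirty_pages;  /* what is lumped together today */
	unsigned long nr_dirty_inodes; /* the extra count suggested above */
};

static void toy_page_dirtied(struct toy_dirty_stats *s, struct toy_inode *inode)
{
	/* First dirty page on this inode: the inode itself becomes dirty. */
	if (inode->nr_dirty_pages++ == 0)
		s->nr_dirty_inodes++;
	s->nr_dirty_pages++;
}

static void toy_page_cleaned(struct toy_dirty_stats *s, struct toy_inode *inode)
{
	assert(inode->nr_dirty_pages > 0);
	/* Last dirty page written back: the inode is clean again. */
	if (--inode->nr_dirty_pages == 0)
		s->nr_dirty_inodes--;
	s->nr_dirty_pages--;
}
```

A balancing heuristic could then tell 1000 dirty pages in one inode apart from 1000 dirty pages spread over 1000 inodes, which is presumably the seekier case.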

And then there's metadata.  The dirty balancing code doesn't account
for dirty inodes _at all_ at present.

(Many years ago there was a bug wherein we could have zillions of dirty
inodes and exactly zero dirty pages, and the writeback code wouldn't
trigger at all: the inodes would just sit there until a page got
dirtied.  This might still be there.)


Then again, perhaps we don't need all those discrete heuristic things. 
Maybe it can all be done in mark_buffer_dirty().  Do some clever
math+data-structure to track the seekiness of our dirtiness.  Delayed
allocation would mess that up though.
