Message-ID: <alpine.DEB.2.00.1002111346050.8809@chino.kir.corp.google.com>
Date: Thu, 11 Feb 2010 13:51:36 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
cc: Rik van Riel <riel@...hat.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Nick Piggin <npiggin@...e.de>,
Andrea Arcangeli <aarcange@...hat.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Lubos Lunak <l.lunak@...e.cz>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [patch 4/7 -mm] oom: badness heuristic rewrite
On Thu, 11 Feb 2010, Andrew Morton wrote:
> > Changing any value that may have a tendency to be hardcoded elsewhere is
> > always controversial, but I think the nature of /proc/pid/oom_adj allows
> > us to do so for two specific reasons:
> >
> > - hardcoded values tend not to fall within a range; they tend to either
> > always prefer a certain task for oom kill first or disable oom killing
> > entirely. The current implementation uses this as a bitshift on a
> > seemingly unpredictable and unscientific heuristic that is very
> > difficult to predict at runtime. This means that fewer and fewer
> > applications would hardcode a value of '8', for example, because its
> > semantics depends entirely on RAM capacity of the system to begin with
> > since badness() scores are only useful when used in comparison with
> > other tasks.
>
> You'd be amazed what dumb things applications do. Get thee to
> http://google.com/codesearch?hl=en&lr=&q=[^a-z]oom_adj[^a-z]&sbtn=Search
> and start reading. All 641 matches ;)
>
> Here's one which writes -16:
> http://google.com/codesearch/p?hl=en#eN5TNOm7KtI/trunk/wlan/vendor/asus/eeepc/init.rc&q=[^a-z]oom_adj[^a-z]&sa=N&cd=70&ct=rc
>
> Let's not change the ABI please.
>
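(For reference, the "bitshift" in question works roughly like the sketch
below; this approximates the current badness() adjustment rather than
being the exact kernel code, and apply_oom_adj is just an illustrative
name.  Because the underlying score scales with memory usage, a fixed
oom_adj value means something different on every system.)

	#include <limits.h>	/* ULONG_MAX */

	/*
	 * Sketch: approximates how /proc/pid/oom_adj is currently applied
	 * as a bitshift on the badness score.  The OOM_DISABLE (-17)
	 * special case is handled elsewhere and omitted here.
	 */
	unsigned long apply_oom_adj(unsigned long points, int oom_adj)
	{
		if (oom_adj > 0) {
			/* cap at ULONG_MAX so the left shift can't overflow */
			if (points > ULONG_MAX >> oom_adj)
				points = ULONG_MAX;
			else
				points <<= oom_adj;
		} else if (oom_adj < 0) {
			points >>= -oom_adj;
		}
		return points;
	}
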
Sigh, this is going to require the amount of system memory to be
partitioned into OOM_ADJUST_MAX (15) chunks, and that will be the
granularity at which we can either bias or discount the memory usage of
individual tasks: instead of being able to do this with 0.1%
granularity, we'll now be limited to 100 / 15, or ~7%.  That's ~9GB on
my 128GB system just because this was originally a bitshift.  The
upside is that it's now linear rather than exponential.
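
Spelled out (a throwaway userspace sketch of the arithmetic above; the
128GB figure and the 0.1% unit come from this discussion, the rest is
illustrative):

	#include <stdio.h>

	#define OOM_ADJUST_MAX	15	/* upper bound of the old oom_adj range */

	int main(void)
	{
		double total_gb = 128.0;	/* the 128GB system mentioned above */

		/* one oom_adj step is 1/15 of the range, i.e. ~7% of RAM */
		double per_adj_gb = total_gb * (100.0 / OOM_ADJUST_MAX) / 100.0;
		/* one step at 0.1% granularity */
		double per_tenth_pct_gb = total_gb * 0.1 / 100.0;

		printf("per oom_adj unit: ~%.1f GB (~%.1f%% of RAM)\n",
		       per_adj_gb, 100.0 / OOM_ADJUST_MAX);
		printf("per 0.1%% unit:    ~%.2f GB\n", per_tenth_pct_gb);
		return 0;
	}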