Message-Id: <20120523153718.b70bb762.akpm@linux-foundation.org>
Date: Wed, 23 May 2012 15:37:18 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: David Rientjes <rientjes@...gle.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Dave Jones <davej@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [patch v2] mm, oom: normalize oom scores to oom_score_adj scale
only for userspace
On Wed, 23 May 2012 00:15:03 -0700 (PDT)
David Rientjes <rientjes@...gle.com> wrote:
> The oom_score_adj scale ranges from -1000 to 1000 and represents the
> proportion of memory available to the process at allocation time. This
> means an oom_score_adj value of 300, for example, will bias a process as
> though it was using an extra 30.0% of available memory and a value of
> -350 will discount 35.0% of available memory from its usage.
>
> The oom killer badness heuristic also uses this scale to report the oom
> score for each eligible process in determining the "best" process to
> kill. Thus, it can only differentiate each process's memory usage by
> 0.1% of system RAM.
>
> On large systems, this can end up being a large amount of memory: 256MB
> on 256GB systems, for example.
>
> This can be fixed by having the badness heuristic use the actual
> memory usage in scoring threads and then normalizing it to the
> oom_score_adj scale for userspace. This results in better comparison
> between eligible threads for kill and no change from the userspace
> perspective.
>
> ...
>
> @@ -367,12 +354,13 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  		}
>  
>  		points = oom_badness(p, memcg, nodemask, totalpages);
> -		if (points > *ppoints) {
> +		if (points > chosen_points) {
>  			chosen = p;
> -			*ppoints = points;
> +			chosen_points = points;
>  		}
>  	} while_each_thread(g, p);
>  
> +	*ppoints = chosen_points * 1000 / totalpages;
>  	return chosen;
>  }
>
It's still not obvious that we always avoid the divide-by-zero here.
If there's some weird way of convincing constrained_alloc() to look at
an empty nodemask, or a nodemask which covers only empty nodes, then
totalpages is zero and blam.

Now, this is probably a can't-happen, but isn't that guarantee pretty
convoluted and fragile?