Message-ID: <alpine.DEB.2.00.0901141655390.22699@chino.kir.corp.google.com>
Date: Wed, 14 Jan 2009 16:58:41 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Evgeniy Polyakov <zbr@...emap.net>
cc: linux-kernel@...r.kernel.org, Bryan Donlan <bdonlan@...il.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Dave Jones <davej@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Theodore Tso <tytso@....edu>,
Matthias Andree <matthias.andree@....de>,
Randy Dunlap <randy.dunlap@...cle.com>
Subject: Re: [take3] OOM documentation update [was: Linux killed Kenny,
bastard!]
On Thu, 15 Jan 2009, Evgeniy Polyakov wrote:
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index d105eb4..eed2fbb 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -2311,6 +2311,32 @@ increase the likelihood of this process being killed by the oom-killer. Valid
> values are in the range -16 to +15, plus the special value -17, which disables
> oom-killing altogether for this process.
>
> +The process to be killed in an out-of-memory situation is selected among all others
> +based on its badness score. This value equals the original memory size of the process
> +and is then updated according to its CPU time (utime + stime) and the
> +run time (uptime - start time). The longer it runs the smaller is the score.
> +Badness score is divided by the square root of the CPU time and then by
> +the double square root of the run time.
> +
> +Swapped out tasks are killed first. Half of each child's memory size is added to
> +the parent's score if they do not share the same memory. Thus forking servers
> +are the prime candidates to be killed. Having only one 'hungry' child will make
> +parent less preferable than the child.
> +
> +/proc/<pid>/oom_score shows process' current badness score.
> +
> +The following heuristics are then applied:
> + * if the task was reniced, its score doubles
> + * superuser or direct hardware access tasks (CAP_SYS_ADMIN, CAP_SYS_RESOURCE
> + or CAP_SYS_RAWIO) have their score divided by 4
> + * if oom condition happened in one cpuset and checked task does not belong
> + to it, its score is divided by 8
> + * the resulting score is multiplied by two to the power of oom_adj, i.e.
> + points <<= oom_adj when it is positive and
> + points >>= -(oom_adj) otherwise
> +
> +The task with the highest badness score is then killed.
> +
Not quite: even after a task is selected for oom kill, the oom killer
still prefers to kill one of its children first if any have a different
mm. See oom_kill_process().
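To illustrate the point, here is a minimal userspace sketch of that
preference. It is not the kernel's oom_kill_process(); `struct task',
`mm_id' and pick_victim() are hypothetical names made up for this example,
and comparing integer ids stands in for comparing mm pointers.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a task; not the kernel's task_struct. */
struct task {
    int mm_id;               /* assumption: equal ids mean a shared mm */
    struct task **children;  /* array of child pointers, may be NULL if empty */
    size_t nr_children;
};

/*
 * Given the task with the highest badness score, return the task that
 * would actually be killed: the first child with a different mm if one
 * exists, otherwise the selected task itself.
 */
const struct task *pick_victim(const struct task *selected)
{
    for (size_t i = 0; i < selected->nr_children; i++) {
        const struct task *child = selected->children[i];

        if (child->mm_id != selected->mm_id)
            return child;
    }
    return selected;
}
```

So "the task with the highest badness score is then killed" only holds in
this model when the selected task has no child with a different mm.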
You also don't mention the exception of OOM_DISABLE (oom_adj score of -17)
in your formula for how oom_adj impacts the points value. Although it's
already explained earlier, it should be mentioned here since oom_adj is
an int and a right shift of 17 does not guarantee `points' will be 0.
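A small sketch of why the shift alone is not enough, assuming `points' is
an unsigned long as in the quoted formula. The helper names shift_only()
and apply_oom_adj() are hypothetical and exist only for this example; this
is not the kernel's badness() code.

```c
/* Value the quoted documentation gives for "never oom-kill this task". */
#define OOM_DISABLE (-17)

/* The formula exactly as the quoted text states it, with no special case. */
unsigned long shift_only(unsigned long points, int oom_adj)
{
    if (oom_adj > 0)
        return points << oom_adj;
    return points >> -oom_adj;
}

/* Same formula, with the OOM_DISABLE exception handled explicitly. */
unsigned long apply_oom_adj(unsigned long points, int oom_adj)
{
    if (oom_adj == OOM_DISABLE)
        return 0;   /* assumption for this sketch: a score of 0 means "do not kill" */
    return shift_only(points, oom_adj);
}
```

For example, a score of 1UL << 20 shifted right by 17 still leaves 8, so
a task with oom_adj of -17 would keep a nonzero score under the formula as
written, whereas the explicit check yields 0 regardless of the score.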