[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090115085703.GD29586@ioremap.net>
Date: Thu, 15 Jan 2009 11:57:03 +0300
From: Evgeniy Polyakov <zbr@...emap.net>
To: linux-kernel@...r.kernel.org
Cc: David Rientjes <rientjes@...gle.com>,
Bryan Donlan <bdonlan@...il.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Dave Jones <davej@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Theodore Tso <tytso@....edu>,
Matthias Andree <matthias.andree@....de>,
Randy Dunlap <randy.dunlap@...cle.com>
Subject: [take4] OOM documentation update [was: Linux killed Kenny, bastard!]
Signed-off-by: Evgeniy Polyakov <zbr@...emap.net>
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index d105eb4..4902966 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -2311,6 +2311,34 @@ increase the likelihood of this process being killed by the oom-killer. Valid
values are in the range -16 to +15, plus the special value -17, which disables
oom-killing altogether for this process.
+The process to be killed in an out-of-memory situation is selected among all others
+based on its badness score. This value equals the original memory size of the process
+and is then updated according to its CPU time (utime + stime) and the
+run time (uptime - start time). The longer it runs the smaller is the score.
+Badness score is divided by the square root of the CPU time and then by
+the double square root of the run time.
+
+Swapped out tasks are killed first. Half of each child's memory size is added to
+the parent's score if they do not share the same memory. Thus forking servers
+are the prime candidates to be killed. Having only one 'hungry' child will make
+parent less preferable than the child.
+
+/proc/<pid>/oom_score shows process' current badness score.
+
+The following heuristics are then applied:
+ * if the task was reniced, its score doubles
+ * superuser or direct hardware access tasks (CAP_SYS_ADMIN, CAP_SYS_RESOURCE
+ or CAP_SYS_RAWIO) have their score divided by 4
+ * if oom condition happened in one cpuset and checked task does not belong
+ to it, its score is divided by 8
+ * the resulting score is multiplied by two to the power of oom_adj, i.e.
+ points <<= oom_adj when it is positive and
+ points >>= -(oom_adj) otherwise
+
+The task with the highest badness score is then selected and its children
+are killed, process itself will be killed in an OOM situation when it does
+not have children or some of them disabled oom like described above.
+
2.13 /proc/<pid>/oom_score - Display current oom-killer score
-------------------------------------------------------------
--
Evgeniy Polyakov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists