Message-Id: <20090804191031.6A3D.A69D9226@jp.fujitsu.com>
Date: Tue, 4 Aug 2009 19:25:08 +0900 (JST)
From: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To: LKML <linux-kernel@...r.kernel.org>
Cc: kosaki.motohiro@...fujitsu.com, Paul Menage <menage@...gle.com>,
David Rientjes <rientjes@...gle.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Rik van Riel <riel@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Oleg Nesterov <oleg@...hat.com>, linux-mm <linux-mm@...ck.org>
Subject: [PATCH for 2.6.31 0/4] fix oom_adj regression v2
Commit 2ff05b2b (oom: move oom_adj value) moved the oom_adj value into mm_struct.
That is a very good first step toward sanitizing the OOM killer.
However, Paul Menage reported that the commit causes a regression in his job scheduler:
the current OOM logic can kill a process that has set OOM_DISABLE.
Why? His program contains code similar to the following.
	...
	set_oom_adj(OOM_DISABLE);	/* the job scheduler must never be killed by the OOM killer */
	...
	if (vfork() == 0) {
		set_oom_adj(0);		/* the invoked child may be killed */
		execve("foo-bar-cmd");
	}
	...
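For readers unfamiliar with the interface, here is a minimal sketch of what a helper like set_oom_adj() might look like. It assumes the helper simply writes the value to /proc/self/oom_adj; the actual helper in the job scheduler is not shown in this mail, and write_oom_adj() is a hypothetical name introduced only for illustration.

```c
#include <stdio.h>

#define OOM_DISABLE (-17)	/* oom_adj value that exempts a task from the OOM killer */

/*
 * Hypothetical helper: write an oom_adj value to the given proc file.
 * Returns 0 on success, -1 on failure.
 */
static int write_oom_adj(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (f == NULL)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Assumed shape of the set_oom_adj() used in the snippet above. */
static int set_oom_adj(int value)
{
	return write_oom_adj("/proc/self/oom_adj", value);
}
```

Taking the path as a parameter keeps the write logic separate from the proc file name, so it can be exercised against an ordinary file.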
A vfork() parent and child share the same mm_struct, so the set_oom_adj(0) above does not
only change oom_adj for the vfork() child; it also changes oom_adj for the vfork() parent.
As a result, the vfork() parent (the job scheduler) lost its OOM immunity and was killed.
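The sharing problem can be illustrated with a small userspace model. This is only a sketch: struct mm_model, struct task_model and the model_* functions are hypothetical stand-ins for the kernel structures, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define OOM_DISABLE (-17)	/* oom_adj value that exempts a task from the OOM killer */

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mm_model {
	int oom_adj;		/* stored in the mm, as after commit 2ff05b2b */
};

struct task_model {
	const char *name;
	struct mm_model *mm;	/* a vfork() parent and child point at the same mm */
};

/* Model of setting oom_adj through one task: the value lands in the shared mm. */
static void model_set_oom_adj(struct task_model *t, int value)
{
	assert(t != NULL && t->mm != NULL);
	t->mm->oom_adj = value;
}

/* Model of reading oom_adj through one task. */
static int model_get_oom_adj(const struct task_model *t)
{
	assert(t != NULL && t->mm != NULL);
	return t->mm->oom_adj;
}
```

In this model, if the parent task sets OOM_DISABLE, a child task is created with the same mm pointer, and the child then sets 0, reading oom_adj through the parent also returns 0. That is the loss of immunity described above.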
The fork-setting-exec idiom is used very frequently in userland programs. We must
not break this assumption.
This patch series is somewhat large, but we must fix any regression quickly.
Sorting out OOM requirements:
-----------------------
- select_bad_process() must select a killable process;
  otherwise the OOM killer can livelock as follows:
  1. select_bad_process() selects an unkillable process.
  2. oom_kill_process() does nothing and returns.
  3. out_of_memory() exits and the next OOM occurs soon after; then go to 1 again.
- A vfork() parent and child must not share oom_adj.
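The first requirement can be sketched as a selection loop that never returns an unkillable process. This is an illustrative userspace model, not the kernel's select_bad_process(); struct proc_model, its points field and model_select_bad_process() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define OOM_DISABLE (-17)	/* oom_adj value that exempts a task from the OOM killer */

/* Hypothetical per-process record used only for this model. */
struct proc_model {
	int pid;
	int oom_adj;		/* per-process, as proposed in this series */
	unsigned long points;	/* assumed badness score; higher means a better victim */
};

/*
 * Return the index of the highest-scoring killable process, or -1 if
 * every process is OOM_DISABLE. Skipping unkillable processes at
 * selection time means the kill step always has something to act on,
 * which avoids the select/no-op/retry loop.
 */
static int model_select_bad_process(const struct proc_model *procs, size_t n)
{
	int chosen = -1;
	unsigned long best = 0;
	size_t i;

	assert(procs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (procs[i].oom_adj == OOM_DISABLE)
			continue;	/* unkillable: never a candidate */
		if (chosen < 0 || procs[i].points > best) {
			chosen = (int)i;
			best = procs[i].points;
		}
	}
	return chosen;
}
```

With three processes scoring 900 (OOM_DISABLE), 300 and 500, the model picks the one scoring 500 rather than the highest-scoring but unkillable one.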
My proposal
-----------------------
- oom_adj becomes a per-process property. It has been documented that way for a long time,
  but the implementation was not correct.
- oom_score also becomes a per-process property. This makes the OOM logic simpler and faster.
- Remove the bogus vfork() parent killing logic.