Message-Id: <20100215120645.727E.A69D9226@jp.fujitsu.com>
Date: Mon, 15 Feb 2010 12:08:34 +0900 (JST)
From: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To: David Rientjes <rientjes@...gle.com>
Cc: kosaki.motohiro@...fujitsu.com,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Nick Piggin <npiggin@...e.de>,
Andrea Arcangeli <aarcange@...hat.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Lubos Lunak <l.lunak@...e.cz>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [patch 2/7 -mm] oom: sacrifice child with highest badness score for parent
> When a task is chosen for oom kill, the oom killer first attempts to
> sacrifice a child not sharing its parent's memory instead.
> Unfortunately, this often kills in a seemingly random fashion based on
> the ordering of the selected task's child list. Additionally, it is not
> guaranteed at all to free a large amount of memory that we need to
> prevent additional oom killing in the very near future.
>
> Instead, we now only attempt to sacrifice the worst child not sharing its
> parent's memory, if one exists. The worst child is indicated with the
> highest badness() score. This has two advantages: we kill a
> memory-hogging task more often, and we allow the configurable
> /proc/pid/oom_adj value to be considered as a factor in which child to
> kill.
>
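The selection rule described above can be sketched in userspace C. This is only a rough model, not the kernel code: struct model_task, its mm_id and score fields, and pick_victim() are hypothetical names invented for illustration, and the score field merely stands in for whatever badness() would return.

```c
#include <stddef.h>

/* Hypothetical stand-in for the task_struct fields used by the patch. */
struct model_task {
	int mm_id;		/* models task->mm; equal ids mean shared memory */
	unsigned long score;	/* models the badness() result; 0 = unkillable */
};

/*
 * Return the task to kill: the child with the highest non-zero score that
 * does not share the parent's memory, or the parent if no such child exists.
 */
static const struct model_task *
pick_victim(const struct model_task *parent,
	    const struct model_task *children, size_t nr_children)
{
	const struct model_task *victim = parent;
	unsigned long victim_points = 0;
	size_t i;

	for (i = 0; i < nr_children; i++) {
		const struct model_task *c = &children[i];

		if (c->mm_id == parent->mm_id)
			continue;	/* shares the parent's memory */

		/* Strict comparison: a score of 0 never replaces the parent. */
		if (c->score > victim_points) {
			victim = c;
			victim_points = c->score;
		}
	}
	return victim;
}
```

In this model the result no longer depends on the order of the child list, only on the scores.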
> Reviewers may observe that the previous implementation would iterate
> through the children and attempt to kill each until one was successful,
> and then the parent if none were found, while the new code simply kills
> the most memory-hogging task or the parent. Note that the only time
> oom_kill_task() fails, however, is when a child does not have an mm or
> has a /proc/pid/oom_adj of OOM_DISABLE. badness() returns 0 for both
> cases, so the final oom_kill_task() will always succeed.
>
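For comparison, the previous first-fit behaviour described above might be modelled as follows. Again this is a hedged userspace sketch: struct old_task, model_kill() and old_pick() are hypothetical names, and model_kill() only imitates the two failure cases the changelog names for oom_kill_task() (no mm, or oom_adj of OOM_DISABLE).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task state the old loop depended on. */
struct old_task {
	int mm_id;		/* models task->mm; 0 models a task with no mm */
	bool oom_disable;	/* models /proc/pid/oom_adj == OOM_DISABLE */
};

/* Models oom_kill_task(): returns 0 on success, non-zero on failure. */
static int model_kill(const struct old_task *t)
{
	return (t->mm_id == 0 || t->oom_disable) ? 1 : 0;
}

/*
 * Models the old loop: kill the first child in list order that can be
 * killed and does not share the parent's memory. Returns the index of
 * that child, or -1 when the loop falls through to the parent.
 */
static int old_pick(const struct old_task *parent,
		    const struct old_task *children, size_t nr_children)
{
	size_t i;

	for (i = 0; i < nr_children; i++) {
		if (children[i].mm_id == parent->mm_id)
			continue;
		if (!model_kill(&children[i]))
			return (int)i;
	}
	return -1;
}
```

Here the victim is whichever eligible child happens to come first, regardless of how much memory it uses, which is the list-order dependence the patch removes.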
> Signed-off-by: David Rientjes <rientjes@...gle.com>
Kamezawa-san is probably right, but this patch is small enough and carries
no regression risk, so we can take a step-by-step approach to development.
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
> ---
> mm/oom_kill.c | 23 +++++++++++++++++------
> 1 files changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -432,7 +432,10 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  			    unsigned long points, struct mem_cgroup *mem,
>  			    const char *message)
>  {
> +	struct task_struct *victim = p;
>  	struct task_struct *c;
> +	unsigned long victim_points = 0;
> +	struct timespec uptime;
>
>  	if (printk_ratelimit())
>  		dump_header(p, gfp_mask, order, mem);
> @@ -446,17 +449,25 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  		return 0;
>  	}
>
> -	printk(KERN_ERR "%s: kill process %d (%s) score %li or a child\n",
> -		message, task_pid_nr(p), p->comm, points);
> +	pr_err("%s: Kill process %d (%s) with score %lu or sacrifice child\n",
> +		message, task_pid_nr(p), p->comm, points);
>
> -	/* Try to kill a child first */
> +	/* Try to sacrifice the worst child first */
> +	do_posix_clock_monotonic_gettime(&uptime);
>  	list_for_each_entry(c, &p->children, sibling) {
> +		unsigned long cpoints;
> +
>  		if (c->mm == p->mm)
>  			continue;
> -		if (!oom_kill_task(c))
> -			return 0;
> +
> +		/* badness() returns 0 if the thread is unkillable */
> +		cpoints = badness(c, uptime.tv_sec);
> +		if (cpoints > victim_points) {
> +			victim = c;
> +			victim_points = cpoints;
> +		}
>  	}
> -	return oom_kill_task(p);
> +	return oom_kill_task(victim);
>  }
>
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR
>