Message-ID: <alpine.DEB.2.00.1002160043530.17122@chino.kir.corp.google.com>
Date: Tue, 16 Feb 2010 00:46:53 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
cc: Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Nick Piggin <npiggin@...e.de>,
Andrea Arcangeli <aarcange@...hat.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Lubos Lunak <l.lunak@...e.cz>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [patch 1/7 -mm] oom: filter tasks not sharing the same cpuset
On Tue, 16 Feb 2010, KOSAKI Motohiro wrote:
> > We now determine whether an allocation is constrained by a cpuset by
> > iterating through the zonelist and checking
> > cpuset_zone_allowed_softwall(). This checks for the necessary cpuset
> > restrictions that we need to validate (the GFP_ATOMIC exception is
> > irrelevant, we don't call into the oom killer for those). We don't need
> > to kill outside of its cpuset because we're not guaranteed to find any
> > memory on those nodes, in fact it allows for needless oom killing if a
> > task sets all of its threads to have OOM_DISABLE in its own cpuset and
> > then runs out of memory. The oom killer would have killed every other
> > user task on the system even though the offending application can't
> > allocate there. That's certainly an undesired result and needs to be
> > fixed in this manner.
>
> But this explanation is irrelevant and meaningless. A cpuset can change
> its restricted nodes dynamically, so tsk->mempolicy at oom time doesn't
> represent where the task's memory actually resides. Plus, OOM_DISABLE can
> always produce an undesirable result; it's not special to this case.
>
That depends on whether memory_migrate is set when a cpuset's set of mems
is changed. The point is that we cannot penalize tasks in cpusets with a
disjoint set of mems because another cpuset is out of memory. Unless a
candidate task will definitely free memory on a node that the zonelist
allows, we should not consider it: killing it may be needless, and it
would be better to kill current. Otherwise, our badness() heuristic
cannot possibly determine the optimal task to kill anyway.
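
To make the filtering rule concrete, here is a rough userspace sketch in C.
It is not the patch itself and uses no kernel API: the types and names
(nodemask_model_t, struct task_model, shares_allowed_node(),
select_victim()) are hypothetical, a nodemask is modelled as one bit per
node, and the badness field merely stands in for whatever badness() would
return. It only illustrates "skip candidates with disjoint mems, and fall
back to current if nobody qualifies".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one bit per memory node (not the kernel's nodemask_t). */
typedef uint64_t nodemask_model_t;

struct task_model {
	const char *name;
	nodemask_model_t mems_allowed;	/* nodes the task's cpuset permits */
	unsigned long badness;		/* stand-in for a badness() score */
};

/*
 * A candidate is worth considering only if it may hold memory on at least
 * one node that the failing allocation is allowed to use.
 */
static bool shares_allowed_node(const struct task_model *t,
				nodemask_model_t alloc_nodes)
{
	return (t->mems_allowed & alloc_nodes) != 0;
}

/*
 * Pick the highest-scoring eligible candidate. If every candidate has a
 * disjoint set of mems, return the allocating task instead of killing
 * something that cannot free usable memory.
 */
static const struct task_model *
select_victim(const struct task_model *tasks, size_t n,
	      const struct task_model *current_task,
	      nodemask_model_t alloc_nodes)
{
	const struct task_model *victim = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!shares_allowed_node(&tasks[i], alloc_nodes))
			continue;	/* disjoint mems: do not penalize */
		if (!victim || tasks[i].badness > victim->badness)
			victim = &tasks[i];
	}
	return victim ? victim : current_task;
}
```

In this model, a task confined to node 0 is never chosen for an allocation
restricted to node 1, however large its score. The sketch deliberately
ignores the memory_migrate case discussed above, where a task may still
have pages on nodes outside its current mems.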