Message-Id: <20180126143950.719912507bd993d92188877f@linux-foundation.org>
Date: Fri, 26 Jan 2018 14:39:50 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: David Rientjes <rientjes@...gle.com>
Cc: Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...nel.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Johannes Weiner <hannes@...xchg.org>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Tejun Heo <tj@...nel.org>, kernel-team@...com,
cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch -mm v2 2/3] mm, memcg: replace cgroup aware oom killer
mount option with tunable
On Fri, 26 Jan 2018 14:20:24 -0800 (PST) David Rientjes <rientjes@...gle.com> wrote:
> On Thu, 25 Jan 2018, Andrew Morton wrote:
>
> > > Now that each mem cgroup on the system has a memory.oom_policy tunable to
> > > specify oom kill selection behavior, remove the needless "groupoom" mount
> > > option that requires (1) the entire system to be forced, perhaps
> > > unnecessarily, perhaps unexpectedly, into a single oom policy that
> > > differs from the traditional per process selection, and (2) a remount to
> > > change.
> > >
> > > Instead of enabling the cgroup aware oom killer with the "groupoom" mount
> > > option, set the mem cgroup subtree's memory.oom_policy to "cgroup".
> >
> > Can we retain the groupoom mount option and use its setting to set the
> > initial value of every memory.oom_policy? That way the mount option
> > remains somewhat useful and we're back-compatible?
> >
>
> -ECONFUSED. We want to have a mount option that has the sole purpose of
> doing echo cgroup > /mnt/cgroup/memory.oom_policy?
Approximately. Let me put it another way: can we modify your patchset
so that the mount option remains, and continues to have a sufficiently
similar effect? For backward compatibility.
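
To make the proposal concrete, here is a rough sketch of the semantics being asked for: the "groupoom" mount option only seeds the initial value of each cgroup's memory.oom_policy, and a later write to the tunable still overrides it. This is a toy Python model, not kernel code. The Hierarchy class, its method names, and the "none" value for the traditional per-process policy are assumptions made for illustration; only "groupoom", memory.oom_policy and the "cgroup" value come from the thread.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumed name for the traditional per-process selection policy.
DEFAULT_POLICY = "none"
# Policy value named in the thread for cgroup-aware selection.
GROUP_POLICY = "cgroup"


@dataclass
class Hierarchy:
    """Hypothetical model of a mounted cgroup hierarchy."""

    groupoom: bool = False  # was the "groupoom" mount option given?
    policies: Dict[str, str] = field(default_factory=dict)

    def initial_policy(self) -> str:
        # The mount option only chooses the default, nothing more.
        return GROUP_POLICY if self.groupoom else DEFAULT_POLICY

    def create_cgroup(self, path: str) -> None:
        # New cgroups start from the mount-time default.
        self.policies[path] = self.initial_policy()

    def write_oom_policy(self, path: str, value: str) -> None:
        # Models a write to <path>/memory.oom_policy; overrides the default.
        if path not in self.policies:
            raise KeyError(path)
        self.policies[path] = value
```

Under this model an existing setup that mounts with "groupoom" keeps cgroup-aware selection everywhere without touching the tunable, while individual subtrees can still opt out by writing a different policy.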
> This, and fixes to fairly compare the root mem cgroup with leaf mem
> cgroups, are essential before the feature is merged otherwise it yields
> wildly unpredictable (and unexpected, since its interaction with
> oom_score_adj isn't documented) results as I already demonstrated where
> cgroups with 1GB of usage are killed instead of 6GB workers outside of
> that subtree.
OK, so Roman's new feature is incomplete: it satisfies some use cases
but not others. And we kinda have a plan to address the other use
cases in the future.

There's nothing wrong with that! As long as we don't break existing
setups while evolving the feature. How do we do that?