Message-ID: <20171004092938.nipd6mtywyy4im44@dhcp22.suse.cz>
Date: Wed, 4 Oct 2017 11:29:38 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Tejun Heo <tj@...nel.org>
Cc: Roman Gushchin <guro@...com>, linux-mm@...ck.org,
Vladimir Davydov <vdavydov.dev@...il.com>,
Johannes Weiner <hannes@...xchg.org>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
David Rientjes <rientjes@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>, kernel-team@...com,
cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [v9 3/5] mm, oom: cgroup-aware OOM killer

On Tue 03-10-17 07:35:59, Tejun Heo wrote:
> Hello, Michal.
>
> On Tue, Oct 03, 2017 at 04:22:46PM +0200, Michal Hocko wrote:
> > On Tue 03-10-17 15:08:41, Roman Gushchin wrote:
> > > On Tue, Oct 03, 2017 at 03:36:23PM +0200, Michal Hocko wrote:
> > [...]
> > > > I guess we want to inherit the value on the memcg creation but I agree
> > > > that enforcing parent setting is weird. I will think about it some more
> > > > but I agree that it is saner to only enforce per memcg value.
> > >
> > > I'm not against it, but we should come up with a good explanation of
> > > why we're inheriting it, or not inheriting it.
> >
> > Inheriting sounds like a less surprising behavior. Once you opt in for
> > oom_group you can expect that descendants are going to assume the same
> > unless they explicitly state otherwise.
>
> Here's a counter example.
>
> Let's say there's a container which hosts one main application, and
> the container shares its host with other containers.
>
> * Let's say the container is a regular containerized OS instance and
> can't really guarantee system integrity if one of its processes gets
> randomly killed.
>
> * However, the application that it's running inside an isolated cgroup
> is more intelligent and composed of multiple interchangeable
> processes and can treat killing of a random process as partial
> capacity loss.
>
> When the host is setting up the outer container, it doesn't
> necessarily know whether the containerized environment would be able
> to handle partial OOM kills or not. It's akin to panic_on_oom setting
> at system level - it's the containerized instance itself which knows
> whether it can handle partial OOM kills or not. This is why this knob
> should be delegatable.
>
> Now, the container itself has group OOM set and the isolated main
> application is starting up. It obviously wants partial OOM kills
> rather than group killing. This is the same principle. The
> application which is being contained in the cgroup is the one which
> knows how it can handle OOM conditions, not the outer environment, so
> it obviously needs to be able to set the configuration it wants.
Yes, this makes a lot of sense. On the other hand, we used to copy other
reclaim-specific attributes like swappiness and oom_kill_disable.
I guess we should be OK with "non-hierarchical" behavior when it is
documented properly so that there are no surprises.
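
To spell out the difference between the two semantics, here is a toy
user-space model in Python. It is only a sketch and not kernel code: the
Memcg class and both helper functions are made up for illustration, and
only the oom_group name comes from the patch set. It assumes the value is
copied from the parent once, at creation time.

```python
# Toy model of the oom_group knob. Not kernel code: Memcg and the two
# helper functions are hypothetical names used only for illustration.

class Memcg:
    def __init__(self, parent=None):
        self.parent = parent
        # Inherit on creation: copy the parent's current value once.
        # Later changes in the parent do not propagate to this cgroup.
        self.oom_group = parent.oom_group if parent is not None else False


def group_kill_per_memcg(cg):
    """Non-hierarchical: only the cgroup's own setting is consulted."""
    return cg.oom_group


def group_kill_parent_enforced(cg):
    """Hierarchical: any ancestor with oom_group set forces a group kill."""
    while cg is not None:
        if cg.oom_group:
            return True
        cg = cg.parent
    return False


# Tejun's counter example: the container wants group kills, while the
# main application inside it can cope with losing a single process.
host = Memcg()
container = Memcg(host)
container.oom_group = True

app = Memcg(container)      # starts with the value copied from the container
app.oom_group = False       # the application explicitly opts out

sibling = Memcg(container)  # never touches the knob, keeps the copied value

print(group_kill_per_memcg(app))        # the application's own choice
print(group_kill_parent_enforced(app))  # the container's choice wins
```

With the per-memcg check the application gets the partial kills it asked
for, while the parent-enforced check would override its choice. A sibling
created without touching the knob still starts out with the container's
value, which is the "less surprising" default discussed earlier.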
--
Michal Hocko
SUSE Labs