linux-kernel - Re: [v9 3/5] mm, oom: cgroup-aware OOM killer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20171004092938.nipd6mtywyy4im44@dhcp22.suse.cz>
Date:   Wed, 4 Oct 2017 11:29:38 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Tejun Heo <tj@...nel.org>
Cc:     Roman Gushchin <guro@...com>, linux-mm@...ck.org,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        David Rientjes <rientjes@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>, kernel-team@...com,
        cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [v9 3/5] mm, oom: cgroup-aware OOM killer

On Tue 03-10-17 07:35:59, Tejun Heo wrote:
> Hello, Michal.
> 
> On Tue, Oct 03, 2017 at 04:22:46PM +0200, Michal Hocko wrote:
> > On Tue 03-10-17 15:08:41, Roman Gushchin wrote:
> > > On Tue, Oct 03, 2017 at 03:36:23PM +0200, Michal Hocko wrote:
> > [...]
> > > > I guess we want to inherit the value on the memcg creation but I agree
> > > > that enforcing parent setting is weird. I will think about it some more
> > > > but I agree that it is saner to only enforce per memcg value.
> > > 
> > > I'm not against, but we should come up with a good explanation, why we're
> > > inheriting it; or not inherit.
> > 
> > Inheriting sounds like a less surprising behavior. Once you opt in for
> > oom_group you can expect that descendants are going to assume the same
> > unless they explicitly state otherwise.
> 
> Here's a counter example.
> 
> Let's say there's a container which hosts one main application, and
> the container shares its host with other containers.
> 
> * Let's say the container is a regular containerized OS instance and
>   can't really guarantee system integrity if one its processes gets
>   randomly killed.
> 
> * However, the application that it's running inside an isolated cgroup
>   is more intelligent and composed of multiple interchangeable
>   processes and can treat killing of a random process as partial
>   capacity loss.
> 
> When the host is setting up the outer container, it doesn't
> necessarily know whether the containerized environment would be able
> to handle partial OOM kills or not.  It's akin to panic_on_oom setting
> at system level - it's the containerized instance itself which knows
> whether it can handle partial OOM kills or not.  This is why this knob
> should be delegatable.
> 
> Now, the container itself has group OOM set and the isolated main
> application is starting up.  It obviously wants partial OOM kills
> rather than group killing.  This is the same principle.  The
> application which is being contained in the cgroup is the one which
> knows how it can handle OOM conditions, not the outer environment, so
> it obviously needs to be able to set the configuration it wants.

Yes this makes a lot of sense. On the other hand we used to copy other
reclaim specific atributes like swappiness and oom_kill_disable.

I guess we should be OK with "non-hierarchical" behavior when it is
documented properly so that there are surpasses.

-- 
Michal Hocko
SUSE Labs