Message-ID: <20170825105728.GA10438@castle.DHCP.thefacebook.com>
Date: Fri, 25 Aug 2017 11:57:28 +0100
From: Roman Gushchin <guro@...com>
To: David Rientjes <rientjes@...gle.com>
CC: <linux-mm@...ck.org>, Michal Hocko <mhocko@...nel.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Johannes Weiner <hannes@...xchg.org>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Tejun Heo <tj@...nel.org>, <kernel-team@...com>,
<cgroups@...r.kernel.org>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [v6 2/4] mm, oom: cgroup-aware OOM killer

Hi David!

On Wed, Aug 23, 2017 at 04:19:11PM -0700, David Rientjes wrote:
> On Wed, 23 Aug 2017, Roman Gushchin wrote:
>
> > Traditionally, the OOM killer operates at the process level.
> > Under OOM conditions, it finds the process with the highest OOM score
> > and kills it.
> >
> > This behavior is poorly suited to systems with many running
> > containers:
> >
> > 1) There is no fairness between containers. A small container with
> > a few large processes will be chosen over a large one with a huge
> > number of small processes.
> >
> > 2) Containers often do not expect that some random process inside
> > will be killed. In many cases, a much safer behavior is to kill
> > all tasks in the container. Traditionally, this was implemented
> > in userspace, but doing it in the kernel has some advantages,
> > especially in the case of a system-wide OOM.
> >
> > 3) Per-process oom_score_adj affects the global OOM, so it's a breach
> > of isolation.
> >
> > To address these issues, a cgroup-aware OOM killer is introduced.
> >
> > Under OOM conditions, it tries to find the biggest memory consumer
> > and free memory by killing the corresponding task(s). The difference
> > from the "traditional" OOM killer is that it can treat memory cgroups,
> > as well as single processes, as memory consumers.
> >
> > By default, it will look for the biggest leaf cgroup, and kill
> > the largest task inside.
> >
> > But a user can change this behavior by enabling the per-cgroup
> > oom_kill_all_tasks option. If set, it causes the OOM killer to treat
> > the whole cgroup as an indivisible memory consumer. If such a cgroup
> > is selected as an OOM victim, all tasks belonging to it will be killed.
> >
>
> I'm very happy with the rest of the patchset, but I feel that I must renew
> my objection to memory.oom_kill_all_tasks being able to override the
> admin's decision to make a process oom disabled. From my perspective,
> setting memory.oom_kill_all_tasks on a cgroup with an oom disabled
> process attached, which then becomes killable, either (1) overrides the
> CAP_SYS_RESOURCE oom disabled setting or (2) is lazy and doesn't modify
> /proc/pid/oom_score_adj itself.

Changed this in v7 (to be posted soon).

Thanks!
Roman
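The victim-selection policy in the quoted patch description, together with the oom-disabled exemption discussed above, can be sketched as a small user-space C model. This is not the kernel implementation: `struct task`, `struct leaf_cgroup`, `OOM_DISABLED` and the helper functions are hypothetical, cgroup usage is approximated as the sum of per-task page counts, and skipping tasks with oom_score_adj of -1000 even when oom_kill_all_tasks is set is an assumption about what the v7 change does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's minimum oom_score_adj:
 * a value of -1000 marks a task as oom disabled. */
#define OOM_DISABLED (-1000)

struct task {
    long pages;         /* memory footprint of the task */
    int oom_score_adj;  /* -1000 means "never OOM-kill this task" */
    bool killed;
};

struct leaf_cgroup {
    struct task *tasks;
    size_t ntasks;
    bool oom_kill_all_tasks;  /* models the per-cgroup knob from the patch */
};

/* Assumption: cgroup usage approximated as the sum of its tasks' pages. */
static long cgroup_pages(const struct leaf_cgroup *cg)
{
    long sum = 0;
    for (size_t i = 0; i < cg->ntasks; i++)
        sum += cg->tasks[i].pages;
    return sum;
}

/* Compare leaf cgroups as whole memory consumers; pick the biggest. */
static struct leaf_cgroup *select_victim_cgroup(struct leaf_cgroup *cgs, size_t n)
{
    struct leaf_cgroup *victim = NULL;
    for (size_t i = 0; i < n; i++)
        if (victim == NULL || cgroup_pages(&cgs[i]) > cgroup_pages(victim))
            victim = &cgs[i];
    return victim;
}

/* Kill tasks in the selected cgroup and return how many were killed.
 * Default: only the largest killable task. With oom_kill_all_tasks:
 * every task. Oom-disabled tasks are skipped in both modes (assumed). */
static size_t oom_kill_cgroup(struct leaf_cgroup *cg)
{
    size_t killed = 0;
    struct task *largest = NULL;

    for (size_t i = 0; i < cg->ntasks; i++) {
        struct task *t = &cg->tasks[i];
        if (t->oom_score_adj == OOM_DISABLED)
            continue;
        if (cg->oom_kill_all_tasks) {
            t->killed = true;
            killed++;
        } else if (largest == NULL || t->pages > largest->pages) {
            largest = t;
        }
    }
    if (!cg->oom_kill_all_tasks && largest != NULL) {
        largest->killed = true;
        killed = 1;
    }
    return killed;
}
```

With one cgroup holding three tasks of 100, 120 and 90 pages and another holding a single 250-page task, a per-process killer would pick the 250-page task, whereas this sketch selects the first cgroup (310 pages in total) and, by default, kills only its 120-page task.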