Message-ID: <20170926105925.GA23139@castle.dhcp.TheFacebook.com>
Date: Tue, 26 Sep 2017 11:59:25 +0100
From: Roman Gushchin <guro@...com>
To: Michal Hocko <mhocko@...nel.org>
CC: Johannes Weiner <hannes@...xchg.org>, Tejun Heo <tj@...nel.org>,
<kernel-team@...com>, David Rientjes <rientjes@...gle.com>,
<linux-mm@...ck.org>, Vladimir Davydov <vdavydov.dev@...il.com>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Andrew Morton <akpm@...ux-foundation.org>,
<cgroups@...r.kernel.org>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [v8 0/4] cgroup-aware OOM killer

On Mon, Sep 25, 2017 at 10:25:21PM +0200, Michal Hocko wrote:
> On Mon 25-09-17 19:15:33, Roman Gushchin wrote:
> [...]
> > I'm not against this model, as I've said before. It feels logical,
> > and will work fine in most cases.
> >
> > In this case we can drop any mount/boot options, because it preserves
> > the existing behavior in the default configuration. A big advantage.
>
> I am not sure about this. We still need an opt-in, regardless, because
> selecting the largest process from the largest memcg != selecting the
> largest task (just consider the example of a memcg with many processes).
As I understand Johannes, he suggested comparing individual processes with
group_oom mem cgroups. In other words, always select the killable entity with
the biggest memory footprint.
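
Just to illustrate how I understand this model (purely a userspace sketch,
not kernel code; all struct names and fields below are made up), a group_oom
memcg competes as a single entity scored by its total charge, while tasks in
ordinary memcgs compete individually:

#include <stddef.h>

/* Hypothetical, simplified representation of killable entities. */
struct task {
	unsigned long footprint;	/* pages charged by this task */
};

struct memcg {
	int group_oom;			/* memory.oom_group set? */
	struct task *tasks;
	size_t nr_tasks;
};

struct victim {
	struct memcg *memcg;		/* non-NULL: kill the whole group */
	struct task *task;		/* non-NULL: kill a single task */
};

static struct victim select_victim(struct memcg *memcgs, size_t nr_memcgs)
{
	struct victim v = { NULL, NULL };
	unsigned long best = 0;
	size_t i, j;

	for (i = 0; i < nr_memcgs; i++) {
		struct memcg *mc = &memcgs[i];

		if (mc->group_oom) {
			/* The whole memcg is one killable entity. */
			unsigned long sum = 0;

			for (j = 0; j < mc->nr_tasks; j++)
				sum += mc->tasks[j].footprint;
			if (sum > best) {
				best = sum;
				v.memcg = mc;
				v.task = NULL;
			}
		} else {
			/* Each task competes on its own. */
			for (j = 0; j < mc->nr_tasks; j++) {
				if (mc->tasks[j].footprint > best) {
					best = mc->tasks[j].footprint;
					v.memcg = NULL;
					v.task = &mc->tasks[j];
				}
			}
		}
	}
	return v;
}
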
This is slightly different from my v8 approach, where I treat leaf memcgs
as indivisible memory consumers independent of the group_oom setting, so
by default I'm selecting the biggest task in the biggest memcg.
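
For contrast, the v8 behaviour would look roughly like this (same caveats,
reusing the hypothetical structs from the sketch above): leaf memcgs are
compared first as indivisible consumers regardless of group_oom, and only
then do we look inside the winner:

static struct victim select_victim_v8(struct memcg *memcgs, size_t nr_memcgs)
{
	struct memcg *biggest = NULL;
	struct victim v = { NULL, NULL };
	unsigned long best = 0;
	size_t i, j;

	/* Step 1: compare leaf memcgs by their total charge. */
	for (i = 0; i < nr_memcgs; i++) {
		unsigned long sum = 0;

		for (j = 0; j < memcgs[i].nr_tasks; j++)
			sum += memcgs[i].tasks[j].footprint;
		if (sum > best) {
			best = sum;
			biggest = &memcgs[i];
		}
	}
	if (!biggest)
		return v;

	/* Step 2: kill the whole group, or only its biggest task. */
	if (biggest->group_oom) {
		v.memcg = biggest;
		return v;
	}
	for (j = 0; j < biggest->nr_tasks; j++) {
		if (!v.task || biggest->tasks[j].footprint > v.task->footprint)
			v.task = &biggest->tasks[j];
	}
	return v;
}
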
While the approach suggested by Johannes looks clear and reasonable,
I'm slightly concerned about possible implementation issues,
which I've described below:
>
> > The only thing I'm slightly concerned about is that, due to the way we
> > calculate the memory footprint for tasks and memory cgroups, we will have
> > a number of weird edge cases. For instance, putting a single process into
> > a group_oom memcg will alter its oom_score significantly and result in
> > significantly different chances of being killed. An obvious example is
> > a task with oom_score_adj set to any non-extreme value (other than 0 and
> > -1000), but it can also happen in the case of a constrained alloc.
>
> I am not sure I understand. Are you talking about root memcg comparing
> to other memcgs?
Not only that: the root memcg in this case will be another complication. We could
also use the same trick for all memcgs (define a memcg's oom_score as the maximum
oom_score of its belonging tasks); that would turn group_oom into a pure container
cleanup solution, without changing the victim selection algorithm.
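
That trick would look roughly like this (again purely illustrative, reusing
the structs from the sketches above; footprint stands in for whatever per-task
score the kernel actually computes):

static unsigned long memcg_oom_score(const struct memcg *mc)
{
	unsigned long max = 0;
	size_t i;

	/* The memcg scores exactly as high as its biggest task, so the
	 * per-task victim selection is unchanged; group_oom only decides
	 * whether the rest of the memcg is cleaned up afterwards. */
	for (i = 0; i < mc->nr_tasks; i++) {
		if (mc->tasks[i].footprint > max)
			max = mc->tasks[i].footprint;
	}
	return max;
}
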
But, again, I'm not against the approach suggested by Johannes. I think that overall
it's the best possible semantics, if we don't take some implementation details
into account.