Message-ID: <20171013133219.GA5363@castle.DHCP.thefacebook.com>
Date: Fri, 13 Oct 2017 14:32:19 +0100
From: Roman Gushchin <guro@...com>
To: David Rientjes <rientjes@...gle.com>
CC: <linux-mm@...ck.org>, Michal Hocko <mhocko@...nel.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Johannes Weiner <hannes@...xchg.org>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Andrew Morton <akpm@...ux-foundation.org>,
Tejun Heo <tj@...nel.org>, <kernel-team@...com>,
<cgroups@...r.kernel.org>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [v11 3/6] mm, oom: cgroup-aware OOM killer
On Thu, Oct 12, 2017 at 02:50:38PM -0700, David Rientjes wrote:
> On Wed, 11 Oct 2017, Roman Gushchin wrote:
>
> Think about it in a different way: we currently compare per-process usage
> and userspace has /proc/pid/oom_score_adj to adjust that usage depending
> on priorities of that process and still oom kill if there's a memory leak.
> Your heuristic compares per-cgroup usage, it's the cgroup-aware oom killer
> after all. We don't need a strict memory.oom_priority that outranks all
> other sibling cgroups regardless of usage. We need a memory.oom_score_adj
> to adjust the per-cgroup usage. The decisionmaking in your earlier
> example would be under the control of C/memory.oom_score_adj and
> D/memory.oom_score_adj. Problem solved.
>
> It also solves the problem of userspace being able to influence oom victim
> selection so now they can protect important cgroups just like we can
> protect important processes today.
>
> And since this would be hierarchical usage, you can trivially infer root
> mem cgroup usage by subtraction of top-level mem cgroup usage.
>
> This is a powerful solution to the problem and gives userspace the control
> they need so that it can work in all usecases, not a subset of usecases.

You're right that a per-cgroup oom_score_adj could resolve the issue with
the overly strict semantics of oom_priority. But I believe nobody likes
the existing per-process oom_score_adj interface, and there are reasons
for that. Especially in the memcg-OOM case, it's not trivial to figure
out how exactly oom_score_adj will behave.

For example, earlier in this thread I gave an example where the decision
of which of two processes should be killed depends on whether the OOM is
global or memcg-wide, even though both processes belong to the same cgroup!
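
To make that concrete, here is a simplified sketch (not the actual kernel
code) of how oom_score_adj feeds into the per-task badness score, roughly
what oom_badness() in mm/oom_kill.c does; usage_pages stands in for the
task's rss + swap + page table pages:

  /*
   * Simplified sketch: the oom_score_adj bias is scaled by totalpages,
   * which is the memcg limit for a memcg OOM but total RAM + swap for
   * a global OOM.
   */
  static unsigned long task_badness(unsigned long usage_pages,
                                    long oom_score_adj,
                                    unsigned long totalpages)
  {
          long points = usage_pages +
                        oom_score_adj * (long)(totalpages / 1000);

          return points > 0 ? points : 1;
  }

Because totalpages differs between the two OOM paths, the same
oom_score_adj shifts the score by very different amounts, and the
relative order of two tasks can flip.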

Of course, it's technically trivial to implement some analog of
oom_score_adj for cgroups (and early versions of this patchset did that),
e.g. along the lines of the sketch below.
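
Purely as an illustration (a hypothetical sketch, not code from any
version of the patchset), such a knob could bias each cgroup's
hierarchical usage before the siblings are compared, mirroring the
per-task scaling above:

  /* Hypothetical per-cgroup knob and sample data, for illustration only. */
  struct cg_sample {
          const char *name;
          unsigned long usage_pages;      /* hierarchical usage */
          long oom_score_adj;             /* hypothetical per-cgroup adjustment */
  };

  static const struct cg_sample *pick_victim_cg(const struct cg_sample *cgs,
                                                int nr, unsigned long totalpages)
  {
          const struct cg_sample *victim = NULL;
          long best = -1;
          int i;

          /* Pick the sibling cgroup with the highest adjusted score. */
          for (i = 0; i < nr; i++) {
                  long score = cgs[i].usage_pages +
                               cgs[i].oom_score_adj * (long)(totalpages / 1000);

                  if (score > best) {
                          best = score;
                          victim = &cgs[i];
                  }
          }
          return victim;
  }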

But the right question is: is this an interface we want to support
for many years to come? I'm not sure.