linux-kernel - Re: [patch] mm, oom: make a last minute check to prevent unnecessary memcg oom kills


Message-ID: <20200318095514.GF21362@dhcp22.suse.cz>
Date:   Wed, 18 Mar 2020 10:55:14 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Robert Kolchmeyer <rkolchmeyer@...gle.com>
Cc:     David Rientjes <rientjes@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, Ami Fischman <fischman@...gle.com>
Subject: Re: [patch] mm, oom: make a last minute check to prevent unnecessary
 memcg oom kills

On Tue 17-03-20 11:25:52, Robert Kolchmeyer wrote:
> On Tue, Mar 10, 2020 at 3:54 PM David Rientjes <rientjes@...gle.com> wrote:
> >
> > Robert, could you elaborate on the user-visible effects of this issue that
> > caused it to initially get reported?
> >
> 
> Ami (now cc'ed) knows more, but here is my understanding. The use case
> involves a Docker container running multiple processes. The container
> has a memory limit set. The container contains two long-lived,
> important processes p1 and p2, and some arbitrary, dynamic number of
> usually ephemeral processes p3,...,pn. These processes are structured
> in a hierarchy that looks like p1->p2->[p3,...,pn]; p1 is a parent of
> p2, and p2 is the parent for all of the ephemeral processes p3,...,pn.
> 
> Since p1 and p2 are long-lived and important, the user does not want
> p1 and p2 to be oom-killed. However, p3,...,pn are expected to use a
> lot of memory, and it's ok for those processes to be oom-killed.
> 
> If the user sets oom_score_adj on p1 and p2 to make them very unlikely
> to be oom-killed, p3,...,pn will inherit the oom_score_adj value,
> which is bad. Additionally, setting oom_score_adj on p3,...,pn is
> tricky, since processes in the Docker container (specifically p1 and
> p2) don't have permissions to set oom_score_adj on p3,...,pn. The
> ephemeral nature of p3,...,pn also makes setting oom_score_adj on them
> tricky after they launch.
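
For illustration, the inheritance described above can be observed with a
minimal sketch. It assumes a Linux system with procfs mounted at /proc; the
helper names read_own_oom_score_adj and parent_and_child_adj are hypothetical
and not part of the patch or of any tool mentioned in this thread. The sketch
forks once and compares the value each process sees for itself.

```python
import os

OOM_ADJ_PATH = "/proc/self/oom_score_adj"  # Linux-only procfs file (assumption)


def read_own_oom_score_adj():
    """Return this process's oom_score_adj, or None if the file is unavailable."""
    try:
        with open(OOM_ADJ_PATH) as f:
            return int(f.read().strip())
    except OSError:
        return None


def parent_and_child_adj():
    """Fork once and return (parent_adj, child_adj) as seen by each process."""
    parent_adj = read_own_oom_score_adj()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the value it inherited across fork(), then exit at once.
        os.close(read_fd)
        os.write(write_fd, str(read_own_oom_score_adj()).encode())
        os._exit(0)
    # Parent: collect the child's report and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        reported = pipe.read()
    os.waitpid(pid, 0)
    child_adj = None if reported == "None" else int(reported)
    return parent_adj, child_adj


if __name__ == "__main__":
    print(parent_and_child_adj())
```

Both values should match, which is why protecting p1 and p2 this way also
protects p3,...,pn unless something adjusts the children after they start.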

Thanks for the clarification.

> So, the user hopes that when one of p3,...,pn triggers an oom
> condition in the Docker container, the oom killer will almost always
> kill processes from p3,...,pn (and not kill p1 or p2, which are both
> important and unlikely to trigger an oom condition). The issue of more
> processes being killed than are strictly necessary is resulting in p1
> or p2 being killed much more frequently when one of p3,...,pn triggers
> an oom condition, and p1 or p2 being killed is very disruptive for the
> user (my understanding is that p1 or p2 going down with high frequency
> results in significant unhealthiness in the user's service).

Do you have any logs showing this condition? I am interested because,
from your description, p1/p2 shouldn't usually be the ones that trigger
the oom, right? That suggests it should mostly be p3,...,pn that are in
the kernel triggering the oom, and therefore they shouldn't vanish.
-- 
Michal Hocko
SUSE Labs