linux-kernel - Re: [PATCH] fs, mm: account filp and names caches to kmemcg


Message-ID: <20171024185854.GA6154@cmpxchg.org>
Date:   Tue, 24 Oct 2017 14:58:54 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Greg Thelen <gthelen@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux MM <linux-mm@...ck.org>, linux-fsdevel@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fs, mm: account filp and names caches to kmemcg

On Tue, Oct 24, 2017 at 07:55:58PM +0200, Michal Hocko wrote:
> On Tue 24-10-17 13:23:30, Johannes Weiner wrote:
> > On Tue, Oct 24, 2017 at 06:22:13PM +0200, Michal Hocko wrote:
> [...]
> > > What would prevent a runaway in case the only process in the memcg is
> > > oom unkillable then?
> > 
> > In such a scenario, the page fault handler would busy-loop right now.
> > 
> > Disabling oom kills is a privileged operation with dire consequences
> > if used incorrectly. You can panic the kernel with it. Why should the
> > cgroup OOM killer implement protective semantics around this setting?
> > Breaching the limit in such a setup is entirely acceptable.
> > 
> > > Really, I think it's an enormous mistake to start modeling semantics
> > > based on the most contrived and nonsensical edge case configurations.
> > > Start the discussion with what is sane and what most users should
> > > optimally experience, and keep the corner cases simple.
> 
> I am not really seeing your concern about the semantics. The most
> important property of the hard limit is to protect from runaways and
> stop them if they happen. Users can use the softer variant (high limit)
> if they are not afraid of those scenarios. It is not so insane to
> imagine that a master task (which I can easily imagine would be oom
> disabled) has a leak and runs away as a result.

Then you're screwed either way. Where do you return -ENOMEM in a page
fault path that cannot OOM kill anything? Your choice is between
maintaining the hard limit semantics or going into an infinite loop.

I fail to see how this setup has any impact on the semantics we pick
here. And even if it were real, it's really not what most users do.

> We are not talking only about the page fault path. There are other
> allocation paths that can consume a lot of memory, spill over, and
> break the isolation restriction. So it makes much more sense to me to fail
> the allocation in such a situation rather than allow the runaway to
> continue. Just consider that such a situation shouldn't happen in
> the first place because there should always be an eligible task to
> kill - who would own all the memory otherwise?

Okay, then let's just stick to the current behavior.
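
For readers following the thread, the three outcomes being argued over (busy-loop, fail the allocation, or breach the hard limit) can be sketched as a toy model. This is not kernel code and does not reflect how the memcg charge path is actually written; `Memcg`, `Task`, `Policy`, `try_charge` and the `max_retries` cap are hypothetical names made up for illustration, and the cap exists only so that the "infinite loop" case terminates in the sketch.

```python
import errno
from dataclasses import dataclass, field
from enum import Enum, auto


class Policy(Enum):
    """What to do when the limit is hit and nothing can be OOM killed."""
    RETRY = auto()       # keep retrying; models the busy loop
    FAIL = auto()        # fail the allocation with -ENOMEM
    OVERCHARGE = auto()  # let the charge through and breach the limit


@dataclass(eq=False)  # compare tasks by identity, not by field values
class Task:
    name: str
    usage: int
    oom_killable: bool = True


@dataclass
class Memcg:
    limit: int
    tasks: list = field(default_factory=list)

    def usage(self):
        return sum(t.usage for t in self.tasks)


def try_charge(cg, task, pages, policy, max_retries=3):
    """Toy charge path: return 0 on success, -ENOMEM on failure."""
    retries = 0
    while cg.usage() + pages > cg.limit:
        killable = [t for t in cg.tasks if t.oom_killable]
        if killable:
            # "OOM kill" the largest eligible task; its memory is freed.
            victim = max(killable, key=lambda t: t.usage)
            cg.tasks.remove(victim)
            if victim is task:
                return -errno.ENOMEM  # the allocating task itself was killed
            continue
        # Nothing eligible to kill: the disputed corner case.
        if policy is Policy.FAIL:
            return -errno.ENOMEM
        if policy is Policy.OVERCHARGE:
            break
        retries += 1
        if retries >= max_retries:
            # A real busy loop would never return; the sketch gives up.
            raise RuntimeError("no progress possible: would loop forever")
    task.usage += pages
    return 0


def lone_master():
    """A memcg whose only task is OOM-unkillable and close to the limit."""
    return Memcg(limit=100, tasks=[Task("master", 90, oom_killable=False)])
```

With `lone_master()` and a charge of 20 pages, `Policy.FAIL` keeps usage at 90 and returns a negative errno, `Policy.OVERCHARGE` succeeds and leaves usage at 110 (above the limit of 100), and `Policy.RETRY` never makes progress. When a killable task is present, all three policies behave the same, which is the point made above that there should normally be an eligible task to kill.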