linux-kernel - Re: [PATCH] mm: add config option to select the initial overcommit mode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160519071426.GC26110@dhcp22.suse.cz>
Date:	Thu, 19 May 2016 09:14:26 +0200
From:	Michal Hocko <mhocko@...nel.org>
To:	Sebastian Frias <sf84@...oste.net>
Cc:	One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
	Mason <slash.tmp@...e.fr>, linux-mm@...ck.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, bsingharora@...il.com
Subject: Re: [PATCH] mm: add config option to select the initial overcommit
 mode

On Wed 18-05-16 17:18:45, Sebastian Frias wrote:
> Hi Michal,
> 
> On 05/17/2016 10:16 PM, Michal Hocko wrote:
> > On Tue 17-05-16 18:16:58, Sebastian Frias wrote:
[...]
> > The global OOM means there is _no_ memory at all. Many kernel
> > operations will need some memory to do something useful. Let's say you
> > would want to do an educated guess about who to kill - most proc APIs
> > will need to allocate. And this is just a beginning. Things are getting
> > really nasty when you get deeper and deeper. E.g. the OOM killer has to
> > give the oom victim access to memory reserves so that the task can exit
> > because that path needs to allocate as well. 
> 
> Really? I would have thought that once that SIGKILL is sent, the
> victim process is not expected to do anything else and thus its
> memory could be claimed immediately.  Or the OOM-killer is more of a
> OOM-terminator? (i.e.: sends SIGTERM)

Well, the path to exit is not exactly trivial. Resources have to be
released and that requires memory sometimes. E.g. exit_robust_list
needs to access the futex and that in turn means a page fault if the
memory was swapped out...
 
> >So even if you wanted to
> > give userspace some chance to resolve the OOM situation you would either
> > need some special API to tell "this process is really special and it can
> > access memory reserves and it has an absolute priority etc." or have a
> > in kernel fallback to do something or your system could lockup really
> > easily.
> > 
> 
> I see, so basically at least two cgroups would be needed, one reserved
> for handling the OOM situation through some API and another for the
> "rest of the system".  Basically just like the 5% reserved for 'root'
> on filesystems.

If you want to handle memcg OOM then you can use memory.oom_control (see
Documentation/cgroup-v1/memory.txt for more information) and have the
oom handler outside of that memcg.

> Do you think that would work?

But handling the _global_ oom from userspace is just insane with the
current kernel implementation. It just cannot work reliably.
-- 
Michal Hocko
SUSE Labs