Message-ID: <20160519071426.GC26110@dhcp22.suse.cz>
Date: Thu, 19 May 2016 09:14:26 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Sebastian Frias <sf84@...oste.net>
Cc: One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
Mason <slash.tmp@...e.fr>, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, bsingharora@...il.com
Subject: Re: [PATCH] mm: add config option to select the initial overcommit mode
On Wed 18-05-16 17:18:45, Sebastian Frias wrote:
> Hi Michal,
>
> On 05/17/2016 10:16 PM, Michal Hocko wrote:
> > On Tue 17-05-16 18:16:58, Sebastian Frias wrote:
[...]
> > The global OOM means there is _no_ memory at all. Many kernel
> > operations will need some memory to do something useful. Let's say you
> > would want to do an educated guess about who to kill - most proc APIs
> > will need to allocate. And this is just a beginning. Things are getting
> > really nasty when you get deeper and deeper. E.g. the OOM killer has to
> > give the oom victim access to memory reserves so that the task can exit
> > because that path needs to allocate as well.
>
> Really? I would have thought that once that SIGKILL is sent, the
> victim process is not expected to do anything else and thus its
> memory could be claimed immediately. Or is the OOM-killer more of an
> OOM-terminator? (i.e., one that sends SIGTERM)
Well, the path to exit is not exactly trivial. Resources have to be
released and that requires memory sometimes. E.g. exit_robust_list
needs to access the futex and that in turn means a page fault if the
memory was swapped out...
> > So even if you wanted to
> > give userspace some chance to resolve the OOM situation you would either
> > need some special API to tell "this process is really special and it can
> > access memory reserves and it has an absolute priority etc." or have an
> > in-kernel fallback to do something, or your system could lock up really
> > easily.
> >
>
> I see, so basically at least two cgroups would be needed, one reserved
> for handling the OOM situation through some API and another for the
> "rest of the system". Basically just like the 5% reserved for 'root'
> on filesystems.
If you want to handle memcg OOM then you can use memory.oom_control (see
Documentation/cgroup-v1/memory.txt for more information) and have the
oom handler outside of that memcg.
> Do you think that would work?
But handling the _global_ oom from userspace is just insane with the
current kernel implementation. It just cannot work reliably.
--
Michal Hocko
SUSE Labs