linux-kernel - Re: [patch] mm, memcg: add memory.oom_control notification for system oom

Message-ID: <alpine.DEB.2.02.1311141447160.21413@chino.kir.corp.google.com>
Date:	Thu, 14 Nov 2013 14:57:51 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Johannes Weiner <hannes@...xchg.org>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Michal Hocko <mhocko@...e.cz>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	cgroups@...r.kernel.org
Subject: Re: [patch] mm, memcg: add memory.oom_control notification for system
 oom

On Wed, 13 Nov 2013, Johannes Weiner wrote:

> > > Somebody called out_of_memory() after they
> > > failed reclaim, the machine is OOM.
> > 
> > While momentarily oom, the oom notifiers in powerpc and s390 have the 
> > ability to free memory without requiring a kill.
> 
> So either
> 
> 1) they should be part of the regular reclaim process, or
> 
> 2) their invocation is severe enough to not be part of reclaim, at
>    which point we should probably tell userspace about the OOM
> 

(1) is already true: subsystems that use register_oom_notifier() can free 
memory and avoid the oom, so we're not actually oom.  It's a late callback 
into the kernel to free memory, in the same sense as reclaim.  It was 
added directly into out_of_memory() purely for simplicity; it could be 
moved to the page allocator if we move all of the oom_notify_list helpers 
there as well.
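
To make the notifier point concrete, here is a rough userspace model of the 
pattern, not the kernel code itself.  Everything in it (struct oom_notifier, 
model_register_oom_notifier(), model_out_of_memory(), release_spare_pages()) 
is a hypothetical stand-in; the only thing it tries to show is that 
callbacks get a chance to free memory first, and a kill is only considered 
if they freed nothing.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel notifier block (assumption). */
struct oom_notifier {
	/* Callback adds the number of pages it released to *freed. */
	void (*call)(unsigned long *freed);
	struct oom_notifier *next;
};

/* Model of the list that registered callbacks are chained on. */
static struct oom_notifier *model_oom_notify_list;

static void model_register_oom_notifier(struct oom_notifier *nb)
{
	nb->next = model_oom_notify_list;
	model_oom_notify_list = nb;
}

/*
 * Returns 0 if the callbacks freed memory (no kill needed),
 * 1 if nothing was freed and a kill would have to be considered.
 */
static int model_out_of_memory(void)
{
	unsigned long freed = 0;
	struct oom_notifier *nb;

	for (nb = model_oom_notify_list; nb; nb = nb->next)
		nb->call(&freed);

	if (freed > 0)
		return 0;	/* got memory back, not actually oom */
	return 1;
}

/* Example subsystem holding pages it can give back once (assumption). */
static unsigned long spare_pages = 16;

static void release_spare_pages(unsigned long *freed)
{
	*freed += spare_pages;
	spare_pages = 0;
}
```

With release_spare_pages() registered, the first call to 
model_out_of_memory() returns 0; once the spare pages are gone it returns 1.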

The same is true of silently setting TIF_MEMDIE for current so that it has 
access to memory reserves and may exit when it has a pending SIGKILL or is 
already exiting.

In both cases, we're not actually oom because either (a) the kernel can 
still free memory and avoid actually killing a process, or (b) current 
simply needs access to memory reserves so it may die.

We don't want to invoke the userspace oom handler when we first enter 
direct reclaim, for example, for the same reason.
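
The two cases above can be written down as a small decision function.  This 
is only a sketch under my own assumptions: struct oom_state, classify_oom() 
and should_notify_userspace() are made-up names for illustration, and the 
fields just model "notifiers freed something" and "current has a pending 
SIGKILL or is already exiting".

```c
#include <stdbool.h>

/* Hypothetical classification of an oom condition (assumption). */
enum oom_outcome {
	OOM_AVOIDED_BY_NOTIFIERS,	/* (a) kernel could still free memory */
	OOM_CURRENT_GETS_RESERVES,	/* (b) current only needs reserves to die */
	OOM_KILL_REQUIRED,		/* a victim has to be chosen */
};

/* Hypothetical snapshot of the state the decision depends on. */
struct oom_state {
	unsigned long freed_by_notifiers;
	bool current_has_pending_sigkill;
	bool current_is_exiting;
};

static enum oom_outcome classify_oom(const struct oom_state *s)
{
	if (s->freed_by_notifiers > 0)
		return OOM_AVOIDED_BY_NOTIFIERS;
	if (s->current_has_pending_sigkill || s->current_is_exiting)
		return OOM_CURRENT_GETS_RESERVES;
	return OOM_KILL_REQUIRED;
}

/* Userspace is only told when something actually has to be killed. */
static bool should_notify_userspace(const struct oom_state *s)
{
	return classify_oom(s) == OOM_KILL_REQUIRED;
}
```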

> > I think you're misunderstanding the kernel oom notifiers, they exist 
> > solely to free memory so that the oom killer actually doesn't have to kill 
> > anything.  The fact that they use kernel notifiers is irrelevant and 
> > userspace oom notification is separate.  Userspace is only going to want a 
> > notification when the oom killer has to kill something, the EXACT same 
> > semantics as the non-root-memcg memory.oom_control.
> 
> That's actually not true, we invoke the OOM notifier before calling
> mem_cgroup_out_of_memory(), which then may skip the kill in favor of
> letting current exit.  It does this for when the kernel handler is
> enabled, which would be the equivalent for what you are implementing.
> 

Good point.  I don't think we should be notifying userspace for memcg oom 
conditions when current simply needs access to memory reserves to exit: 
the memcg isn't actually oom since TIF_MEMDIE implies memcg bypass.  I 
think we should do that check in mem_cgroup_handle_oom() rather than 
mem_cgroup_out_of_memory().
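
The difference in ordering can be sketched like this.  It is a userspace 
model, not the memcg code: struct memcg_oom_event and both functions are 
hypothetical names, and "current_can_exit" stands for current having a 
pending SIGKILL or already exiting.

```c
#include <stdbool.h>

/* Hypothetical record of what one memcg oom event resulted in. */
struct memcg_oom_event {
	bool current_can_exit;		/* pending SIGKILL or already exiting */
	int userspace_notifications;
	int kills;
};

/* Ordering as described in the quoted text: notify, then maybe skip the kill. */
static void model_handle_oom_notify_first(struct memcg_oom_event *ev)
{
	ev->userspace_notifications++;
	if (ev->current_can_exit)
		return;		/* kill skipped, but userspace was already told */
	ev->kills++;
}

/* Ordering suggested here: check whether current can exit before notifying. */
static void model_handle_oom_check_first(struct memcg_oom_event *ev)
{
	if (ev->current_can_exit)
		return;		/* not actually oom, nothing to report */
	ev->userspace_notifications++;
	ev->kills++;
}
```

In the first ordering userspace can see a notification with no kill behind 
it; in the second, a notification always corresponds to a kill.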