Message-ID: <20131120172511.GB1848@hp530>
Date: Wed, 20 Nov 2013 18:25:16 +0100
From: Vladimir Murzin <murzin.v@...il.com>
To: David Rientjes <rientjes@...gle.com>
Cc: Michal Hocko <mhocko@...e.cz>, linux-mm@...ck.org,
Greg Thelen <gthelen@...gle.com>,
Glauber Costa <glommer@...il.com>,
Mel Gorman <mgorman@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Rik van Riel <riel@...hat.com>,
Joern Engel <joern@...fs.org>, Hugh Dickins <hughd@...gle.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: user defined OOM policies

Hi David,

On Wed, Nov 20, 2013 at 12:02:20AM -0800, David Rientjes wrote:
> On Tue, 19 Nov 2013, Michal Hocko wrote:
>
> > > We have basically ended up with 3 options AFAIR:
> > > 1) allow memcg approach (memcg.oom_control) on the root level
> > > for both OOM notification and blocking OOM killer and handle
> > > the situation from the userspace same as we can for other
> > > memcgs.
> >
> > This looks like a straightforward approach, as a similar thing is done
> > on the local (memcg) level. There are several problems, though.
> > Running userspace from within OOM context is terribly hard to do
> > right.
>
> Not sure it's hard if you have per-memcg memory reserves which I've
> brought up in the past with true and complete kmem accounting. Even if
> you don't allocate slab, it guarantees that there will be at least a
> little excess memory available so that the userspace oom handler isn't oom
> itself.
>
> This involves treating processes waiting on memory.oom_control as a
> special class so that they are allowed to allocate an
> additional pre-configured amount of memory. For non-root memcgs, this
> would simply be a dummy usage that would be charged to the memcg when the
> oom notification is registered and actually accessible only by the oom
> handler itself while memcg->under_oom. For root memcgs, this would simply
> be a PF_MEMALLOC type behavior that dips into per-zone memory reserves.
>
> > This is true even in the memcg case and we strongly discourage
> > users from doing that. The global case has nothing outside of the OOM
> > context, though, so any hang would block the whole machine.
>
> Why would there be a hang if the userspace oom handlers aren't actually
> oom themselves as described above?
>
> I'd suggest against the other two suggestions because hierarchical
> per-memcg userspace oom handlers are very powerful and can be useful
> without actually killing anything at all, and parent oom handlers can
> signal child oom handlers to free memory in oom conditions (in other
> words, defer a parent oom condition to a child's oom handler upon
Weren't vmpressure notifications designed for that purpose?
Vladimir
> notification). I was planning on writing a liboom library that would lay
> the foundation for how this was supposed to work and some generic
> functions that make use of the per-memcg memory reserves.
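For reference, a minimal sketch of how a handler could subscribe to the vmpressure notifications mentioned above, assuming the cgroup v1 memory controller interface (memory.pressure_level plus cgroup.event_control, with the levels "low", "medium" and "critical"). The function names and the cgroup_dir argument are made up for illustration and are not part of any existing library; keeping the pressure_level fd open after registering is a conservative assumption.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Build the "<event_fd> <pressure_level_fd> <level>" line that cgroup v1
 * expects in cgroup.event_control. Returns its length, or -1 on bad input. */
int format_pressure_registration(char *buf, size_t len, int efd, int pfd,
                                 const char *level)
{
    int n;

    assert(buf != NULL && level != NULL);
    if (strcmp(level, "low") != 0 && strcmp(level, "medium") != 0 &&
        strcmp(level, "critical") != 0)
        return -1;
    n = snprintf(buf, len, "%d %d %s", efd, pfd, level);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Open "<dir>/<name>"; returns a file descriptor or -1. */
static int open_in(const char *dir, const char *name, int flags)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, name);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return open(path, flags);
}

/* Register for pressure events on the memcg at cgroup_dir (a hypothetical
 * path supplied by the caller). On success, returns the eventfd to read
 * from and stores the memory.pressure_level fd in *pfd_out; that fd is
 * kept open (an assumption made to stay on the safe side). */
int register_pressure_event(const char *cgroup_dir, const char *level,
                            int *pfd_out)
{
    char line[64];
    int efd, pfd, cfd, n;

    assert(cgroup_dir != NULL && pfd_out != NULL);
    efd = eventfd(0, 0);
    if (efd < 0)
        return -1;
    pfd = open_in(cgroup_dir, "memory.pressure_level", O_RDONLY);
    cfd = open_in(cgroup_dir, "cgroup.event_control", O_WRONLY);
    if (pfd < 0 || cfd < 0)
        goto fail;
    n = format_pressure_registration(line, sizeof(line), efd, pfd, level);
    if (n < 0 || write(cfd, line, (size_t)n) != (ssize_t)n)
        goto fail;
    close(cfd);
    *pfd_out = pfd;
    return efd;
fail:
    if (cfd >= 0)
        close(cfd);
    if (pfd >= 0)
        close(pfd);
    close(efd);
    return -1;
}
```

A child handler would then block in read() on the returned eventfd and free memory each time the 8-byte counter becomes readable.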
>
> So my plan for the complete solution was:
>
> - allow userspace notification from the root memcg on system oom
> conditions,
>
> - implement a memory.oom_delay_millisecs timeout so that the kernel
> eventually intervenes if userspace fails to respond for whatever
> reason, including for system oom conditions; it would be set to 0
> if no userspace oom handler is registered for the notification, and
>
> - implement per-memcg reserves as described above so that userspace oom
> handlers have access to memory even in oom conditions as an upfront
> charge and have the ability to free memory as necessary.
>
> We already have the ability to do the actual kill from userspace: both the
> system oom killer and the memcg oom killer grant access to memory
> reserves for any process needing to allocate memory if it has a pending
> SIGKILL, which we can send from userspace.
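To make the userspace side of this plan concrete, here is a rough sketch of a handler loop's building blocks, assuming the existing cgroup v1 interface (writing "<event_fd> <oom_control_fd>" to cgroup.event_control). It does not use memory.oom_delay_millisecs or the per-memcg reserves, since both are only proposals in this thread; the timeout below is purely a userspace poll() timeout. All function names and the cgroup_dir argument are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Register efd for OOM notifications of the memcg at cgroup_dir (a
 * hypothetical path supplied by the caller). Returns the open
 * memory.oom_control fd on success, or -1 on failure. */
int register_oom_event(const char *cgroup_dir, int efd)
{
    char path[4096], line[32];
    int ofd, cfd, n;
    ssize_t written;

    assert(cgroup_dir != NULL);
    n = snprintf(path, sizeof(path), "%s/memory.oom_control", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    ofd = open(path, O_RDONLY);
    if (ofd < 0)
        return -1;
    n = snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        close(ofd);
        return -1;
    }
    cfd = open(path, O_WRONLY);
    if (cfd < 0) {
        close(ofd);
        return -1;
    }
    n = snprintf(line, sizeof(line), "%d %d", efd, ofd);
    written = write(cfd, line, (size_t)n);
    close(cfd);
    if (written != (ssize_t)n) {
        close(ofd);
        return -1;
    }
    return ofd;
}

/* Wait up to timeout_ms for a notification on efd.
 * Returns 1 if an event arrived, 0 on timeout, -1 on error.
 * Note: the full timeout restarts if poll() is interrupted by a signal. */
int wait_for_oom_event(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    uint64_t count;
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0)
        return rc;
    if (!(pfd.revents & POLLIN))
        return -1;
    /* Reading resets the eventfd counter so the next wait blocks again. */
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return 1;
}

/* Do the actual kill from userspace; victim selection is left to policy. */
int kill_victim(pid_t victim)
{
    assert(victim > 0);
    return kill(victim, SIGKILL);
}
```

A handler would create the eventfd, call register_oom_event() once, and then loop on wait_for_oom_event(), calling kill_victim() on whatever process its policy picks. How the handler avoids allocating memory while it runs is exactly the open question the reserves are meant to address.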