Message-ID: <20131120172119.GA1848@hp530>
Date: Wed, 20 Nov 2013 18:21:23 +0100
From: Vladimir Murzin <murzin.v@...il.com>
To: Michal Hocko <mhocko@...e.cz>
Cc: linux-mm@...ck.org, Greg Thelen <gthelen@...gle.com>,
Glauber Costa <glommer@...il.com>,
Mel Gorman <mgorman@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
David Rientjes <rientjes@...gle.com>,
Rik van Riel <riel@...hat.com>,
Joern Engel <joern@...fs.org>, Hugh Dickins <hughd@...gle.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: user defined OOM policies
On Tue, Nov 19, 2013 at 02:40:07PM +0100, Michal Hocko wrote:
Hi Michal,
> On Tue 19-11-13 14:14:00, Michal Hocko wrote:
> [...]
> > We have basically ended up with 3 options AFAIR:
> > 	1) allow the memcg approach (memcg.oom_control) on the root
> > 	level for both OOM notification and blocking the OOM killer,
> > 	and handle the situation from userspace the same way we can
> > 	for other memcgs.
>
> This looks like a straightforward approach, as a similar thing is done
> on the local (memcg) level. There are several problems, though.
> Running userspace from within OOM context is terribly hard to do
> right. This is true even in the memcg case, and we strongly discourage
> users from doing that. The global case has nothing like an
> outside-of-OOM context, though, so any hang would block the whole
> machine. Even if the OOM killer were careful and locked in all the
> resources, it would have a hard time querying the current system state
> (existing processes and their states) without any allocation. There
> are certain ways to work around these issues - e.g. give the killer
> access to memory reserves - but this all looks scary and fragile.
>
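For reference, this is roughly what the userspace side of the existing
(non-root) memcg interface looks like under cgroup v1 - a minimal
sketch, where "mygroup" is an example path and all error handling is
elided:

/* Minimal sketch: listen for OOM events on one (non-root) memcg via
 * the existing memory.oom_control + eventfd notification interface. */
#include <sys/eventfd.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[32];
	uint64_t cnt;
	int efd = eventfd(0, 0);
	int ofd = open("/sys/fs/cgroup/memory/mygroup/memory.oom_control",
		       O_RDONLY);
	int cfd = open("/sys/fs/cgroup/memory/mygroup/cgroup.event_control",
		       O_WRONLY);

	/* register the pair: "<eventfd> <memory.oom_control fd>" */
	snprintf(buf, sizeof(buf), "%d %d", efd, ofd);
	write(cfd, buf, strlen(buf));

	for (;;) {
		read(efd, &cnt, sizeof(cnt));	/* blocks until an OOM event */
		fprintf(stderr, "memcg hit its limit, pick a victim here\n");
	}
}

The proposal above is essentially to make this same protocol work for
the root memcg, which is where the in-OOM-context problems come in.
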
> > 	2) allow modules to hook into the OOM killer path and take
> > 	the appropriate action.
>
> This already exists, actually. There is the oom_notify_list call chain
> and {un}register_oom_notifier, which allow modules to hook into OOM
> and skip the global OOM killer if some memory is freed. Currently only
> s390 and powerpc use it, and they seem to abuse it for something that
> looks like a shrinker, except it is done in the OOM path...
>
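Right - and for anyone who has not looked at it, the whole context a
module gets today is a pointer to a freed-pages counter. A minimal
sketch of such a module (the names are mine):

/* Sketch of a module hooking oom_notify_list through the existing
 * {un}register_oom_notifier interface. The callback only sees a
 * counter of freed pages - nothing of the actual OOM context. */
#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/oom.h>

static int my_oom_notify(struct notifier_block *nb,
			 unsigned long dummy, void *parm)
{
	unsigned long *freed = parm;

	/* try to release memory and report how many pages we freed;
	 * if *freed ends up non-zero the global OOM kill is skipped */
	*freed += 0;	/* this sketch reclaims nothing */
	return NOTIFY_OK;
}

static struct notifier_block my_oom_nb = {
	.notifier_call = my_oom_notify,
};

static int __init my_init(void)
{
	return register_oom_notifier(&my_oom_nb);
}

static void __exit my_exit(void)
{
	unregister_oom_notifier(&my_oom_nb);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");
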
> I think the interface should be changed if something like this were to
> be used in practice. A lot of information is lost on the way. I would
> basically expect to get everything that out_of_memory gets.
Some time ago I was trying to hook OOM with a custom module-based
policy. I needed to select a process based on USS/PSS values, which
required page walking (yes, I know it is extremely expensive, but
sometimes I'd pay that bill). The lesson learned is quite simple - it
is harmful to expose (all?) internal functions and locking to modules:
the result is going to be a completely unreliable and unpredictable
mess unless a well-defined interface and helpers are established.
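
FWIW, a per-process PSS can also be approximated from userspace by
summing the Pss: fields of /proc/<pid>/smaps, without any in-kernel
page walk - a rough sketch, error handling elided:

/* Sum the Pss: fields of /proc/<pid>/smaps; the result is in kB,
 * as reported by smaps. Returns -1 if the process is gone. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

long pss_kb(int pid)
{
	char path[64], line[256];
	long total = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/smaps", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (strncmp(line, "Pss:", 4) == 0)
			total += strtol(line + 4, NULL, 10);
	fclose(f);
	return total;
}

Of course doing this from an OOM handler has the same "no allocations
allowed" problem discussed above.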
>
> > 	3) create a generic filtering mechanism which could be
> > 	controlled from userspace by a set of rules (e.g. something
> > 	analogous to packet filtering).
>
> This looks generic enough but I have no idea about the complexity.
Never thought about it, but I just wonder what input and output this
filtering mechanism is supposed to have?
Vladimir
> --
> Michal Hocko
> SUSE Labs
>