Message-ID: <alpine.DEB.2.02.1312041742560.20115@chino.kir.corp.google.com>
Date: Wed, 4 Dec 2013 17:49:04 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Johannes Weiner <hannes@...xchg.org>
cc: Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.cz>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
Pekka Enberg <penberg@...nel.org>,
Christoph Lameter <cl@...ux-foundation.org>,
Tejun Heo <tj@...nel.org>, Li Zefan <lizefan@...wei.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
cgroups@...r.kernel.org
Subject: Re: [patch 7/8] mm, memcg: allow processes handling oom notifications
to access reserves
On Wed, 4 Dec 2013, Johannes Weiner wrote:
> > Now that a per-process flag is available, define it for processes that
> > handle userspace oom notifications. This is an optimization to avoid
> > maintaining a list of such processes attached to a memcg at any given time
> > and iterating it at charge time.
> >
> > This flag gets set whenever a process has registered for an oom
> > notification and is cleared whenever it unregisters.
> >
> > When memcg reclaim has failed to free any memory, it is necessary for
> > userspace oom handlers to be able to dip into reserves to pagefault text,
> > allocate kernel memory to read the "tasks" file, allocate heap, etc.
>
> The task handling the OOM of a memcg can obviously not be part of that
> same memcg.
>
Not without the memory.oom_reserve_in_bytes knob that this series adds,
that's true. Michal expressed interest in the idea of memcg oom reserves
in the past, so I thought I'd share the series.
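
For illustration only, configuring such a reserve from userspace could
look roughly like the sketch below; the memcg mount point, cgroup name,
and 8MB value are made up for the example, only the
memory.oom_reserve_in_bytes name comes from this series:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Example path and value only: 8MB of reserve for the memcg
	 * that will host the userspace oom handler. */
	const char *path =
		"/sys/fs/cgroup/memory/mymemcg/memory.oom_reserve_in_bytes";
	const char *val = "8388608";
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, val, strlen(val)) < 0)
		perror("write");
	close(fd);
	return 0;
}

The reserve is only meant to be dipped into once memcg reclaim has failed
to free anything, so a small value should suffice.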
> On Tue, 3 Dec 2013 at 15:35:48 +0800, Li Zefan wrote:
> > On Mon, 2 Dec 2013 at 11:44:06 -0500, Johannes Weiner wrote:
> > > On Fri, Nov 29, 2013 at 03:05:25PM -0500, Tejun Heo wrote:
> > > > Whoa, so we support oom handler inside the memcg that it handles?
> > > > Does that work reliably? Changing the above detail in this patch
> > > > isn't difficult (and we'll later need to update kernfs too) but
> > > > supporting such setup properly would be a *lot* of commitment and I'm
> > > > very doubtful we'd be able to achieve that by just carefully avoiding
> > > > memory allocation in the operations that userland oom handler uses -
> > > > that set is destined to expand over time, extremely fragile and will
> > > > be hellish to maintain.
> > > >
It works reliably with this patch series, yes. I'm not sure what change
this is referring to that would avoid memory allocation for userspace oom
handlers, and I'd agree that it would be difficult to maintain a
no-allocation policy for the subset of processes that are destined to act
as oom handlers.
That's not what this series is addressing, though, and in fact it's quite
the opposite. It acknowledges that userspace oom handlers need to
allocate and that anything else would be too difficult to maintain
(thereby agreeing with the above), so we must set aside memory that they
are exclusively allowed to access. The vast majority of users, who will
not use userspace oom handlers, can simply keep the default value of
memory.oom_reserve_in_bytes == 0 and incur absolutely no side effects as
a result of this series.
For those who do use userspace oom handlers, like Google, this allows us
to set aside memory so that the userspace oom handlers can kill a process,
dump the heap, send a signal, drop caches, etc. when they wake up.
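
To make that concrete, a stripped-down handler along these lines is what
ends up needing memory at exactly the moment the memcg has none to give;
this is only a sketch against the existing cgroup v1 oom notification
interface (memory.oom_control + cgroup.event_control via eventfd), with
paths, victim selection, and error handling simplified:

#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MEMCG "/sys/fs/cgroup/memory/mymemcg"

int main(void)
{
	char buf[64];
	uint64_t count;
	int efd = eventfd(0, 0);
	int ofd = open(MEMCG "/memory.oom_control", O_RDONLY);
	int cfd = open(MEMCG "/cgroup.event_control", O_WRONLY);

	/* Register the eventfd for oom notifications on this memcg:
	 * "<eventfd> <fd of memory.oom_control>" */
	snprintf(buf, sizeof(buf), "%d %d", efd, ofd);
	write(cfd, buf, strlen(buf));

	for (;;) {
		/* Blocks until the memcg is oom. */
		read(efd, &count, sizeof(count));

		/* Naive victim selection for the sketch: first pid in
		 * the memcg's "tasks" file. */
		FILE *tasks = fopen(MEMCG "/tasks", "r");
		int pid;

		if (tasks && fscanf(tasks, "%d", &pid) == 1)
			kill(pid, SIGKILL);
		if (tasks)
			fclose(tasks);
	}
}

Everything after the eventfd read -- faulting the handler's text back in,
the kernel memory needed to read "tasks", the heap behind stdio -- can
require a charge against the very memcg that is already at its limit,
which is what the reserve covers.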
> > > > So, I'm not at all excited about committing to this guarantee. This
> > > > one is an easy one but it looks like the first step onto dizzying
> > > > slippery slope.
> > > >
> > > > Am I misunderstanding something here? Are you and Johannes firm on
> > > > supporting this?
> > >
> > > Handling a memcg OOM from userspace running inside that OOM memcg is
> > > completely crazy. I mean, think about this for just two seconds...
> > > Really?
> > >
> > > I get that people are doing it right now, and if you can get away with
> > > it for now, good for you. But you have to be aware how crazy this is
> > > and if it breaks you get to keep the pieces and we are not going to
> > > accommodate this in the kernel. Fix your crazy userspace.
> >
The rest of this email communicates only one thing: someone thinks it's
crazy. And I agree it would be crazy if we didn't allow that class of
process access to a pre-defined amount of memory to handle the situation,
which is exactly what this series adds.