Message-ID: <alpine.DEB.2.02.1305301338430.20389@chino.kir.corp.google.com>
Date: Thu, 30 May 2013 13:47:30 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Michal Hocko <mhocko@...e.cz>
cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
cgroups@...r.kernel.org
Subject: Re: [patch] mm, memcg: add oom killer delay

On Thu, 30 May 2013, Michal Hocko wrote:
> > Completely disabling the oom killer for a memcg is problematic if
> > userspace is unable to address the condition itself, usually because it
> > is unresponsive.
>
> Isn't this a bug in the userspace oom handler? Why is it unresponsive? It
> shouldn't allocate any memory, so nothing should prevent it from running
> (if other tasks are preempting it permanently, then the priority of the
> handler should be increased).
>

Unresponsiveness isn't necessarily only because of memory constraints; your
oom notifier may sit in a parent cgroup that isn't oom. If a process in the
oom cgroup is stuck on mm->mmap_sem, though, the oom notifier may not be
able to scrape /proc/pid and obtain the information it needs to make an oom
kill decision. If the oom notifier is in the oom cgroup itself, it may not
even be able to read the memcg "tasks" file to determine the set of
eligible processes. There is also no guarantee that the userspace oom
handler will have the memory it needs to re-enable the oom killer in the
memcg under oom, which would allow the kernel to make forward progress.
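
To make those constraints concrete, here is a minimal sketch of the kind of
handler we're talking about, using the cgroup v1 memcg interface
(memory.oom_control, cgroup.event_control, and the per-memcg "tasks" file).
The /sys/fs/cgroup/memory/foo path and the buffer size are assumptions for
the example, not anything from the patch. Even with every file pre-opened
and the handler mlocked, the "read tasks, pick a victim" step is exactly
what can wedge behind mm->mmap_sem:

/*
 * Sketch of a userspace memcg oom handler (cgroup v1).  Everything is
 * pre-opened and locked into memory so that servicing a notification
 * does not itself need to allocate.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/mman.h>
#include <unistd.h>

#define MEMCG "/sys/fs/cgroup/memory/foo"       /* assumed path */

int main(void)
{
        static char buf[65536];                 /* preallocated, no malloc */
        char line[64];
        uint64_t count;
        int oomfd, efd, ctrlfd, tasksfd;

        oomfd = open(MEMCG "/memory.oom_control", O_RDWR);
        tasksfd = open(MEMCG "/tasks", O_RDONLY);
        ctrlfd = open(MEMCG "/cgroup.event_control", O_WRONLY);
        efd = eventfd(0, 0);
        if (oomfd < 0 || tasksfd < 0 || ctrlfd < 0 || efd < 0)
                return 1;

        /* Register the eventfd for oom notifications on this memcg. */
        snprintf(line, sizeof(line), "%d %d", efd, oomfd);
        write(ctrlfd, line, strlen(line));

        /* Disable the kernel oom killer; userspace handles oom now. */
        write(oomfd, "1", 1);

        /* Try not to fault once the memcg is oom. */
        mlockall(MCL_CURRENT | MCL_FUTURE);

        for (;;) {
                /* Blocks until the memcg hits its limit and is oom. */
                read(efd, &count, sizeof(count));

                /*
                 * Victim selection would go here: read the eligible pids
                 * from "tasks", scrape /proc/<pid>, pick one, kill it.
                 * This is the step that can hang behind mm->mmap_sem.
                 */
                lseek(tasksfd, 0, SEEK_SET);
                read(tasksfd, buf, sizeof(buf) - 1);

                /*
                 * Fall back to the kernel oom killer (a real handler would
                 * only do this when it could not kill a victim itself).
                 */
                write(oomfd, "0", 1);
        }
}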

We've used this for a few years as a complement to oom notifiers so that a
process has a grace period to deal with the oom condition itself before the
kernel is allowed to terminate a process and free memory. We've simply had
no alternative in the presence of kernel constraints that prevent it from
being done in any other way. We _want_ userspace to deal with the issue,
but when it cannot collect the information needed to deal with the
condition (and we're not going to trace every fork() that every process in
a potentially oom memcg does), we want the kernel to step in instead of
relying on an admin logging in or on a global oom condition.
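
With the patch, that grace period is just a per-memcg knob, and the
intended usage is roughly the following (the memory.oom_delay_millisecs
name and the cgroup path here are shown for illustration, not quoted from
the patch in this excerpt):

/*
 * Hypothetical setup for the grace period described above: give userspace
 * ten seconds to resolve a memcg oom before the kernel oom killer steps in.
 */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const char *path =
                "/sys/fs/cgroup/memory/foo/memory.oom_delay_millisecs";
        const char *val = "10000";              /* 10s grace period */
        int fd = open(path, O_WRONLY);

        if (fd < 0)
                return 1;
        write(fd, val, strlen(val));
        close(fd);
        return 0;
}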

If you'd like to debate this issue, I'd be more than happy to do so and
show why this patch is absolutely necessary for inclusion, but I'd ask that
you present the code from your userspace oom handler so I can understand
how it works without needing such backup support.