Message-ID: <alpine.DEB.2.02.1401091324120.31538@chino.kir.corp.google.com>
Date: Thu, 9 Jan 2014 13:34:24 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
cc: Michal Hocko <mhocko@...e.cz>,
Johannes Weiner <hannes@...xchg.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
cgroups@...r.kernel.org,
"Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: [patch 1/2] mm, memcg: avoid oom notification when current needs
access to memory reserves

On Tue, 7 Jan 2014, Andrew Morton wrote:
> I just spent a happy half hour reliving this thread and ended up
> deciding I agreed with everyone! I appears that many more emails are
> needed so I think I'll drop
> http://ozlabs.org/~akpm/mmots/broken-out/mm-memcg-avoid-oom-notification-when-current-needs-access-to-memory-reserves.patch
> for now.
>
> The claim that
> mm-memcg-avoid-oom-notification-when-current-needs-access-to-memory-reserves.patch
> will impact existing userspace seems a bit dubious to me.
>
I'm not sure why this was dropped since it's vitally needed for any sane
userspace oom handler to be effective.

Without the patch, a userspace oom handler waiting on memory.oom_control
will be triggered when any process with a pending SIGKILL or in the exit()
path simply needs access to memory reserves to make forward progress. The
kernel oom killer itself is preempted since nothing is actionable other
than giving current access to memory reserves by setting the TIF_MEMDIE
bit. Userspace does not have the privilege to set this bit itself, so in
such cases there is absolutely nothing actionable for the userspace oom
handler.

The problem is that the userspace oom handler doesn't know that.
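
To make the point concrete, here is a minimal sketch of how a userspace
handler might wait on memory.oom_control through the cgroup v1 eventfd
interface (cgroup.event_control). The struct and function names and the
cgroup_dir argument are hypothetical, chosen only for illustration; error
handling is reduced to the bare minimum.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Descriptors the listener keeps open for its lifetime (hypothetical type). */
struct oom_listener {
	int event_fd;    /* eventfd signalled on each oom notification */
	int control_fd;  /* open memory.oom_control of the watched memcg */
};

/* Build the "<event_fd> <control_fd>" line written to cgroup.event_control.
 * Returns the line length, or -1 if it does not fit in buf. */
int oom_format_registration(char *buf, size_t len, int event_fd, int control_fd)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "%d %d", event_fd, control_fd);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Register for oom notifications on the memcg at cgroup_dir (an assumed
 * cgroup v1 path). Returns 0 on success, -1 on any failure. */
int oom_listener_open(const char *cgroup_dir, struct oom_listener *l)
{
	char path[4096], line[64];
	int reg_fd = -1, n;

	l->control_fd = -1;
	l->event_fd = eventfd(0, 0);
	if (l->event_fd < 0)
		return -1;

	n = snprintf(path, sizeof(path), "%s/memory.oom_control", cgroup_dir);
	if (n > 0 && (size_t)n < sizeof(path))
		l->control_fd = open(path, O_RDONLY);
	n = snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
	if (l->control_fd >= 0 && n > 0 && (size_t)n < sizeof(path))
		reg_fd = open(path, O_WRONLY);
	if (reg_fd >= 0) {
		n = oom_format_registration(line, sizeof(line),
					    l->event_fd, l->control_fd);
		if (n > 0 && write(reg_fd, line, (size_t)n) == (ssize_t)n) {
			close(reg_fd);
			return 0;
		}
		close(reg_fd);
	}
	/* Failure: release whatever was opened. */
	if (l->control_fd >= 0)
		close(l->control_fd);
	close(l->event_fd);
	l->event_fd = l->control_fd = -1;
	return -1;
}

/* Block until the eventfd is signalled. The only payload is a counter. */
int oom_listener_wait(const struct oom_listener *l, uint64_t *count)
{
	ssize_t r = read(l->event_fd, count, sizeof(*count));

	return r == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

Note what oom_listener_wait() hands back: a wakeup and a counter, nothing
more. There is no field the handler could inspect to learn that current
only needed access to memory reserves.
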
It would be ludicrous to require that a userspace oom handler must wait
for some arbitrary amount of time to determine if it is actionable or not;
what is a sane amount of time to wait? Should we reliably expect that
multiple oom notifications will be sent over a period of time if we are in
a situation where current doesn't require memory reserves to make forward
progress? How long should the userspace oom handler store this state to
determine how many times it has woken up?

Userspace oom handling implementations are fragile enough as it is; they
should be made as trivial as possible to ensure they can do what is needed
to make memory available, have the smallest memory footprint possible, and
be as reliable as possible. Requiring them to determine when a
notification is actionable is troublesome.

Furthermore, Section 10 of Documentation/cgroups/memory.txt does not imply
that any of this checking needs to be done and lists possible actions that
a userspace oom handler can do upon being notified such as raising a limit
or killing a process itself. This is what userspace _expects_ to do when
notified.
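
As an illustration of the first of those actions, raising a limit amounts
to writing a new value to the memcg's memory.limit_in_bytes file. The
sketch below assumes a cgroup v1 hierarchy; the function name and the
cgroup_dir argument are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write new_limit_bytes to <cgroup_dir>/memory.limit_in_bytes.
 * cgroup_dir is an assumed cgroup v1 path. Returns 0 or -1. */
int memcg_raise_limit(const char *cgroup_dir, unsigned long long new_limit_bytes)
{
	char path[4096], value[32];
	int fd, n, ok;

	assert(cgroup_dir != NULL);
	n = snprintf(path, sizeof(path), "%s/memory.limit_in_bytes", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	n = snprintf(value, sizeof(value), "%llu", new_limit_bytes);
	if (n < 0 || (size_t)n >= sizeof(value))
		return -1;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	/* A short write is treated as failure; the handler must stay simple. */
	ok = write(fd, value, (size_t)n) == (ssize_t)n;
	close(fd);
	return ok ? 0 : -1;
}
```

A handler that has just been notified would call this (or kill a process
instead) without any further checks, which is exactly why the notification
needs to be actionable.
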
Giving current access to memory reserves so that it may make forward
progress is something only the kernel can do and is a part of both the VM
and memcg implementations to allow forward progress to be made. It is not
something userspace is involved in.

Additionally, you're not losing any functionality by merging the patch. If
you simply want to know when the limit has been reached, rather than
receive the actionable notification that the memcg documentation describes,
you can do so with memory thresholds or VMPRESSURE_CRITICAL.
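
For reference, both alternatives go through the same cgroup v1 eventfd
registration, with a third argument on the cgroup.event_control line: a
byte count for memory.usage_in_bytes, or a level name such as "critical"
for memory.pressure_level. A minimal sketch follows; the function names
and the cgroup_dir argument are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Build "<event_fd> <target_fd> <arg>" for cgroup.event_control.
 * Returns the line length, or -1 if it does not fit in buf. */
int memcg_format_event(char *buf, size_t len, int event_fd, int target_fd,
		       const char *arg)
{
	int n;

	assert(buf != NULL && arg != NULL);
	n = snprintf(buf, len, "%d %d %s", event_fd, target_fd, arg);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Register an eventfd on <cgroup_dir>/<file> with the given argument.
 * Returns the eventfd, or -1 on failure. *target_fd receives the open
 * descriptor of the watched file, which the caller keeps open. */
int memcg_watch(const char *cgroup_dir, const char *file, const char *arg,
		int *target_fd)
{
	char path[4096], line[128];
	int efd, reg_fd, n, ok = 0;

	*target_fd = -1;
	n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	*target_fd = open(path, O_RDONLY);
	if (*target_fd < 0) {
		*target_fd = -1;
		return -1;
	}

	n = snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
	reg_fd = (n > 0 && (size_t)n < sizeof(path)) ? open(path, O_WRONLY) : -1;
	efd = eventfd(0, 0);
	if (reg_fd >= 0 && efd >= 0) {
		n = memcg_format_event(line, sizeof(line), efd, *target_fd, arg);
		ok = n > 0 && write(reg_fd, line, (size_t)n) == (ssize_t)n;
	}
	if (reg_fd >= 0)
		close(reg_fd);
	if (ok)
		return efd;

	/* Failure: release whatever was opened. */
	if (efd >= 0)
		close(efd);
	close(*target_fd);
	*target_fd = -1;
	return -1;
}
```

With this, memcg_watch(dir, "memory.usage_in_bytes", "1073741824", &fd)
would ask for a wakeup when usage crosses 1 GiB, and
memcg_watch(dir, "memory.pressure_level", "critical", &fd) would ask for
critical vmpressure events. Neither depends on memory.oom_control.
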
Google relies on this behavior so that userspace oom handlers can be
implemented to respond to oom conditions and not cause unnecessary oom
killing. We'd like to know why you refuse to provide such an interface in
a responsible and reliable way.

Thanks.