Message-ID: <alpine.DEB.2.02.1311251717370.27270@chino.kir.corp.google.com>
Date: Mon, 25 Nov 2013 17:29:20 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Luigi Semenzato <semenzato@...gle.com>
cc: Michal Hocko <mhocko@...e.cz>, linux-mm@...ck.org,
Greg Thelen <gthelen@...gle.com>,
Glauber Costa <glommer@...il.com>,
Mel Gorman <mgorman@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Rik van Riel <riel@...hat.com>,
Joern Engel <joern@...fs.org>, Hugh Dickins <hughd@...gle.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: user defined OOM policies
On Wed, 20 Nov 2013, Luigi Semenzato wrote:
> Yes, I agree that we can't always prevent OOM situations, and in fact
> we tolerate OOM kills, although they have a worse impact on the users
> than controlled freeing does.
>
If the controlled freeing is able to actually free memory in time before
hitting an oom condition, it should work pretty well. That ability
seems to be highly dependent on sane thresholds for individual
applications, and I'm afraid we can never positively ensure that we wake
up and are able to free memory in time to avoid the oom condition.
> Well OK here it goes. I hate to be a party-pooper, but the notion of
> a user-level OOM-handler scares me a bit for various reasons.
>
> 1. Our custom notifier sends low-memory warnings well ahead of memory
> depletion. If we don't have enough time to free memory then, what can
> the last-minute OOM handler do?
>
The userspace oom handler doesn't necessarily guarantee that you can do
memory freeing; our use case wants to do priority-based oom killing that
is different from the kernel oom killer's rss-based heuristic. To do
that, you only really need to read certain proc files, and you can do
killing based on uptime, for example. You can also do a hierarchical
traversal of memcgs based on priority.
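As a rough illustration of that kind of policy (not our actual handler,
and with "youngest task first" picked arbitrarily), a handler woken at
oom time could do something like the following using only
/proc/<pid>/stat:

/*
 * Hypothetical sketch of a userspace oom policy: kill the most recently
 * started task, i.e. "youngest first", based only on /proc.  Purely an
 * illustration of killing based on uptime, not a real handler.
 */
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static int pick_and_kill_newest(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	unsigned long long newest_start = 0;
	pid_t victim = -1;

	if (!proc)
		return -1;

	while ((de = readdir(proc))) {
		char path[64], buf[1024];
		unsigned long long starttime;
		char *p;
		FILE *f;

		if (!isdigit(de->d_name[0]))
			continue;
		snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (!fgets(buf, sizeof(buf), f)) {
			fclose(f);
			continue;
		}
		fclose(f);

		/* comm may contain spaces, so parse after the last ')' */
		p = strrchr(buf, ')');
		if (!p)
			continue;
		/*
		 * starttime is field 22 of /proc/pid/stat; skip the 19
		 * fields between the ')' that ends comm and starttime.
		 */
		if (sscanf(p + 1,
			   "%*s %*s %*s %*s %*s %*s %*s %*s %*s %*s"
			   " %*s %*s %*s %*s %*s %*s %*s %*s %*s %llu",
			   &starttime) != 1)
			continue;

		if (starttime > newest_start && atoi(de->d_name) != getpid()) {
			newest_start = starttime;
			victim = atoi(de->d_name);
		}
	}
	closedir(proc);

	if (victim > 0)
		return kill(victim, SIGKILL);
	return -1;
}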
We already have hooks in the kernel oom killer, such as
/proc/sys/vm/oom_kill_allocating_task and /proc/sys/vm/panic_on_oom,
that implement different policies which could now trivially be done in
userspace with memory reserves and a timeout. The point is that we
can't possibly encode every possible policy into the kernel, and there's
no reason why userspace can't do the kill itself.
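To make that concrete, panic_on_oom, for instance, could be approximated
entirely from a userspace handler; the sketch below assumes sysrq is
available and is only meant to show that the policy itself doesn't need
to live in the kernel:

/*
 * Illustration only: a userspace stand-in for panic_on_oom.  Assumes
 * CONFIG_MAGIC_SYSRQ and a kernel.sysrq setting that permits 'c'.
 */
#include <fcntl.h>
#include <unistd.h>

static void userspace_panic_on_oom(void)
{
	int fd = open("/proc/sysrq-trigger", O_WRONLY);

	if (fd < 0)
		return;
	/* 'c' crashes the machine, much like panic_on_oom would */
	write(fd, "c", 1);
	close(fd);
}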
> 2. In addition to the time factor, it's not trivial to do anything,
> including freeing memory, without allocating memory first, so we'll
> need a reserve, but how much, and who is allowed to use it?
>
The reserve is configurable in the proposal as a memcg precharge and would
be dependent on the memory needed by the userspace oom handler at wakeup.
Only processes that are waiting on memory.oom_control have access to the
memory reserve.
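For reference, waiting on memory.oom_control already works today with an
eventfd; the sketch below shows only the registration step, and the
precharged reserve itself is part of the proposal rather than anything
this code can request (the memcg directory passed in by the caller is an
assumption about the local cgroup mount):

/*
 * Sketch: arm an eventfd on memory.oom_control with the existing
 * cgroup (v1) interface and block until the memcg hits oom.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* memcg_path is the memcg directory, e.g. "/sys/fs/cgroup/memory/foo" */
int wait_for_memcg_oom(const char *memcg_path)
{
	char path[256], line[64];
	int efd, ofd, cfd;
	uint64_t count;

	efd = eventfd(0, 0);

	snprintf(path, sizeof(path), "%s/memory.oom_control", memcg_path);
	ofd = open(path, O_RDONLY);

	snprintf(path, sizeof(path), "%s/cgroup.event_control", memcg_path);
	cfd = open(path, O_WRONLY);

	if (efd < 0 || ofd < 0 || cfd < 0)
		return -1;

	/* "<eventfd fd> <memory.oom_control fd>" arms the notification */
	snprintf(line, sizeof(line), "%d %d", efd, ofd);
	if (write(cfd, line, strlen(line)) < 0)
		return -1;
	close(cfd);

	/* blocks until the kernel signals an oom event in this memcg */
	return read(efd, &count, sizeof(count)) == sizeof(count) ? 0 : -1;
}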
> 3. How does one select the OOM-handler timeout? If the freeing paths
> in the code are swapped out, the time needed to bring them in can be
> highly variable.
>
The userspace oom handler itself is mlocked in memory; you'd want to
select a timeout that is large enough to only react in situations where
userspace is known to be unresponsive. It's only meant as a failsafe to
avoid the memcg sitting around forever without making any forward
progress.
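As a minimal sketch of that pinning, assuming the handler does it once
at startup before registering for notifications:

#include <sys/mman.h>

/*
 * Pin current and future mappings so the handler never has to fault or
 * swap anything in while it is reacting to an oom notification.
 */
static int pin_handler(void)
{
	return mlockall(MCL_CURRENT | MCL_FUTURE);
}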
> 4. Why wouldn't the OOM-handler also do the killing itself? (Which is
> essentially what we do.) Then all we need is a low-memory notifier
> which can predict how quickly we'll run out of memory.
>
It can, but the prediction of how quickly we'll run out of memory is
nearly impossible for every class of application, and the timeout must
expire before the kernel steps in to resolve the situation.
> 5. The use case mentioned earlier (the fact that the killing of one
> process can make an entire group of processes useless) can be dealt
> with using OOM priorities and user-level code.
>
It depends on the application being killed.
> I confess I am surprised that the OOM killer works as well as I think
> it does. Adding a user-level component would bring a whole new level
> of complexity to code that's already hard to fully comprehend, and
> might not really address the fundamental issues.
>
The kernel code that would be added certainly isn't complex, and I
believe it is better than the current functionality, which only allows
you to disable the memcg oom killer entirely in order to effect any
userspace policy.
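For comparison, the only knob that exists today is the one sketched
below: disable the memcg oom killer outright and rely entirely on the
handler armed on memory.oom_control (memcg_path is the same assumption
as in the earlier sketch):

/*
 * Today's only escape hatch: writing "1" to memory.oom_control
 * disables the kernel oom killer for that memcg entirely.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int disable_memcg_oom_killer(const char *memcg_path)
{
	char path[256];
	ssize_t ret;
	int fd;

	snprintf(path, sizeof(path), "%s/memory.oom_control", memcg_path);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	ret = write(fd, "1", 1);
	close(fd);
	return ret == 1 ? 0 : -1;
}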