Date:	Wed, 20 Nov 2013 23:03:33 -0800
From:	Luigi Semenzato <semenzato@...gle.com>
To:	David Rientjes <rientjes@...gle.com>
Cc:	Michal Hocko <mhocko@...e.cz>, linux-mm@...ck.org,
	Greg Thelen <gthelen@...gle.com>,
	Glauber Costa <glommer@...il.com>,
	Mel Gorman <mgorman@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Rik van Riel <riel@...hat.com>,
	Joern Engel <joern@...fs.org>, Hugh Dickins <hughd@...gle.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: user defined OOM policies

On Wed, Nov 20, 2013 at 7:36 PM, David Rientjes <rientjes@...gle.com> wrote:
> On Wed, 20 Nov 2013, Luigi Semenzato wrote:
>
>> Chrome OS uses a custom low-memory notification to minimize OOM kills.
>>  When the notifier triggers, the Chrome browser tries to free memory,
>> including by shutting down processes, before the full OOM occurs.  But
>> OOM kills cannot always be avoided, depending on the speed of
>> allocation and how much CPU the freeing tasks are able to use
>> (certainly they could be given higher priority, but it gets complex).
>>
>> We may end up using memcg so we can use the cgroup
>> memory.pressure_level file instead of our own notifier, but we have no
>> need for finer control over OOM kills beyond the very useful kill
>> priority.  One process at a time is good enough for us.
>>
>
> Even with your own custom low-memory notifier or memory.pressure_level,
> it's still possible that all memory is depleted and you run into an oom
> kill before your userspace had a chance to wake up and prevent it.  I
> think what you'll want is either your custom notifier or
> memory.pressure_level to do pre-oom freeing, falling back to a userspace
> oom handler that prevents kernel oom kills until it ensures userspace
> did everything it could to free unneeded memory, do any necessary
> logging, etc., over a grace period of memory.oom_delay_millisecs before
> the kernel eventually steps in and kills.

Yes, I agree that we can't always prevent OOM situations, and in fact
we tolerate OOM kills, although they have a worse impact on the users
than controlled freeing does.

Well, OK, here it goes.  I hate to be a party-pooper, but the notion of
a user-level OOM-handler scares me a bit, for several reasons.

1. Our custom notifier sends low-memory warnings well ahead of memory
depletion.  If even that doesn't leave us enough time to free memory,
what can a last-minute OOM handler do?

2. In addition to the time factor, it's not trivial to do anything,
including freeing memory, without allocating memory first, so we'll
need a reserve, but how much, and who is allowed to use it?

3. How does one select the OOM-handler timeout?  If the freeing paths
in the code are swapped out, the time needed to bring them in can be
highly variable.

4. Why wouldn't the OOM-handler also do the killing itself?  (Which is
essentially what we do.)  Then all we need is a low-memory notifier
which can predict how quickly we'll run out of memory.

5. The use case mentioned earlier (the fact that the killing of one
process can make an entire group of processes useless) can be dealt
with using OOM priorities and user-level code.

I confess I am surprised that the OOM killer works as well as I think
it does.  Adding a user-level component would bring a whole new level
of complexity to code that's already hard to fully comprehend, and
might not really address the fundamental issues.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
